MyArxiv
Robotics
FIMD: Fast Isolated Marker Detection for UV-Based Visual Relative Localisation in Agile UAV Swarms
A novel approach for the fast onboard detection of isolated markers for visual relative localisation of multiple teammates in agile UAV swarms is introduced in this paper. As the detection forms a key component of real-time localisation systems, a three-fold innovation is presented, consisting of an optimised procedure for CPUs, a GPU shader program, and a functionally equivalent FPGA streaming architecture. For the proposed CPU and GPU solutions, the mean processing time per pixel of input camera frames was accelerated by two to three orders of magnitude compared to the \rev{unoptimised state-of-the-art approach}. For the localisation task, the proposed FPGA architecture offered the most significant overall acceleration by minimising the total delay from camera exposure to detection results. Additionally, the proposed solutions were evaluated on various 32-bit and 64-bit embedded platforms to demonstrate their efficiency, as well as their feasibility for applications using low-end UAVs and MAVs. Thus, it has become a crucial enabling technology for agile UAV swarming.
MoTVLA: A Vision-Language-Action Model with Unified Fast-Slow Reasoning
Integrating visual-language instructions into visuomotor policies is gaining momentum in robot learning for enhancing open-world generalization. Despite promising advances, existing approaches face two challenges: limited language steerability when no generated reasoning is used as a condition, or significant inference latency when reasoning is incorporated. In this work, we introduce MoTVLA, a mixture-of-transformers (MoT)-based vision-language-action (VLA) model that integrates fast-slow unified reasoning with behavior policy learning. MoTVLA preserves the general intelligence of pre-trained VLMs (serving as the generalist) for tasks such as perception, scene understanding, and semantic planning, while incorporating a domain expert, a second transformer that shares knowledge with the pretrained VLM, to generate domain-specific fast reasoning (e.g., robot motion decomposition), thereby improving policy execution efficiency. By conditioning the action expert on decomposed motion instructions, MoTVLA can learn diverse behaviors and substantially improve language steerability. Extensive evaluations across natural language processing benchmarks, robotic simulation environments, and real-world experiments confirm the superiority of MoTVLA in both fast-slow reasoning and manipulation task performance.
VO-DP: Semantic-Geometric Adaptive Diffusion Policy for Vision-Only Robotic Manipulation
In the context of imitation learning, visuomotor-based diffusion policy learning is one of the main directions in robotic manipulation. Most of these approaches rely on point clouds as observation inputs and construct scene representations through point clouds feature learning, which enables them to achieve remarkable accuracy. However, the existing literature lacks an in-depth exploration of vision-only solutions that have significant potential. In this paper, we propose a Vision-Only and single-view Diffusion Policy learning method (VO-DP) that leverages pretrained visual foundation models to achieve effective fusion of semantic and geometric features. We utilize intermediate features from VGGT incorporating semantic features from DINOv2 and geometric features from Alternating Attention blocks. Features are fused via cross-attention and spatially compressed with a CNN to form the input to the policy head. Extensive experiments demonstrate that VO-DP not only outperforms the vision-only baseline DP significantly but also exhibits distinct performance trends against the point cloud-based method DP3: in simulation tasks, VO-DP achieves an average success rate of 64.6% on par with DP3 64.0% and far higher than DP 34.8%, while in real-world tasks, it reaches 87.9%, outperforming both DP3 67.5% and DP 11.2% by a notable margin. Further robustness evaluations confirm that VO-DP remains highly stable under varying conditions including color, size, background, and lighting. Lastly, we open-source a training library for robotic manipulation. Built on Accelerate, this library supports multi-machine and multi-GPU parallel training, as well as mixed precision training. It is compatible with visuomotor policies such as DP, DP3 and VO-DP, and also supports the RoboTwin simulator.
Local Guidance for Configuration-Based Multi-Agent Pathfinding
Guidance is an emerging concept that improves the empirical performance of real-time, sub-optimal multi-agent pathfinding (MAPF) methods. It offers additional information to MAPF algorithms to mitigate congestion on a global scale by considering the collective behavior of all agents across the entire workspace. This global perspective helps reduce agents' waiting times, thereby improving overall coordination efficiency. In contrast, this study explores an alternative approach: providing local guidance in the vicinity of each agent. While such localized methods involve recomputation as agents move and may appear computationally demanding, we empirically demonstrate that supplying informative spatiotemporal cues to the planner can significantly improve solution quality without exceeding a moderate time budget. When applied to LaCAM, a leading configuration-based solver, this form of guidance establishes a new performance frontier for MAPF.
comment: 10 pages
Multiagent Systems
RADAR: A Risk-Aware Dynamic Multi-Agent Framework for LLM Safety Evaluation via Role-Specialized Collaboration
Existing safety evaluation methods for large language models (LLMs) suffer from inherent limitations, including evaluator bias and detection failures arising from model homogeneity, which collectively undermine the robustness of risk evaluation processes. This paper seeks to re-examine the risk evaluation paradigm by introducing a theoretical framework that reconstructs the underlying risk concept space. Specifically, we decompose the latent risk concept space into three mutually exclusive subspaces: the explicit risk subspace (encompassing direct violations of safety guidelines), the implicit risk subspace (capturing potential malicious content that requires contextual reasoning for identification), and the non-risk subspace. Furthermore, we propose RADAR, a multi-agent collaborative evaluation framework that leverages multi-round debate mechanisms through four specialized complementary roles and employs dynamic update mechanisms to achieve self-evolution of risk concept distributions. This approach enables comprehensive coverage of both explicit and implicit risks while mitigating evaluator bias. To validate the effectiveness of our framework, we construct an evaluation dataset comprising 800 challenging cases. Extensive experiments on our challenging testset and public benchmarks demonstrate that RADAR significantly outperforms baseline evaluation methods across multiple dimensions, including accuracy, stability, and self-evaluation risk sensitivity. Notably, RADAR achieves a 28.87% improvement in risk identification accuracy compared to the strongest baseline evaluation method.
Local Guidance for Configuration-Based Multi-Agent Pathfinding
Guidance is an emerging concept that improves the empirical performance of real-time, sub-optimal multi-agent pathfinding (MAPF) methods. It offers additional information to MAPF algorithms to mitigate congestion on a global scale by considering the collective behavior of all agents across the entire workspace. This global perspective helps reduce agents' waiting times, thereby improving overall coordination efficiency. In contrast, this study explores an alternative approach: providing local guidance in the vicinity of each agent. While such localized methods involve recomputation as agents move and may appear computationally demanding, we empirically demonstrate that supplying informative spatiotemporal cues to the planner can significantly improve solution quality without exceeding a moderate time budget. When applied to LaCAM, a leading configuration-based solver, this form of guidance establishes a new performance frontier for MAPF.
comment: 10 pages
Robotics
Semantic World Models
Planning with world models offers a powerful paradigm for robotic control. Conventional approaches train a model to predict future frames conditioned on current frames and actions, which can then be used for planning. However, the objective of predicting future pixels is often at odds with the actual planning objective; strong pixel reconstruction does not always correlate with good planning decisions. This paper posits that instead of reconstructing future frames as pixels, world models only need to predict task-relevant semantic information about the future. For such prediction the paper poses world modeling as a visual question answering problem about semantic information in future frames. This perspective allows world modeling to be approached with the same tools underlying vision language models. Thus vision language models can be trained as "semantic" world models through a supervised finetuning process on image-action-text data, enabling planning for decision-making while inheriting many of the generalization and robustness properties from the pretrained vision-language models. The paper demonstrates how such a semantic world model can be used for policy improvement on open-ended robotics tasks, leading to significant generalization improvements over typical paradigms of reconstruction-based action-conditional world modeling. Website available at https://weirdlabuw.github.io/swm.
SEA: Semantic Map Prediction for Active Exploration of Uncertain Areas
In this paper, we propose SEA, a novel approach for active robot exploration through semantic map prediction and a reinforcement learning-based hierarchical exploration policy. Unlike existing learning-based methods that rely on one-step waypoint prediction, our approach enhances the agent's long-term environmental understanding to facilitate more efficient exploration. We propose an iterative prediction-exploration framework that explicitly predicts the missing areas of the map based on current observations. The difference between the actual accumulated map and the predicted global map is then used to guide exploration. Additionally, we design a novel reward mechanism that leverages reinforcement learning to update the long-term exploration strategies, enabling us to construct an accurate semantic map within limited steps. Experimental results demonstrate that our method significantly outperforms state-of-the-art exploration strategies, achieving superior coverage ares of the global map within the same time constraints.
Learning Affordances at Inference-Time for Vision-Language-Action Models
Solving complex real-world control tasks often takes multiple tries: if we fail at first, we reflect on what went wrong, and change our strategy accordingly to avoid making the same mistake. In robotics, Vision-Language-Action models (VLAs) offer a promising path towards solving complex control tasks, but lack the ability to contextually and dynamically readjust behavior when they fail to accomplish a task. In this work, we introduce Learning from Inference-Time Execution (LITEN), which connects a VLA low-level policy to a high-level VLM that conditions on past experiences by including them in-context, allowing it to learn the affordances and capabilities of the low-level VLA. Our approach iterates between a reasoning phase that generates and executes plans for the low-level VLA, and an assessment phase that reflects on the resulting execution and draws useful conclusions to be included in future reasoning contexts. Unlike similar approaches to self-refinement in non-robotics domains, LITEN must reflect on unstructured real-world robot trajectories (e.g., raw videos), which requires structured guiderails during assessment. Our experimental results demonstrate LITEN is able to effectively learn from past experience to generate plans that use high-affordance instructions to accomplish long-horizon tasks.
comment: 7 pages and appendix
Memo: Training Memory-Efficient Embodied Agents with Reinforcement Learning NeurIPS 2025
To enable embodied agents to operate effectively over extended timeframes, it is crucial to develop models that form and access memories to stay contextualized in their environment. In the current paradigm of training transformer-based policies for embodied sequential decision-making tasks, visual inputs often overwhelm the context limits of transformers, while humans can maintain and utilize a lifetime of experience compressed as memories. Significant compression is possible in principle, as much of the input is irrelevant and can be abstracted. However, existing approaches predominantly focus on either recurrent models with fixed-size memory or transformers with full-context reliance. In this work, we propose Memo, a transformer-based architecture and training recipe for reinforcement learning (RL) on memory-intensive, long-horizon tasks. Memo incorporates the creation and retrieval of memory by interleaving periodic summarization tokens with the inputs of a model during training. We demonstrate Memo's effectiveness on a gridworld meta-RL benchmark and a multi-object navigation task in photo-realistic indoor settings. Memo outperforms naive long-context transformer baselines while being more compute and storage efficient. Additionally, Memo generalizes better to longer contexts at inference time and remains robust in streaming settings, where historical context must be truncated to fit inference constraints.
comment: Accepted for Spotlight Presentation at NeurIPS 2025
Fast Marker Detection for UV-Based Visual Relative Localisation in Agile UAV Swarms
A novel approach for the fast onboard detection of isolated markers for visual relative localisation of multiple teammates in agile UAV swarms is introduced in this paper. As the detection forms a key component of real-time localisation systems, a three-fold innovation is presented, consisting of an optimised procedure for CPUs, a GPU shader program, and a functionally equivalent FPGA streaming architecture. For the proposed CPU and GPU solutions, the mean processing time per pixel of input camera frames was accelerated by two to three orders of magnitude compared to the state of the art. For the localisation task, the proposed FPGA architecture offered the most significant overall acceleration by minimising the total delay from camera exposure to detection results. Additionally, the proposed solutions were evaluated on various 32-bit and 64-bit embedded platforms to demonstrate their efficiency, as well as their feasibility for applications using low-end UAVs and MAVs. Thus, it has become a crucial enabling technology for agile UAV swarming.
LaViRA: Language-Vision-Robot Actions Translation for Zero-Shot Vision Language Navigation in Continuous Environments
Zero-shot Vision-and-Language Navigation in Continuous Environments (VLN-CE) requires an agent to navigate unseen environments based on natural language instructions without any prior training. Current methods face a critical trade-off: either rely on environment-specific waypoint predictors that limit scene generalization, or underutilize the reasoning capabilities of large models during navigation. We introduce LaViRA, a simple yet effective zero-shot framework that addresses this dilemma by decomposing action into a coarse-to-fine hierarchy: Language Action for high-level planning, Vision Action for perceptual grounding, and Robot Action for robust navigation. This modular decomposition allows us to leverage the distinct strengths of different scales of Multimodal Large Language Models (MLLMs) at each stage, creating a system that is powerful in its reasoning, grounding and practical control. LaViRA significantly outperforms existing state-of-the-art methods on the VLN-CE benchmark, demonstrating superior generalization capabilities in unseen environments, while maintaining transparency and efficiency for real-world deployment.
From Forecasting to Planning: Policy World Model for Collaborative State-Action Prediction
Despite remarkable progress in driving world models, their potential for autonomous systems remains largely untapped: the world models are mostly learned for world simulation and decoupled from trajectory planning. While recent efforts aim to unify world modeling and planning in a single framework, the synergistic facilitation mechanism of world modeling for planning still requires further exploration. In this work, we introduce a new driving paradigm named Policy World Model (PWM), which not only integrates world modeling and trajectory planning within a unified architecture, but is also able to benefit planning using the learned world knowledge through the proposed action-free future state forecasting scheme. Through collaborative state-action prediction, PWM can mimic the human-like anticipatory perception, yielding more reliable planning performance. To facilitate the efficiency of video forecasting, we further introduce a dynamically enhanced parallel token generation mechanism, equipped with a context-guided tokenizer and an adaptive dynamic focal loss. Despite utilizing only front camera input, our method matches or exceeds state-of-the-art approaches that rely on multi-view and multi-modal inputs. Code and model weights will be released at https://github.com/6550Zhao/Policy-World-Model.
comment: Accepted by NuerIPS 2025 (Poster)
Optimizing Prosthetic Wrist Movement: A Model Predictive Control Approach
The integration of advanced control strategies into prosthetic hands is essential to improve their adaptability and performance. In this study, we present an implementation of a Model Predictive Control (MPC) strategy to regulate the motions of a soft continuum wrist section attached to a tendon-driven prosthetic hand with less computational effort. MPC plays a crucial role in enhancing the functionality and responsiveness of prosthetic hands. By leveraging predictive modeling, this approach enables precise movement adjustments while accounting for dynamic user interactions. This advanced control strategy allows for the anticipation of future movements and adjustments based on the current state of the prosthetic device and the intentions of the user. Kinematic and dynamic modelings are performed using Euler-Bernoulli beam and Lagrange methods respectively. Through simulation and experimental validations, we demonstrate the effectiveness of MPC in optimizing wrist articulation and user control. Our findings suggest that this technique significantly improves the prosthetic hand dexterity, making movements more natural and intuitive. This research contributes to the field of robotics and biomedical engineering by offering a promising direction for intelligent prosthetic systems.
comment: International Conference on Social Robotics + AI 2025
Using Non-Expert Data to Robustify Imitation Learning via Offline Reinforcement Learning
Imitation learning has proven effective for training robots to perform complex tasks from expert human demonstrations. However, it remains limited by its reliance on high-quality, task-specific data, restricting adaptability to the diverse range of real-world object configurations and scenarios. In contrast, non-expert data -- such as play data, suboptimal demonstrations, partial task completions, or rollouts from suboptimal policies -- can offer broader coverage and lower collection costs. However, conventional imitation learning approaches fail to utilize this data effectively. To address these challenges, we posit that with right design decisions, offline reinforcement learning can be used as a tool to harness non-expert data to enhance the performance of imitation learning policies. We show that while standard offline RL approaches can be ineffective at actually leveraging non-expert data under the sparse data coverage settings typically encountered in the real world, simple algorithmic modifications can allow for the utilization of this data, without significant additional assumptions. Our approach shows that broadening the support of the policy distribution can allow imitation algorithms augmented by offline RL to solve tasks robustly, showing considerably enhanced recovery and generalization behavior. In manipulation tasks, these innovations significantly increase the range of initial conditions where learned policies are successful when non-expert data is incorporated. Moreover, we show that these methods are able to leverage all collected data, including partial or suboptimal demonstrations, to bolster task-directed policy performance. This underscores the importance of algorithmic techniques for using non-expert data for robust policy learning in robotics.
GigaBrain-0: A World Model-Powered Vision-Language-Action Model
Training Vision-Language-Action (VLA) models for generalist robots typically requires large-scale real-world robot data, which is expensive and time-consuming to collect. The inefficiency of physical data collection severely limits the scalability, and generalization capacity of current VLA systems. To address this challenge, we introduce GigaBrain-0, a novel VLA foundation model empowered by world model-generated data (e.g., video generation, real2real transfer, human transfer, view transfer, sim2real transfer data). By leveraging world models to generate diverse data at scale, GigaBrain-0 significantly reduces reliance on real robot data while improving cross-task generalization. Our approach further improves policy robustness through RGBD input modeling and embodied Chain-of-Thought (CoT) supervision, enabling the model to reason about spatial geometry, object states, and long-horizon dependencies during task execution. This leads to substantial gains in real-world performance on dexterous, long-horizon, and mobile manipulation tasks. Extensive experiments demonstrate that GigaBrain-0 achieves superior generalization across variations in appearances (e.g., textures, colors), object placements, and camera viewpoints. Additionally, we present GigaBrain-0-Small, an optimized lightweight variant designed to run efficiently on devices such as the NVIDIA Jetson AGX Orin.
comment: https://gigabrain0.github.io/
Risk Assessment of an Autonomous Underwater Snake Robot in Confined Operations
The growing interest in ocean discovery imposes a need for inspection and intervention in confined and demanding environments. Eely's slender shape, in addition to its ability to change its body configurations, makes articulated underwater robots an adequate option for such environments. However, operation of Eely in such environments imposes demanding requirements on the system, as it must deal with uncertain and unstructured environments, extreme environmental conditions, and reduced navigational capabilities. This paper proposes a Bayesian approach to assess the risks of losing Eely during two mission scenarios. The goal of this work is to improve Eely's performance and the likelihood of mission success. Sensitivity analysis results are presented in order to demonstrate the causes having the highest impact on losing Eely.
comment: 9 pages, 6 figures, Accepted for publication in OCEANS 2023 - Limerick
A Radius of Robust Feasibility Approach to Directional Sensors in Uncertain Terrain
A sensor has the ability to probe its surroundings. However, uncertainties in its exact location can significantly compromise its sensing performance. The radius of robust feasibility defines the maximum range within which robust feasibility is ensured. This work introduces a novel approach integrating it with the directional sensor networks to enhance coverage using a distributed greedy algorithm. In particular, we provide an exact formula for the radius of robust feasibility of sensors in a directional sensor network. The proposed model strategically orients the sensors in regions with high coverage potential, accounting for robustness in the face of uncertainty. We analyze the algorithm's adaptability in dynamic environments, demonstrating its ability to enhance efficiency and robustness. Experimental results validate its efficacy in maximizing coverage and optimizing sensor orientations, highlighting its practical advantages for real-world scenarios.
Using Temperature Sampling to Effectively Train Robot Learning Policies on Imbalanced Datasets
Increasingly large datasets of robot actions and sensory observations are being collected to train ever-larger neural networks. These datasets are collected based on tasks and while these tasks may be distinct in their descriptions, many involve very similar physical action sequences (e.g., 'pick up an apple' versus 'pick up an orange'). As a result, many datasets of robotic tasks are substantially imbalanced in terms of the physical robotic actions they represent. In this work, we propose a simple sampling strategy for policy training that mitigates this imbalance. Our method requires only a few lines of code to integrate into existing codebases and improves generalization. We evaluate our method in both pre-training small models and fine-tuning large foundational models. Our results show substantial improvements on low-resource tasks compared to prior state-of-the-art methods, without degrading performance on high-resource tasks. This enables more effective use of model capacity for multi-task policies. We also further validate our approach in a real-world setup on a Franka Panda robot arm across a diverse set of tasks.
ProTerrain: Probabilistic Physics-Informed Rough Terrain World Modeling ICRA
Uncertainty-aware robot motion prediction is crucial for downstream traversability estimation and safe autonomous navigation in unstructured, off-road environments, where terrain is heterogeneous and perceptual uncertainty is high. Most existing methods assume deterministic or spatially independent terrain uncertainties, ignoring the inherent local correlations of 3D spatial data and often producing unreliable predictions. In this work, we introduce an efficient probabilistic framework that explicitly models spatially correlated aleatoric uncertainty over terrain parameters as a probabilistic world model and propagates this uncertainty through a differentiable physics engine for probabilistic trajectory forecasting. By leveraging structured convolutional operators, our approach provides high-resolution multivariate predictions at manageable computational cost. Experimental evaluation on a publicly available dataset shows significantly improved uncertainty estimation and trajectory prediction accuracy over aleatoric uncertainty estimation baselines.
comment: This paper is submitted to IEEE International Conference on Robotics and Automation (ICRA) 2026
Imitation Learning Policy based on Multi-Step Consistent Integration Shortcut Model
The wide application of flow-matching methods has greatly promoted the development of robot imitation learning. However, these methods all face the problem of high inference time. To address this issue, researchers have proposed distillation methods and consistency methods, but the performance of these methods still struggles to compete with that of the original diffusion models and flow-matching models. In this article, we propose a one-step shortcut method with multi-step integration for robot imitation learning. To balance the inference speed and performance, we extend the multi-step consistency loss on the basis of the shortcut model, split the one-step loss into multi-step losses, and improve the performance of one-step inference. Secondly, to solve the problem of unstable optimization of the multi-step loss and the original flow-matching loss, we propose an adaptive gradient allocation method to enhance the stability of the learning process. Finally, we evaluate the proposed method in two simulation benchmarks and five real-world environment tasks. The experimental results verify the effectiveness of the proposed algorithm.
ConvXformer: Differentially Private Hybrid ConvNeXt-Transformer for Inertial Navigation
Data-driven inertial sequence learning has revolutionized navigation in GPS-denied environments, offering superior odometric resolution compared to traditional Bayesian methods. However, deep learning-based inertial tracking systems remain vulnerable to privacy breaches that can expose sensitive training data. \hl{Existing differential privacy solutions often compromise model performance by introducing excessive noise, particularly in high-frequency inertial measurements.} In this article, we propose ConvXformer, a hybrid architecture that fuses ConvNeXt blocks with Transformer encoders in a hierarchical structure for robust inertial navigation. We propose an efficient differential privacy mechanism incorporating adaptive gradient clipping and gradient-aligned noise injection (GANI) to protect sensitive information while ensuring model performance. Our framework leverages truncated singular value decomposition for gradient processing, enabling precise control over the privacy-utility trade-off. Comprehensive performance evaluations on benchmark datasets (OxIOD, RIDI, RoNIN) demonstrate that ConvXformer surpasses state-of-the-art methods, achieving more than 40% improvement in positioning accuracy while ensuring $(\epsilon,\delta)$-differential privacy guarantees. To validate real-world performance, we introduce the Mech-IO dataset, collected from the mechanical engineering building at KAIST, where intense magnetic fields from industrial equipment induce significant sensor perturbations. This demonstrated robustness under severe environmental distortions makes our framework well-suited for secure and intelligent navigation in cyber-physical systems.
comment: 14 pages, 8 figures, 3 tables
TARMAC: A Taxonomy for Robot Manipulation in Chemistry
Chemistry laboratory automation aims to increase throughput, reproducibility, and safety, yet many existing systems still depend on frequent human intervention. Advances in robotics have reduced this dependency, but without a structured representation of the required skills, autonomy remains limited to bespoke, task-specific solutions with little capacity to transfer beyond their initial design. Current experiment abstractions typically describe protocol-level steps without specifying the robotic actions needed to execute them. This highlights the lack of a systematic account of the manipulation skills required for robots in chemistry laboratories. To address this gap, we introduce TARMAC - a Taxonomy for Robot Manipulation in Chemistry - a domain-specific framework that defines and organizes the core manipulations needed in laboratory practice. Based on annotated teaching-lab demonstrations and supported by experimental validation, TARMAC categorizes actions according to their functional role and physical execution requirements. Beyond serving as a descriptive vocabulary, TARMAC can be instantiated as robot-executable primitives and composed into higher-level macros, enabling skill reuse and supporting scalable integration into long-horizon workflows. These contributions provide a structured foundation for more flexible and autonomous laboratory automation. More information is available at https://tarmac-paper.github.io/
Hierarchical DLO Routing with Reinforcement Learning and In-Context Vision-language Models
Long-horizon routing tasks of deformable linear objects (DLOs), such as cables and ropes, are common in industrial assembly lines and everyday life. These tasks are particularly challenging because they require robots to manipulate DLO with long-horizon planning and reliable skill execution. Successfully completing such tasks demands adapting to their nonlinear dynamics, decomposing abstract routing goals, and generating multi-step plans composed of multiple skills, all of which require accurate high-level reasoning during execution. In this paper, we propose a fully autonomous hierarchical framework for solving challenging DLO routing tasks. Given an implicit or explicit routing goal expressed in language, our framework leverages vision-language models~(VLMs) for in-context high-level reasoning to synthesize feasible plans, which are then executed by low-level skills trained via reinforcement learning. To improve robustness in long horizons, we further introduce a failure recovery mechanism that reorients the DLO into insertion-feasible states. Our approach generalizes to diverse scenes involving object attributes, spatial descriptions, as well as implicit language commands. It outperforms the next best baseline method by nearly 50% and achieves an overall success rate of 92.5% across long-horizon routing scenarios.
comment: 8 pages, 6 figures, 3 tables
Background Fades, Foreground Leads: Curriculum-Guided Background Pruning for Efficient Foreground-Centric Collaborative Perception
Collaborative perception enhances the reliability and spatial coverage of autonomous vehicles by sharing complementary information across vehicles, offering a promising solution to long-tail scenarios that challenge single-vehicle perception. However, the bandwidth constraints of vehicular networks make transmitting the entire feature map impractical. Recent methods, therefore, adopt a foreground-centric paradigm, transmitting only predicted foreground-region features while discarding the background, which encodes essential context. We propose FadeLead, a foreground-centric framework that overcomes this limitation by learning to encapsulate background context into compact foreground features during training. At the core of our design is a curricular learning strategy that leverages background cues early on but progressively prunes them away, forcing the model to internalize context into foreground representations without transmitting background itself. Extensive experiments on both simulated and real-world benchmarks show that FadeLead outperforms prior methods under different bandwidth settings, underscoring the effectiveness of context-enriched foreground sharing.
GRASPLAT: Enabling dexterous grasping through novel view synthesis IROS 2025
Achieving dexterous robotic grasping with multi-fingered hands remains a significant challenge. While existing methods rely on complete 3D scans to predict grasp poses, these approaches face limitations due to the difficulty of acquiring high-quality 3D data in real-world scenarios. In this paper, we introduce GRASPLAT, a novel grasping framework that leverages consistent 3D information while being trained solely on RGB images. Our key insight is that by synthesizing physically plausible images of a hand grasping an object, we can regress the corresponding hand joints for a successful grasp. To achieve this, we utilize 3D Gaussian Splatting to generate high-fidelity novel views of real hand-object interactions, enabling end-to-end training with RGB data. Unlike prior methods, our approach incorporates a photometric loss that refines grasp predictions by minimizing discrepancies between rendered and real images. We conduct extensive experiments on both synthetic and real-world grasping datasets, demonstrating that GRASPLAT improves grasp success rates up to 36.9% over existing image-based methods. Project page: https://mbortolon97.github.io/grasplat/
comment: Accepted IROS 2025
MoTVLA: A Vision-Language-Action Model with Unified Fast-Slow Reasoning
Integrating visual-language instructions into visuomotor policies is gaining momentum in robot learning for enhancing open-world generalization. Despite promising advances, existing approaches face two challenges: limited language steerability when no generated reasoning is used as a condition, or significant inference latency when reasoning is incorporated.In this work, we introduce MoTVLA, a mixture-of-transformers (MoT)-based vision-language-action (VLA) model that integrates fast-slow unified reasoning with behavior policy learning. MoTVLA preserves the general intelligence of pre-trained VLMs (serving as the generalist) for tasks such as perception, scene understanding, and semantic planning, while incorporating a domain expert, a second transformer that shares knowledge with the pretrained VLM, to generate domain-specific fast reasoning (e.g., robot motion decomposition), thereby improving policy execution efficiency. By conditioning the action expert on decomposed motion instructions, MoTVLA can learn diverse behaviors and substantially improve language steerability. Extensive evaluations across natural language processing benchmarks, robotic simulation environments, and real-world experiments confirm the superiority of MoTVLA in both fast-slow reasoning and manipulation task performance.
OmniVIC: A Self-Improving Variable Impedance Controller with Vision-Language In-Context Learning for Safe Robotic Manipulation
We present OmniVIC, a universal variable impedance controller (VIC) enhanced by a vision language model (VLM), which improves safety and adaptation in any contact-rich robotic manipulation task to enhance safe physical interaction. Traditional VIC have shown advantages when the robot physically interacts with the environment, but lack generalization in unseen, complex, and unstructured safe interactions in universal task scenarios involving contact or uncertainty. To this end, the proposed OmniVIC interprets task context derived reasoning from images and natural language and generates adaptive impedance parameters for a VIC controller. Specifically, the core of OmniVIC is a self-improving Retrieval-Augmented Generation(RAG) and in-context learning (ICL), where RAG retrieves relevant prior experiences from a structured memory bank to inform the controller about similar past tasks, and ICL leverages these retrieved examples and the prompt of current task to query the VLM for generating context-aware and adaptive impedance parameters for the current manipulation scenario. Therefore, a self-improved RAG and ICL guarantee OmniVIC works in universal task scenarios. The impedance parameter regulation is further informed by real-time force/torque feedback to ensure interaction forces remain within safe thresholds. We demonstrate that our method outperforms baselines on a suite of complex contact-rich tasks, both in simulation and on real-world robotic tasks, with improved success rates and reduced force violations. OmniVIC takes a step towards bridging high-level semantic reasoning and low-level compliant control, enabling safer and more generalizable manipulation. Overall, the average success rate increases from 27% (baseline) to 61.4% (OmniVIC).
comment: Code, video and RAG dataset are available at \url{https://sites.google.com/view/omni-vic}
Efficient Vision-Language-Action Models for Embodied Manipulation: A Systematic Survey
Vision-Language-Action (VLA) models extend vision-language models to embodied control by mapping natural-language instructions and visual observations to robot actions. Despite their capabilities, VLA systems face significant challenges due to their massive computational and memory demands, which conflict with the constraints of edge platforms such as on-board mobile manipulators that require real-time performance. Addressing this tension has become a central focus of recent research. In light of the growing efforts toward more efficient and scalable VLA systems, this survey provides a systematic review of approaches for improving VLA efficiency, with an emphasis on reducing latency, memory footprint, and training and inference costs. We categorize existing solutions into four dimensions: model architecture, perception feature, action generation, and training/inference strategies, summarizing representative techniques within each category. Finally, we discuss future trends and open challenges, highlighting directions for advancing efficient embodied intelligence.
What Foundation Models can Bring for Robot Learning in Manipulation : A Survey
The realization of universal robots is an ultimate goal of researchers. However, a key hurdle in achieving this goal lies in the robots' ability to manipulate objects in their unstructured surrounding environments according to different tasks. The learning-based approach is considered an effective way to address generalization. The impressive performance of foundation models in the fields of computer vision and natural language suggests the potential of embedding foundation models into manipulation tasks as a viable path toward achieving general manipulation capability. However, we believe achieving general manipulation capability requires an overarching framework akin to auto driving. This framework should encompass multiple functional modules, with different foundation models assuming distinct roles in facilitating general manipulation capability. This survey focuses on the contributions of foundation models to robot learning for manipulation. We propose a comprehensive framework and detail how foundation models can address challenges in each module of the framework. What's more, we examine current approaches, outline challenges, suggest future research directions, and identify potential risks associated with integrating foundation models into this domain.
AttentionSwarm: Reinforcement Learning with Attention Control Barier Function for Crazyflie Drones in Dynamic Environments
We introduce AttentionSwarm, a novel benchmark designed to evaluate safe and efficient swarm control in a dynamic drone racing scenario. Central to our approach is the Attention Model-Based Control Barrier Function (CBF) framework, which integrates attention mechanisms with safety-critical control theory to enable real-time collision avoidance and trajectory optimization. This framework dynamically prioritizes critical obstacles and agents in the swarm's vicinity using attention weights, while CBFs formally guarantee safety by enforcing collision-free constraints. The AttentionSwarm algorithm was developed and evaluated using a swarm of Crazyflie 2.1 micro quadrotors, which were tested indoors with the Vicon motion capture system to ensure precise localization and control. Experimental results show that our system achieves a 95-100% collision-free navigation rate in a dynamic multi-agent drone racing environment, underscoring its effectiveness and robustness in real-world scenarios. This work offers a promising foundation for safe, high-speed multi-robot applications in logistics, inspection, and racing.
comment: 6 pages, 6 figures
Action Tokenizer Matters in In-Context Imitation Learning IROS 2025
In-context imitation learning (ICIL) is a new paradigm that enables robots to generalize from demonstrations to unseen tasks without retraining. A well-structured action representation is the key to capturing demonstration information effectively, yet action tokenizer (the process of discretizing and encoding actions) remains largely unexplored in ICIL. In this work, we first systematically evaluate existing action tokenizer methods in ICIL and reveal a critical limitation: while they effectively encode action trajectories, they fail to preserve temporal smoothness, which is crucial for stable robotic execution. To address this, we propose LipVQ-VAE, a variational autoencoder that enforces the Lipschitz condition in the latent action space via weight normalization. By propagating smoothness constraints from raw action inputs to a quantized latent codebook, LipVQ-VAE generates more stable and smoother actions. When integrating into ICIL, LipVQ-VAE improves performance by more than 5.3% in high-fidelity simulators, with real-world experiments confirming its ability to produce smoother, more reliable trajectories. Code and checkpoints are available at https://action-tokenizer-matters.github.io/
comment: IROS 2025
RoboGPT-R1: Enhancing Robot Planning with Reinforcement Learning
Improving the reasoning capabilities of embodied agents is crucial for robots to complete complex human instructions in long-view manipulation tasks successfully. Despite the success of large language models and vision language models based on Supervised Fine-Tuning (SFT) in planning tasks, they continue facing challenges in performing long-horizon manipulation tasks in complex real-world environments, owing to their restricted common sense and reasoning capabilities. Considering that aligning general-purpose vision language models to robotic planning tasks via supervised fine-tuning suffers from poor generalization and insufficient physical understanding, we propose RoboGPT-R1, a two-stage fine-tuning framework for embodied planning. In this framework, supervised training acquires foundational knowledge through expert sequences, followed by RL to address the model's shortcomings in visual-spatial understanding and reasoning. To achieve physical understanding and action sequence consistency in multi-step reasoning tasks, we design a rule-based reward function that simultaneously considers long-horizon performance and action constraint in the environment. The reasoning model, trained on Qwen2.5-VL-3B, significantly outperforms the larger-scale model, GPT-4o-mini, by 21.33% and surpasses other work trained on Qwen2.5-VL-7B by 20.33% on the EmbodiedBench benchmark.
RoboMemory: A Brain-inspired Multi-memory Agentic Framework for Interactive Environmental Learning in Physical Embodied Systems
Embodied agents face persistent challenges in real-world environments, including partial observability, limited spatial reasoning, and high-latency multi-memory integration. We present RoboMemory, a brain-inspired framework that unifies Spatial, Temporal, Episodic, and Semantic memory under a parallelized architecture for efficient long-horizon planning and interactive environmental learning. A dynamic spatial knowledge graph (KG) ensures scalable and consistent memory updates, while a closed-loop planner with a critic module supports adaptive decision-making in dynamic settings. Experiments on EmbodiedBench show that RoboMemory, built on Qwen2.5-VL-72B-Ins, improves average success rates by 25% over its baseline and exceeds the closed-source state-of-the-art (SOTA) Gemini-1.5-Pro by 3%. Real-world trials further confirm its capacity for cumulative learning, with performance improving across repeated tasks. These results highlight RoboMemory as a scalable foundation for memory-augmented embodied intelligence, bridging the gap between cognitive neuroscience and robotic autonomy.
Coordinated Strategies in Realistic Air Combat by Hierarchical Multi-Agent Reinforcement Learning
Achieving mission objectives in a realistic simulation of aerial combat is highly challenging due to imperfect situational awareness and nonlinear flight dynamics. In this work, we introduce a novel 3D multi-agent air combat environment and a Hierarchical Multi-Agent Reinforcement Learning framework to tackle these challenges. Our approach combines heterogeneous agent dynamics, curriculum learning, league-play, and a newly adapted training algorithm. To this end, the decision-making process is organized into two abstraction levels: low-level policies learn precise control maneuvers, while high-level policies issue tactical commands based on mission objectives. Empirical results show that our hierarchical approach improves both learning efficiency and combat performance in complex dogfight scenarios.
comment: 2025 IEEE International Conference on Agentic AI (ICA)
Open-World Drone Active Tracking with Goal-Centered Rewards NeurIPS 2025
Drone Visual Active Tracking aims to autonomously follow a target object by controlling the motion system based on visual observations, providing a more practical solution for effective tracking in dynamic environments. However, accurate Drone Visual Active Tracking using reinforcement learning remains challenging due to the absence of a unified benchmark and the complexity of open-world environments with frequent interference. To address these issues, we pioneer a systematic solution. First, we propose DAT, the first open-world drone active air-to-ground tracking benchmark. It encompasses 24 city-scale scenes, featuring targets with human-like behaviors and high-fidelity dynamics simulation. DAT also provides a digital twin tool for unlimited scene generation. Additionally, we propose a novel reinforcement learning method called GC-VAT, which aims to improve the performance of drone tracking targets in complex scenarios. Specifically, we design a Goal-Centered Reward to provide precise feedback across viewpoints to the agent, enabling it to expand perception and movement range through unrestricted perspectives. Inspired by curriculum learning, we introduce a Curriculum-Based Training strategy that progressively enhances the tracking performance in complex environments. Besides, experiments on simulator and real-world images demonstrate the superior performance of GC-VAT, achieving a Tracking Success Rate of approximately 72% on the simulator. The benchmark and code are available at https://github.com/SHWplus/DAT_Benchmark.
comment: NeurIPS 2025
On the Importance of Tactile Sensing for Imitation Learning: A Case Study on Robotic Match Lighting
The field of robotic manipulation has advanced significantly in recent years. At the sensing level, several novel tactile sensors have been developed, capable of providing accurate contact information. On a methodological level, learning from demonstrations has proven an efficient paradigm to obtain performant robotic manipulation policies. The combination of both holds the promise to extract crucial contact-related information from the demonstration data and actively exploit it during policy rollouts. However, this integration has so far been underexplored, most notably in dynamic, contact-rich manipulation tasks where precision and reactivity are essential. This work therefore proposes a multimodal, visuotactile imitation learning framework that integrates a modular transformer architecture with a flow-based generative model, enabling efficient learning of fast and dexterous manipulation policies. We evaluate our framework on the dynamic, contact-rich task of robotic match lighting - a task in which tactile feedback influences human manipulation performance. The experimental results highlight the effectiveness of our approach and show that adding tactile information improves policy performance, thereby underlining their combined potential for learning dynamic manipulation from few demonstrations. Project website: https://sites.google.com/view/tactile-il .
ComDrive: Comfort-Oriented End-to-End Autonomous Driving IROS 2025
We propose ComDrive: the first comfort-oriented end-to-end autonomous driving system to generate temporally consistent and comfortable trajectories. Recent studies have demonstrated that imitation learning-based planners and learning-based trajectory scorers can effectively generate and select safety trajectories that closely mimic expert demonstrations. However, such trajectory planners and scorers face the challenge of generating temporally inconsistent and uncomfortable trajectories. To address these issues, ComDrive first extracts 3D spatial representations through sparse perception, which then serves as conditional inputs. These inputs are used by a Conditional Denoising Diffusion Probabilistic Model (DDPM)-based motion planner to generate temporally consistent multi-modal trajectories. A dual-stream adaptive trajectory scorer subsequently selects the most comfortable trajectory from these candidates to control the vehicle. Experiments demonstrate that ComDrive achieves state-of-the-art performance in both comfort and safety, outperforming UniAD by 17% in driving comfort and reducing collision rates by 25% compared to SparseDrive. More results are available on our project page: https://jmwang0117.github.io/ComDrive/.
comment: IROS 2025
Flow with the Force Field: Learning 3D Compliant Flow Matching Policies from Force and Demonstration-Guided Simulation Data
While visuomotor policy has made advancements in recent years, contact-rich tasks still remain a challenge. Robotic manipulation tasks that require continuous contact demand explicit handling of compliance and force. However, most visuomotor policies ignore compliance, overlooking the importance of physical interaction with the real world, often leading to excessive contact forces or fragile behavior under uncertainty. Introducing force information into vision-based imitation learning could help improve awareness of contacts, but could also require a lot of data to perform well. One remedy for data scarcity is to generate data in simulation, yet computationally taxing processes are required to generate data good enough not to suffer from the Sim2Real gap. In this work, we introduce a framework for generating force-informed data in simulation, instantiated by a single human demonstration, and show how coupling with a compliant policy improves the performance of a visuomotor policy learned from synthetic data. We validate our approach on real-robot tasks, including non-prehensile block flipping and a bi-manual object moving, where the learned policy exhibits reliable contact maintenance and adaptation to novel conditions. Project Website: https://flow-with-the-force-field.github.io/webpage/
VO-DP: Semantic-Geometric Adaptive Diffusion Policy for Vision-Only Robotic Manipulation
In the context of imitation learning, visuomotor-based diffusion policy learning is one of the main directions in robotic manipulation. Most of these approaches rely on point clouds as observation inputs and construct scene representations through point clouds feature learning, which enables them to achieve remarkable accuracy. However, the existing literature lacks an in-depth exploration of vision-only solutions that have significant potential. In this paper, we propose a Vision-Only and single-view Diffusion Policy learning method (VO-DP) that leverages pretrained visual foundation models to achieve effective fusion of semantic and geometric features. We utilize intermediate features from VGGT incorporating semantic features from DINOv2 and geometric features from Alternating Attention blocks. Features are fused via cross-attention and spatially compressed with a CNN to form the input to the policy head. Extensive experiments demonstrate that VO-DP not only outperforms the vision-only baseline DP significantly but also exhibits distinct performance trends against the point cloud-based method DP3: in simulation tasks, VO-DP achieves an average success rate of 64.6% on par with DP3 64.0% and far higher than DP 34.8%, while in real-world tasks, it reaches 87.9%, outperforming both DP3 67.5% and DP 11.2% by a notable margin. Further robustness evaluations confirm that VO-DP remains highly stable under varying conditions including color, size, background, and lighting. Lastly, we open-source a training library for robotic manipulation. Built on Accelerate, this library supports multi-machine and multi-GPU parallel training, as well as mixed precision training. It is compatible with visuomotor policies such as DP, DP3 and VO-DP, and also supports the RoboTwin simulator.
Multiagent Systems
Toward Agentic Software Engineering Beyond Code: Framing Vision, Values, and Vocabulary
Agentic AI is poised to usher in a seismic paradigm shift in Software Engineering (SE). As technologists rush head-along to make agentic AI a reality, SE researchers are driven to establish agentic SE as a research area. While early visions of agentic SE are primarily focused on code-related activities, early empirical evidence calls for a consideration of a range of socio-technical concerns to make it work in practice. This paper contributes to the emerging community vision by: (a) recommending an expansion of its scope beyond code, toward a 'whole of process' vision, grounding it in SE foundations and evolution and emerging agentic SE frameworks, (b) proposing a preliminary set of values and principles to guide efforts, and (c) sharing guidance on designing/using well-defined vocabulary for agentic SE. It is hoped that these ideas will encourage community collaborations and steer the SE community towards laying strong foundations of agentic SE so its not only inevitable but also deliberate and desirable in the long run.
comment: 5 pages
HSCodeComp: A Realistic and Expert-level Benchmark for Deep Search Agents in Hierarchical Rule Application
Effective deep search agents must not only access open-domain and domain-specific knowledge but also apply complex rules-such as legal clauses, medical manuals and tariff rules. These rules often feature vague boundaries and implicit logic relationships, making precise application challenging for agents. However, this critical capability is largely overlooked by current agent benchmarks. To fill this gap, we introduce HSCodeComp, the first realistic, expert-level e-commerce benchmark designed to evaluate deep search agents in hierarchical rule application. In this task, the deep reasoning process of agents is guided by these rules to predict 10-digit Harmonized System Code (HSCode) of products with noisy but realistic descriptions. These codes, established by the World Customs Organization, are vital for global supply chain efficiency. Built from real-world data collected from large-scale e-commerce platforms, our proposed HSCodeComp comprises 632 product entries spanning diverse product categories, with these HSCodes annotated by several human experts. Extensive experimental results on several state-of-the-art LLMs, open-source, and closed-source agents reveal a huge performance gap: best agent achieves only 46.8% 10-digit accuracy, far below human experts at 95.0%. Besides, detailed analysis demonstrates the challenges of hierarchical rule application, and test-time scaling fails to improve performance further.
Polynomial-time Configuration Generator for Connected Unlabeled Multi-Agent Pathfinding
We consider Connected Unlabeled Multi-Agent Pathfinding (CUMAPF), a variant of MAPF where the agents must maintain connectivity at all times. This problem is fundamental to swarm robotics applications like self-reconfiguration and marching, where standard MAPF is insufficient as it does not guarantee the required connectivity between agents. While unlabeled MAPF is tractable in optimization, CUMAPF is NP-hard even on highly restricted graph classes. To tackle this challenge, we propose PULL, a complete and polynomial-time algorithm with a simple design. It is based on a rule-based one-step function that computes a subsequent configuration that preserves connectivity and advances towards the target configuration. PULL is lightweight, and runs in $O(n^2)$ time per step in 2D grid, where $n$ is the number of agents. Our experiments further demonstrate its practical performance: PULL finds competitive solution qualities against trivial solutions for hundreds of agents, in randomly generated instances. Furthermore, we develop an eventually optimal solver that integrates PULL into an existing search-based MAPF algorithm, providing a valuable tool for small-scale instances.
Modeling realistic human behavior using generative agents in a multimodal transport system: Software architecture and Application to Toulouse
Modeling realistic human behaviour to understand people's mode choices in order to propose personalised mobility solutions remains challenging. This paper presents an architecture for modeling realistic human mobility behavior in complex multimodal transport systems, demonstrated through a case study in Toulouse, France. We apply Large Language Models (LLMs) within an agent-based simulation to capture decision-making in a real urban setting. The framework integrates the GAMA simulation platform with an LLM-based generative agent, along with General Transit Feed Specification (GTFS) data for public transport, and OpenTripPlanner for multimodal routing. GAMA platform models the interactive transport environment, providing visualization and dynamic agent interactions while eliminating the need to construct the simulation environment from scratch. This design enables a stronger focus on developing generative agents and evaluating their performance in transport decision-making processes. Over a simulated month, results show that agents not only make context-aware transport decisions but also form habits over time. We conclude that combining LLMs with agent-based simulation offers a promising direction for advancing intelligent transportation systems and personalised multimodal mobility solutions. We also discuss some limitations of this approach and outline future work on scaling to larger regions, integrating real-time data, and refining memory models.
Monitoring LLM-based Multi-Agent Systems Against Corruptions via Node Evaluation
Large Language Model (LLM)-based Multi-Agent Systems (MAS) have become a popular paradigm of AI applications. However, trustworthiness issues in MAS remain a critical concern. Unlike challenges in single-agent systems, MAS involve more complex communication processes, making them susceptible to corruption attacks. To mitigate this issue, several defense mechanisms have been developed based on the graph representation of MAS, where agents represent nodes and communications form edges. Nevertheless, these methods predominantly focus on static graph defense, attempting to either detect attacks in a fixed graph structure or optimize a static topology with certain defensive capabilities. To address this limitation, we propose a dynamic defense paradigm for MAS graph structures, which continuously monitors communication within the MAS graph, then dynamically adjusts the graph topology, accurately disrupts malicious communications, and effectively defends against evolving and diverse dynamic attacks. Experimental results in increasingly complex and dynamic MAS environments demonstrate that our method significantly outperforms existing MAS defense mechanisms, contributing an effective guardrail for their trustworthy applications. Our code is available at https://github.com/ChengcanWu/Monitoring-LLM-Based-Multi-Agent-Systems.
ColorAgent: Building A Robust, Personalized, and Interactive OS Agent
With the advancements in hardware, software, and large language model technologies, the interaction between humans and operating systems has evolved from the command-line interface to the rapidly emerging AI agent interactions. Building an operating system (OS) agent capable of executing user instructions and faithfully following user desires is becoming a reality. In this technical report, we present ColorAgent, an OS agent designed to engage in long-horizon, robust interactions with the environment while also enabling personalized and proactive user interaction. To enable long-horizon interactions with the environment, we enhance the model's capabilities through step-wise reinforcement learning and self-evolving training, while also developing a tailored multi-agent framework that ensures generality, consistency, and robustness. In terms of user interaction, we explore personalized user intent recognition and proactive engagement, positioning the OS agent not merely as an automation tool but as a warm, collaborative partner. We evaluate ColorAgent on the AndroidWorld and AndroidLab benchmarks, achieving success rates of 77.2% and 50.7%, respectively, establishing a new state of the art. Nonetheless, we note that current benchmarks are insufficient for a comprehensive evaluation of OS agents and propose further exploring directions in future work, particularly in the areas of evaluation paradigms, agent collaboration, and security. Our code is available at https://github.com/MadeAgents/mobile-use.
SORA-ATMAS: Adaptive Trust Management and Multi-LLM Aligned Governance for Future Smart Cities
The rapid evolution of smart cities has increased the reliance on intelligent interconnected services to optimize infrastructure, resources, and citizen well-being. Agentic AI has emerged as a key enabler by supporting autonomous decision-making and adaptive coordination, allowing urban systems to respond in real time to dynamic conditions. Its benefits are evident in areas such as transportation, where the integration of traffic data, weather forecasts, and safety sensors enables dynamic rerouting and a faster response to hazards. However, its deployment across heterogeneous smart city ecosystems raises critical governance, risk, and compliance (GRC) challenges, including accountability, data privacy, and regulatory alignment within decentralized infrastructures. Evaluation of SORA-ATMAS with three domain agents (Weather, Traffic, and Safety) demonstrated that its governance policies, including a fallback mechanism for high-risk scenarios, effectively steer multiple LLMs (GPT, Grok, DeepSeek) towards domain-optimized, policy-aligned outputs, producing an average MAE reduction of 35% across agents. Results showed stable weather monitoring, effective handling of high-risk traffic plateaus 0.85, and adaptive trust regulation in Safety/Fire scenarios 0.65. Runtime profiling of a 3-agent deployment confirmed scalability, with throughput between 13.8-17.2 requests per second, execution times below 72~ms, and governance delays under 100 ms, analytical projections suggest maintained performance at larger scales. Cross-domain rules ensured safe interoperability, with traffic rerouting permitted only under validated weather conditions. These findings validate SORA-ATMAS as a regulation-aligned, context-aware, and verifiable governance framework that consolidates distributed agent outputs into accountable, real-time decisions, offering a resilient foundation for smart-city management.
Collaborative penetration testing suite for emerging generative AI algorithms
Problem Space: AI Vulnerabilities and Quantum Threats Generative AI vulnerabilities: model inversion, data poisoning, adversarial inputs. Quantum threats Shor Algorithm breaking RSA ECC encryption. Challenge Secure generative AI models against classical and quantum cyberattacks. Proposed Solution Collaborative Penetration Testing Suite Five Integrated Components: DAST SAST OWASP ZAP, Burp Suite, SonarQube, Fortify. IAST Contrast Assess integrated with CI CD pipeline. Blockchain Logging Hyperledger Fabric for tamper-proof logs. Quantum Cryptography Lattice based RLWE protocols. AI Red Team Simulations Adversarial ML & Quantum-assisted attacks. Integration Layer: Unified workflow for AI, cybersecurity, and quantum experts. Key Results 300+ vulnerabilities identified across test environments. 70% reduction in high-severity issues within 2 weeks. 90% resolution efficiency for blockchain-logged vulnerabilities. Quantum-resistant cryptography maintained 100% integrity in tests. Outcome: Quantum AI Security Protocol integrating Blockchain Quantum Cryptography AI Red Teaming.
Learning to Make Friends: Coaching LLM Agents toward Emergent Social Ties
Can large language model (LLM) agents reproduce the complex social dynamics that characterize human online behavior -- shaped by homophily, reciprocity, and social validation -- and what memory and learning mechanisms enable such dynamics to emerge? We present a multi-agent LLM simulation framework in which agents repeatedly interact, evaluate one another, and adapt their behavior through in-context learning accelerated by a coaching signal. To model human social behavior, we design behavioral reward functions that capture core drivers of online engagement, including social interaction, information seeking, self-presentation, coordination, and emotional support. These rewards align agent objectives with empirically observed user motivations, enabling the study of how network structures and group formations emerge from individual decision-making. Our experiments show that coached LLM agents develop stable interaction patterns and form emergent social ties, yielding network structures that mirror properties of real online communities. By combining behavioral rewards with in-context adaptation, our framework establishes a principled testbed for investigating collective dynamics in LLM populations and reveals how artificial agents may approximate or diverge from human-like social behavior.
The Emergence of Complex Behavior in Large-Scale Ecological Environments
We explore how physical scale and population size shape the emergence of complex behaviors in open-ended ecological environments. In our setting, agents are unsupervised and have no explicit rewards or learning objectives but instead evolve over time according to reproduction, mutation, and natural selection. As they act, agents also shape their environment and the population around them in an ongoing dynamic ecology. Our goal is not to optimize a single high-performance policy, but instead to examine how behaviors emerge and evolve across large populations due to natural competition and environmental pressures. In an effort to discover how complex behaviors naturally emerge, we conduct experiments in large-scale worlds that reach populations of more than 60,000 individual agents, each with their own evolved neural network policy. We identify various emergent behaviors such as long-range resource extraction, vision-based foraging, and predation that arise under competitive and survival pressures. We examine how sensing modalities and environmental scale affect the emergence of these behaviors, finding that some appear only in sufficiently large environments and populations, with larger scales increasing behavioral stability and consistency. While there is a rich history of research in evolutionary settings, our scaling results provide promising new directions to explore ecology as an instrument of machine learning in an era of abundant computational resources. Experimental code is available at https://github.com/jbejjani2022/ecological-emergent-behavior.
comment: 18 pages, 11 figures, 6 tables, experiment code available at https://github.com/jbejjani2022/ecological-emergent-behavior
Adaptive Coopetition: Leveraging Coarse Verifier Signals for Resilient Multi-Agent LLM Reasoning NeurIPS 2025
Inference-time computation is a critical yet challenging paradigm for enhancing the reasoning performance of large language models (LLMs). While existing strategies improve reasoning stability and consistency, they suffer from notable limitations: self-correction often reinforces the model's initial biases, and Multi-Agent Collaboration (MAC) often fails due to the lack of efficient coordination mechanisms, leading to collective errors. Although high-performing verifiers can detect reasoning errors, making them reliable requires substantial training. To address these challenges, we introduce a novel inference-time framework, Adaptive Coopetition (AdCo), in which LLM agents utilize an adaptive, UCB-based "coopetition" mechanism. At each round, agents leverage coarse verifier signals to determine whether to collaborate or compete, and iteratively refine their reasoning based on peer feedback. Without relying on high-performance verifiers, our adaptive strategy achieves significant performance gains on mathematical reasoning benchmarks, yielding a 20% relative improvement over baselines on the more challenging dataset. Our approach remains robust and consistent in terms of accuracy under different sample sizes and configurations. This adaptive, signal-guided "coopetition" framework enhances reasoning robustness by leveraging both model knowledge diversity and reasoning trace measures, while also promoting uncertainty-driven exploration, especially when participants have comparable capabilities. From this perspective, our work offers a fresh lens on inference-time computation and paves the way for more resilient multi-agent LLM systems. Our code is available at: https://github.com/AdCo-Research/adaptive-coopetition.
comment: 13 pages, 8 figures. Accepted for presentation at the 5th Workshop on Mathematical Reasoning and AI at NeurIPS 2025
PLAGUE: Plug-and-play framework for Lifelong Adaptive Generation of Multi-turn Exploits
Large Language Models (LLMs) are improving at an exceptional rate. With the advent of agentic workflows, multi-turn dialogue has become the de facto mode of interaction with LLMs for completing long and complex tasks. While LLM capabilities continue to improve, they remain increasingly susceptible to jailbreaking, especially in multi-turn scenarios where harmful intent can be subtly injected across the conversation to produce nefarious outcomes. While single-turn attacks have been extensively explored, adaptability, efficiency and effectiveness continue to remain key challenges for their multi-turn counterparts. To address these gaps, we present PLAGUE, a novel plug-and-play framework for designing multi-turn attacks inspired by lifelong-learning agents. PLAGUE dissects the lifetime of a multi-turn attack into three carefully designed phases (Primer, Planner and Finisher) that enable a systematic and information-rich exploration of the multi-turn attack family. Evaluations show that red-teaming agents designed using PLAGUE achieve state-of-the-art jailbreaking results, improving attack success rates (ASR) by more than 30% across leading models in a lesser or comparable query budget. Particularly, PLAGUE enables an ASR (based on StrongReject) of 81.4% on OpenAI's o3 and 67.3% on Claude's Opus 4.1, two models that are considered highly resistant to jailbreaks in safety literature. Our work offers tools and insights to understand the importance of plan initialization, context optimization and lifelong learning in crafting multi-turn attacks for a comprehensive model vulnerability evaluation.
comment: First two authors have equal author contributions
RADAR: A Risk-Aware Dynamic Multi-Agent Framework for LLM Safety Evaluation via Role-Specialized Collaboration
Existing safety evaluation methods for large language models (LLMs) suffer from inherent limitations, including evaluator bias and detection failures arising from model homogeneity, which collectively undermine the robustness of risk evaluation processes. This paper seeks to re-examine the risk evaluation paradigm by introducing a theoretical framework that reconstructs the underlying risk concept space. Specifically, we decompose the latent risk concept space into three mutually exclusive subspaces: the explicit risk subspace (encompassing direct violations of safety guidelines), the implicit risk subspace (capturing potential malicious content that requires contextual reasoning for identification), and the non-risk subspace. Furthermore, we propose RADAR, a multi-agent collaborative evaluation framework that leverages multi-round debate mechanisms through four specialized complementary roles and employs dynamic update mechanisms to achieve self-evolution of risk concept distributions. This approach enables comprehensive coverage of both explicit and implicit risks while mitigating evaluator bias. To validate the effectiveness of our framework, we construct an evaluation dataset comprising 800 challenging cases. Extensive experiments on our challenging testset and public benchmarks demonstrate that RADAR significantly outperforms baseline evaluation methods across multiple dimensions, including accuracy, stability, and self-evaluation risk sensitivity. Notably, RADAR achieves a 28.87% improvement in risk identification accuracy compared to the strongest baseline evaluation method.
Coordinated Strategies in Realistic Air Combat by Hierarchical Multi-Agent Reinforcement Learning
Achieving mission objectives in a realistic simulation of aerial combat is highly challenging due to imperfect situational awareness and nonlinear flight dynamics. In this work, we introduce a novel 3D multi-agent air combat environment and a Hierarchical Multi-Agent Reinforcement Learning framework to tackle these challenges. Our approach combines heterogeneous agent dynamics, curriculum learning, league-play, and a newly adapted training algorithm. To this end, the decision-making process is organized into two abstraction levels: low-level policies learn precise control maneuvers, while high-level policies issue tactical commands based on mission objectives. Empirical results show that our hierarchical approach improves both learning efficiency and combat performance in complex dogfight scenarios.
comment: 2025 IEEE International Conference on Agentic AI (ICA)
IM-Chat: A Multi-agent LLM Framework Integrating Tool-Calling and Diffusion Modeling for Knowledge Transfer in Injection Molding Industry
The injection molding industry faces critical challenges in preserving and transferring field knowledge, particularly as experienced workers retire and multilingual barriers hinder effective communication. This study introduces IM-Chat, a multi-agent framework based on large language models (LLMs), designed to facilitate knowledge transfer in injection molding. IM-Chat integrates both limited documented knowledge (e.g., troubleshooting tables, manuals) and extensive field data modeled through a data-driven process condition generator that infers optimal manufacturing settings from environmental inputs such as temperature and humidity, enabling robust and context-aware task resolution. By adopting a retrieval-augmented generation (RAG) strategy and tool-calling agents within a modular architecture, IM-Chat ensures adaptability without the need for fine-tuning. Performance was assessed across 100 single-tool and 60 hybrid tasks for GPT-4o, GPT-4o-mini, and GPT-3.5-turbo by domain experts using a 10-point rubric focused on relevance and correctness, and was further supplemented by automated evaluation using GPT-4o guided by a domain-adapted instruction prompt. The evaluation results indicate that more capable models tend to achieve higher accuracy, particularly in complex, tool-integrated scenarios. In addition, compared with the fine-tuned single-agent LLM, IM-Chat demonstrated superior accuracy, particularly in quantitative reasoning, and greater scalability in handling multiple information sources. Overall, these findings demonstrate the viability of multi-agent LLM systems for industrial knowledge workflows and establish IM-Chat as a scalable and generalizable approach to AI-assisted decision support in manufacturing.
Vahana.jl -- A framework (not only) for large-scale agent-based models
Agent-based models (ABMs) offer a powerful framework for understanding complex systems. However, their computational demands often become a significant barrier as the number of agents and complexity of the simulation increase. Traditional ABM platforms often struggle to fully exploit modern computing resources, hindering the development of large-scale simulations. This paper presents Vahana.jl, a high performance computing open source framework that aims to address these limitations. Building on the formalism of synchronous graph dynamical systems, Vahana.jl is especially well suited for models with a focus on (social) networks. The framework seamlessly supports distribution across multiple compute nodes, enabling simulations that would otherwise be beyond the capabilities of a single machine. Implemented in Julia, Vahana.jl leverages the interactive Read-Eval-Print Loop (REPL) environment, facilitating rapid model development and experimentation.
Visual Multi-Agent System: Mitigating Hallucination Snowballing via Visual Flow
Multi-Agent System (MAS) powered by Visual Language Models (VLMs) enables challenging tasks but suffers from a novel failure term, multi-agent visual hallucination snowballing, where hallucinations are seeded in a single agent and amplified by following ones due to the over-reliance on textual flow to relay visual information. Through turn-, layer-, and token-wise attention analyses, we provide detailed insights into the essence of hallucination snowballing regarding the reduction of visual attention allocation. It leads us to identify a subset of vision tokens with a unimodal attention peak in middle layers that best preserve visual evidence but gradually diminish in deeper agent turns, resulting in the visual hallucination snowballing in MAS. Thus, we propose ViF, a lightweight, plug-and-play mitigation paradigm that relays inter-agent messages with Visual Flow powered by the selected visual relay tokens and applies attention reallocation to amplify this pattern. The experiment results demonstrate that our method markedly reduces hallucination snowballing, consistently improving the performance across eight benchmarks based on four common MAS structures and ten base models. The source code is publicly available at: https://github.com/YU-deep/ViF.git.
PARCO: Parallel AutoRegressive Models for Multi-Agent Combinatorial Optimization NeurIPS 2025
Combinatorial optimization problems involving multiple agents are notoriously challenging due to their NP-hard nature and the necessity for effective agent coordination. Despite advancements in learning-based methods, existing approaches often face critical limitations, including suboptimal agent coordination, poor generalization, and high computational latency. To address these issues, we propose PARCO (Parallel AutoRegressive Combinatorial Optimization), a general reinforcement learning framework designed to construct high-quality solutions for multi-agent combinatorial tasks efficiently. To this end, PARCO integrates three key novel components: (1) transformer-based communication layers to enable effective agent collaboration during parallel solution construction, (2) a multiple pointer mechanism for low-latency, parallel agent decision-making, and (3) priority-based conflict handlers to resolve decision conflicts via learned priorities. We evaluate PARCO in multi-agent vehicle routing and scheduling problems, where our approach outperforms state-of-the-art learning methods, demonstrating strong generalization ability and remarkable computational efficiency. We make our source code publicly available to foster future research: https://github.com/ai4co/parco.
comment: Accepted at NeurIPS 2025
ROTATE: Regret-driven Open-ended Training for Ad Hoc Teamwork
Learning to collaborate with previously unseen partners is a fundamental generalization challenge in multi-agent learning, known as Ad Hoc Teamwork (AHT). Existing AHT approaches often adopt a two-stage pipeline, where first, a fixed population of teammates is generated with the idea that they should be representative of the teammates that will be seen at deployment time, and second, an AHT agent is trained to collaborate well with agents in the population. To date, the research community has focused on designing separate algorithms for each stage. This separation has led to algorithms that generate teammates with limited coverage of possible behaviors, and that ignore whether the generated teammates are easy to learn from for the AHT agent. Furthermore, algorithms for training AHT agents typically treat the set of training teammates as static, thus attempting to generalize to previously unseen partner agents without assuming any control over the set of training teammates. This paper presents a unified framework for AHT by reformulating the problem as an open-ended learning process between an AHT agent and an adversarial teammate generator. We introduce ROTATE, a regret-driven, open-ended training algorithm that alternates between improving the AHT agent and generating teammates that probe its deficiencies. Experiments across diverse two-player environments demonstrate that ROTATE significantly outperforms baselines at generalizing to an unseen set of evaluation teammates, thus establishing a new standard for robust and generalizable teamwork.
A Principle of Targeted Intervention for Multi-Agent Reinforcement Learning NeurIPS 2025
Steering cooperative multi-agent reinforcement learning (MARL) towards desired outcomes is challenging, particularly when the global guidance from a human on the whole multi-agent system is impractical in a large-scale MARL. On the other hand, designing external mechanisms (e.g., intrinsic rewards and human feedback) to coordinate agents mostly relies on empirical studies, lacking a easy-to-use research tool. In this work, we employ multi-agent influence diagrams (MAIDs) as a graphical framework to address the above issues. First, we introduce the concept of MARL interaction paradigms, using MAIDs to analyze and visualize both unguided self-organization and global guidance mechanisms in MARL. Then, we design a new MARL interaction paradigm, referred to as the targeted intervention paradigm that is applied to only a single targeted agent, so the problem of global guidance can be mitigated. In our implementation, we introduce a causal inference technique, referred to as Pre-Strategy Intervention (PSI), to realize the targeted intervention paradigm. Since MAIDs can be regarded as a special class of causal diagrams, a composite desired outcome that integrates the primary task goal and an additional desired outcome can be achieved by maximizing the corresponding causal effect through the PSI. Moreover, the bundled relevance graph analysis of MAIDs provides a tool to identify whether an MARL learning paradigm is workable under the design of an MARL interaction paradigm. In experiments, we demonstrate the effectiveness of our proposed targeted intervention, and verify the result of relevance graph analysis.
comment: Published in NeurIPS 2025
Systems and Control (CS)
Nodal Capacity Expansion Planning with Flexible Large-Scale Load Siting
We propose explicitly incorporating large-scale load siting into a stochastic nodal power system capacity expansion planning model that concurrently co-optimizes generation, transmission and storage expansion. The potential operational flexibility of some of these large loads is also taken into account by considering them as consisting of a set of tranches with different reliability requirements, which are modeled as a constraint on expected served energy across operational scenarios. We implement our model as a two-stage stochastic mixed-integer optimization problem with cross-scenario expectation constraints. To overcome the challenge of scalability, we build upon existing work to implement this model on a high performance computing platform and exploit scenario parallelization using an augmented Progressive Hedging Algorithm. The algorithm is implemented using the bounding features of mpisppy, which have shown to provide satisfactory provable optimality gaps despite the absence of theoretical guarantees of convergence. We test our approach to assess the value of this proactive planning framework on total system cost and reliability metrics using realistic testcases geographically assigned to San Diego and South Carolina, with datacenter and direct air capture facilities as large loads.
Bridging Earth and Space: A Survey on HAPS for Non-Terrestrial Networks
HAPS are emerging as key enablers in the evolution of 6G wireless networks, bridging terrestrial and non-terrestrial infrastructures. Operating in the stratosphere, HAPS can provide wide-area coverage, low-latency, energy-efficient broadband communications with flexible deployment options for diverse applications. This survey delivers a comprehensive overview of HAPS use cases, technologies, and integration strategies within the 6G ecosystem. The roles of HAPS in extending connectivity to underserved regions, supporting dynamic backhauling, enabling massive IoT, and delivering reliable low-latency communications for autonomous and immersive services are discussed. The paper reviews state-of-the-art architectures for terrestrial and non-terrestrial network integration, highlights recent field trials. Furthermore, key enabling technologies such as channel modeling, AI-driven resource allocation, interference control, mobility management, and energy-efficient communications are examined. The paper also outlines open research challenges. By addressing existing gaps in the literature, this survey positions HAPS as a foundational component of globally integrated, resilient, and sustainable 6G networks.
comment: 30 pages. This work has been submitted to IEEE Communications Surveys & Tutorials (under review)
Optimal Kron-based Reduction of Networks (Opti-KRON) for Three-phase Distribution Feeders
This paper presents a novel structure-preserving, Kron-based reduction framework for unbalanced distribution feeders. The method aggregates electrically similar nodes within a mixed-integer optimization (MIP) problem to produce reduced networks that optimally reproduce the voltage profiles of the original full network. To overcome computational bottlenecks of MIP formulations, we propose an exhaustive-search formulation to identify optimal aggregation decisions while enforcing voltage margin limits. The proposed exhaustive network reduction algorithm is parallelizable on GPUs, which enables scalable network reduction. The resulting reduced networks approximate the full system's voltage profiles with low errors and are suitable for steady-state analysis and optimal power flow studies. The framework is validated on two real utility distribution feeders with 5,991 and 8,381 nodes. The reduced models achieve up to 90% and 80% network reduction, respectively, while the maximum voltage-magnitude error remains below 0.003 p.u. Furthermore, on a 1000-node version of the network, the GPU-accelerated reduction algorithm runs up to 15x faster than its CPU-based counterpart.
Control Barrier Functions for the Full Class of Signal Temporal Logic Tasks using Spatiotemporal Tubes
This paper introduces a new framework for synthesizing time-varying control barrier functions (TV-CBFs) for general Signal Temporal Logic (STL) specifications using spatiotemporal tubes (STT). We first formulate the STT synthesis as a robust optimization problem (ROP) and solve it through a scenario optimization problem (SOP), providing formal guarantees that the resulting tubes capture the given STL specifications. These STTs are then used to construct TV-CBFs, ensuring that under any control law rendering them invariant, the system satisfies the STL tasks. We demonstrate the framework through case studies on a differential-drive mobile robot and a quadrotor, and provide a comparative analysis showing improved efficiency over existing approaches.
Multi-UAV Flood Monitoring via CVT with Gaussian Mixture of Density Functions for Coverage Control
This study presents a control strategy for coordinating multiple unmanned aerial vehicles (UAVs) to monitor unknown flood regions and estimate the extent of inundation. The proposed method adopts a density-driven coverage framework based on Centroidal Voronoi Tessellation (CVT), in which the density function is modeled using a Gaussian Mixture of Density Functions (GMDF). This formulation provides a more accurate characterization of inundated areas compared to conventional axis-aligned Gaussian models. The performance of the two density modeling approaches is systematically evaluated under different UAV fleet sizes (16, 20, and 24), with multiple simulation trials conducted in the ROS/Gazebo environment. The results show that the GMDF-based formulation consistently achieves higher coverage rates, demonstrating its effectiveness in enhancing flood monitoring and improving UAV spatial distribution.
comment: 9 pages,6 figures
Risk Assessment of an Autonomous Underwater Snake Robot in Confined Operations
The growing interest in ocean discovery imposes a need for inspection and intervention in confined and demanding environments. Eely's slender shape, in addition to its ability to change its body configurations, makes articulated underwater robots an adequate option for such environments. However, operation of Eely in such environments imposes demanding requirements on the system, as it must deal with uncertain and unstructured environments, extreme environmental conditions, and reduced navigational capabilities. This paper proposes a Bayesian approach to assess the risks of losing Eely during two mission scenarios. The goal of this work is to improve Eely's performance and the likelihood of mission success. Sensitivity analysis results are presented in order to demonstrate the causes having the highest impact on losing Eely.
comment: 9 pages, 6 figures, Accepted for publication in OCEANS 2023 - Limerick
Managing Charging Induced Grid Stress and Battery Degradation in Electric Taxi Fleets
Operating fleets of electric vehicles (EVs) introduces several challenges, some of which are borne by the fleet operator, and some of which are borne by the power grid. To maximize short-term profit a fleet operator could always charge EVs at the maximum rate to ensure vehicles are ready to service ride demand. However, due to the stochastic nature of electricity demand, charging EVs at their maximum rate may potentially increase the grid stress and lead to overall instability. Furthermore, high-rate charging of EVs can accelerate battery degradation, thereby reducing the service lifespan of the fleet. This study aims to reconcile the conflicting incentives of fleet longevity, short-term profitability, and grid stability by simulating a taxi fleet throughout its lifespan in relation to its charging policies and service conditions. We develop an EV fleet simulator to evaluate the battery degradation due to unpredictable charging and ride demand. Consequently, the impact on the power grid through the charging infrastructure is assessed due to these activities. This simulation utilizes publicly accessible real-world travel data from the NYC taxi dataset. We compare a baseline 80-20 fleet charging policy with a reinforcement learning-based policy designed to prolong the fleet's service life and alleviate grid stress. We monitor grid stress, battery degradation, and profitability over five years and find that our learned policy outperforms the baseline. This simulator enables fleet operators to assess the impact of different charging policies on these indicators to make informed decisions in the future.
comment: 7 pages, 8 figures, to be published in the proceedings of 2025 IEEE Innovative Smart Grid Technologies - Asia (ISGT-Asia)
Magnetic field estimation using Gaussian process regression for interactive wireless power system design
Wireless power transfer (WPT) with coupled resonators offers a promising solution for the seamless powering of electronic devices. Interactive design approaches that visualize the magnetic field and power transfer efficiency based on system geometry adjustments can facilitate the understanding and exploration of the behavior of these systems for dynamic applications. However, typical electromagnetic field simulation methods, such as the Method of Moments (MoM), require significant computational resources, limiting the rate at which computation can be performed for acceptable interactivity. Furthermore, the system's sensitivity to positional and geometrical changes necessitates a large number of simulations, and structures such as ferromagnetic shields further complicate these simulations. Here, we introduce a machine learning approach using Gaussian Process Regression (GPR), demonstrating for the first time the rapid estimation of the entire magnetic field and power transfer efficiency for near-field coupled systems. To achieve quick and accurate estimation, we develop 3D adaptive grid systems and an active learning strategy to effectively capture the nonlinear interactions between complex system geometries and magnetic fields. By training a regression model, our approach achieves magnetic field computation with sub-second latency and with an average error of less than 6% when validated against independent electromagnetic simulation results.
comment: 29 pages, 8 figures, 1 table
Spatiotemporal Tubes based Control of Unknown Multi-Agent Systems for Temporal Reach-Avoid-Stay Tasks
The paper focuses on designing a controller for unknown dynamical multi-agent systems to achieve temporal reach-avoid-stay tasks for each agent while preventing inter-agent collisions. The main objective is to generate a spatiotemporal tube (STT) for each agent and thereby devise a closed-form, approximation-free, and decentralized control strategy that ensures the system trajectory reaches the target within a specific time while avoiding time-varying unsafe sets and collisions with other agents. In order to achieve this, the requirements of STTs are formulated as a robust optimization problem (ROP) and solved using a sampling-based scenario optimization problem (SOP) to address the issue of infeasibility caused by the infinite number of constraints in ROP. The STTs are generated by solving the SOP, and the corresponding closed-form control is designed to fulfill the specified task. Finally, the effectiveness of our approach is demonstrated through two case studies, one involving omnidirectional robots and the other involving multiple drones modelled as Euler-Lagrange systems.
Query-Efficient Zeroth-Order Algorithms for Nonconvex Optimization
Zeroth-order optimization (ZO) has been a powerful framework for solving black-box problems, which estimates gradients using zeroth-order data to update variables iteratively. The practical applicability of ZO critically depends on the efficiency of single-step gradient estimation and the overall query complexity. However, existing ZO algorithms cannot achieve efficiency on both simultaneously. In this work, we consider a general constrained optimization model with black-box objective and constraint functions. To solve it, we propose novel algorithms that can achieve the state-of-the-art overall query complexity bound of $\mathcal{O}(d/\epsilon^4)$ to find an $\epsilon$-stationary solution ($d$ is the dimension of variable space), while reducing the queries for estimating a single-step gradient from $\mathcal{O}(d)$ to $\mathcal{O}(1)$. Specifically, we integrate block updates with gradient descent ascent and a block gradient estimator, which leads to two algorithms, ZOB-GDA and ZOB-SGDA, respectively. Instead of constructing full gradients, they estimate only partial gradients along random blocks of dimensions, where the adjustable block sizes enable high single-step efficiency without sacrificing convergence guarantees. Our theoretical results establish the finite-sample convergence of the proposed algorithms for nonconvex optimization. Finally, numerical experiments on a practical problem demonstrate that our algorithms require over ten times fewer queries than existing methods.
comment: 34 pages, 4 figures
Policy Gradient Method for LQG Control via Input-Output-History Representation: Convergence to $O(ε)$-Stationary Points
We study the policy gradient method (PGM) for the linear quadratic Gaussian (LQG) dynamic output-feedback control problem using an input-output-history (IOH) representation of the closed-loop system. First, we show that any dynamic output-feedback controller is equivalent to a static partial-state feedback gain for a new system representation characterized by a finite-length IOH. Leveraging this equivalence, we reformulate the search for an optimal dynamic output feedback controller as an optimization problem over the corresponding partial-state feedback gain. Next, we introduce a relaxed version of the IOH-based LQG problem by incorporating a small process noise with covariance $\epsilon I$ into the new system to ensure coerciveness, a key condition for establishing gradient-based convergence guarantees. Consequently, we show that a vanilla PGM for the relaxed problem converges to an $\mathcal{O}(\epsilon)$-stationary point, i.e., $\overline{K}$ satisfying $\|\nabla J(\overline{K})\|_F \leq \mathcal{O}(\epsilon)$, where $J$ denotes the original LQG cost. Numerical experiments empirically indicate convergence to the vicinity of the globally optimal LQG controller.
Modeling and Simulation of an Active Car Suspension with a Robust LQR Controller under Road Disturbance, Parameter Uncertainty and White Noise
Vehicle suspension is important for passengers to travel comfortably and to be less exposed to effects such as vibration and shock. A good suspension system increases the road holding of vehicles, allows them to take turns safely, and reduces the risk of traffic accidents. A passive suspension system is the most widely used suspension system in vehicles due to its simple structure and low cost. Passive suspension systems do not have an actuator and therefore do not have a controller. Active suspension systems have an actuator and a controller. Although their structures are more complex and costly, they are safer. PID controller is widely used in active suspension systems due to its simple structure, reasonable cost, and easy adjustment of coefficients. In this study, a more robust LQR-controlled active suspension was designed than a passive suspension and a PID-controlled active suspension. Robustness analyses were performed for passive suspension, PID-controlled active suspension, and LQR-controlled active suspension. Suspension travel, sprung mass acceleration, and sprung mass motion simulations were performed for all three suspensions under road disturbance, under simultaneous road disturbance and parameter uncertainty and under road disturbance with white noise. A comparative analysis was performed by obtaining the rise time, overshoot, and settling time data of the suspensions under different conditions. It was observed that the LQR-controlled active suspension showed the fastest rise time, the least overshoot and had the shortest settling time. In this case, it was proven that the LQR controlled active suspension provided a more comfortable and safe ride compared to the other two suspension systems.
comment: 20 pages, 19 figures
AttentionSwarm: Reinforcement Learning with Attention Control Barier Function for Crazyflie Drones in Dynamic Environments
We introduce AttentionSwarm, a novel benchmark designed to evaluate safe and efficient swarm control in a dynamic drone racing scenario. Central to our approach is the Attention Model-Based Control Barrier Function (CBF) framework, which integrates attention mechanisms with safety-critical control theory to enable real-time collision avoidance and trajectory optimization. This framework dynamically prioritizes critical obstacles and agents in the swarm's vicinity using attention weights, while CBFs formally guarantee safety by enforcing collision-free constraints. The AttentionSwarm algorithm was developed and evaluated using a swarm of Crazyflie 2.1 micro quadrotors, which were tested indoors with the Vicon motion capture system to ensure precise localization and control. Experimental results show that our system achieves a 95-100% collision-free navigation rate in a dynamic multi-agent drone racing environment, underscoring its effectiveness and robustness in real-world scenarios. This work offers a promising foundation for safe, high-speed multi-robot applications in logistics, inspection, and racing.
comment: 6 pages, 6 figures
A Data-Driven Method to Identify Major Contributors to Low-Frequency Oscillations
We present a purely data-driven method to pinpoint generation plants that significantly contribute to poorly damped oscillations as part of post-event analysis. First, Extended Dynamic Mode Decomposition (EDMD) is applied on PMU data from the point of interconnection (POI) of the plants to obtain the finite-dimensional Koopman operator. Then, modal analysis is performed on a reduced-order Koopman operator to extract spatio-temporal patterns. The data-driven eigenvalues and eigenvectors quantify each plant's contribution to critical oscillatory modes without requiring any system model information. We demonstrate the effectiveness of this method through simulated case studies on modified IEEE 39-bus and WECC 179-bus test systems by benchmarking the data-driven results against ground-truth models. Its performance is further validated using PMU data from real oscillation events in the ISO-New England system. This data-driven method offers a practical tool for both planning-stage simulations and post-event analysis of real oscillation events, enabling effective mitigation.
comment: 10 pages, 11 figures, Journal paper.Submitted to IEEE Transactions on Power System
Planning of Off-Grid Renewable Power to Ammonia Systems with Heterogeneous Flexibility: A Multistakeholder Equilibrium Perspective
Off-grid renewable power to ammonia (ReP2A) systems present a promising pathway toward carbon neutrality in both the energy and chemical industries. However, due to chemical safety requirements, the limited flexibility of ammonia synthesis poses a challenge when attempting to align with the variable hydrogen flow produced from renewable power. This necessitates the optimal sizing of equipment capacity for effective and coordinated production across the system. Additionally, an ReP2A system may involve multiple stakeholders with varying degrees of operational flexibility, complicating the planning problem. This paper first examines the multistakeholder sizing equilibrium (MSSE) of the ReP2A system. First, we propose an MSSE model that accounts for individual planning decisions and the competing economic interests of the stakeholders of power generation, hydrogen production, and ammonia synthesis. We then construct an equivalent optimization problem based on Karush-Kuhn-Tucker (KKT) conditions to determine the equilibrium. Following this, we decompose the problem in the temporal dimension and solve it via multicut generalized Benders decomposition (GBD) to address long-term balancing issues. Case studies based on a realistic project reveal that the equilibrium does not naturally balance the interests of all stakeholders due to their heterogeneous characteristics. Our findings suggest that benefit transfer or re-arrangement ensure mutual benefits and the successful implementation of ReP2A projects.
comment: Accepted in IEEE Transactions on Power Systems \copyright2025 IEEE
Optimal Investment Portfolio of Thyristor- and IGBT-based Electrolysis Rectifiers in Utility-scale Renewable P2H Systems
Renewable power-to-hydrogen (ReP2H) systems require rectifiers to supply power to electrolyzers (ELZs). Two main types of rectifiers, insulated-gate bipolar transistor rectifiers (IGBT-Rs) and thyristor rectifiers (TRs), offer distinct tradeoffs. IGBT-Rs provide flexible reactive power control but are costly, whereas TRs are more affordable with lower power loss but consume a large amount of uncontrollable reactive power. A mixed configuration of rectifiers in utility-scale ReP2H systems could achieve a decent tradeoff and increase overall profitability. To explore this potential, this paper proposes an optimal investment portfolio model. First, we model and compare the active and reactive power characteristics of ELZs powered by TRs and IGBT-Rs. Second, we consider the investment of ELZs, rectifiers, and var resources and coordinate the operation of renewables, energy storage, var resources, and the on-off switching and load allocation of multiple ELZs. Subsequently, a two-stage stochastic programming (SP) model based on weighted information gap decision theory (W-IGDT) is developed to address the uncertainties of the renewable power and hydrogen price, and we apply the progressive hedging (PH) algorithm to accelerate its solution. Case studies demonstrate that optimal rectifier configurations increase revenue by at most 13.78% compared with configurations using only TRs or IGBT-Rs, existing project setups, or intuitive designs. Under the optimal portfolio, reactive power compensation investment is nearly eliminated, with a preferred TR-to-IGBT-R ratio of 3:1.
comment: Accepted in IEEE Transactions on Sustainable Energy \copyright 2025 IEEE
QPPG: Quantum-Preconditioned Policy Gradient for Link Adaptation in Rayleigh Fading Channels
Reliable link adaptation is critical for efficient wireless communications in dynamic fading environments. However, reinforcement learning (RL) solutions often suffer from unstable convergence due to poorly conditioned policy gradients, hindering their practical application. We propose the quantum-preconditioned policy gradient (QPPG) algorithm, which leverages Fisher-information-based preconditioning to stabilise and accelerate policy updates. Evaluations in Rayleigh fading scenarios show that QPPG achieves faster convergence, a 28.6% increase in average throughput, and a 43.8% decrease in average transmit power compared to classical methods. This work introduces quantum-geometric conditioning to link adaptation, marking a significant advance in developing robust, quantum-inspired reinforcement learning for future 6G networks, thereby enhancing communication reliability and energy efficiency.
comment: Submitted to IEEE Wireless Communications Letters
Time-causal and time-recursive wavelets
When to apply wavelet analysis to real-time temporal signals, where the future cannot be accessed, it is essential to base all the steps in the signal processing pipeline on computational mechanisms that are truly time-causal. This paper describes how a time-causal wavelet analysis can be performed based on concepts developed in the area of temporal scale-space theory, originating from a complete classification of temporal smoothing kernels that guarantee non-creation of new structures from finer to coarser temporal scale levels. By necessity, convolution with truncated exponential kernels in cascade constitutes the only permissable class of kernels, as well as their temporal derivatives as a natural complement to fulfil the admissibility conditions of wavelet representations. For a particular way of choosing the time constants in the resulting infinite convolution of truncated exponential kernels, to ensure temporal scale covariance and thus self-similarity over temporal scales, we describe how mother wavelets can be chosen as temporal derivatives of the resulting time-causal limit kernel. By developing connections between wavelet theory and scale-space theory, we characterize and quantify how the continuous scaling properties transfer to the discrete implementation, demonstrating how the proposed time-causal wavelet representation can reflect the duration of locally dominant temporal structures in the input signals. We propose that this notion of time-causal wavelet analysis could be a valuable tool for signal processing tasks, where streams of signals are to be processed in real time, specifically for signals that may contain local variations over a rich span of temporal scales, or more generally for analysing physical or biophysical temporal phenomena, where a fully time-causal analysis is called for to be physically realistic.
comment: 25 pages, 10 figures
Physics-Informed Reinforcement Learning for Large-Scale EV Smart Charging Considering Distribution Network Voltage Constraints
Electric Vehicles (EVs) offer substantial flexibility for grid services, yet large-scale, uncoordinated charging can threaten voltage stability in distribution networks. Existing Reinforcement Learning (RL) approaches for smart charging often disregard physical grid constraints or have limited performance for complex large-scale tasks, limiting their scalability and real-world applicability. This paper introduces a physics-informed (PI) RL algorithm that integrates a differentiable power flow model and voltage-based reward design into the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm, enabling EVs to deliver real-time voltage support while meeting user demands. The resulting PI-TD3 algorithm achieves faster convergence, improved sample efficiency, and reliable voltage magnitude regulation under uncertain and overloaded conditions. Benchmarks on the IEEE 34-bus and 123-bus networks show that the proposed PI-TD3 outperforms both model-free RL and optimization-based baselines in grid constraint management, user satisfaction, and economic metrics, even as the system scales to hundreds of EVs. These advances enable robust, scalable, and practical EV charging strategies that enhance grid resilience and support distribution networks operation.
Optimization and Control Technologies for Renewable-Dominated Hydrogen-Blended Integrated Gas-Electricity System: A Review
The growing coupling among electricity, gas, and hydrogen systems is driven by green hydrogen blending into existing natural gas pipelines, paving the way toward a renewable-dominated energy future. However, the integration poses significant challenges, particularly ensuring efficient and safe operation under varying hydrogen penetration and infrastructure adaptability. This paper reviews progress in optimization and control technologies for hydrogen-blended integrated gas-electricity system. First, key technologies and international demonstration projects are introduced to provide an overview of current developments. Besides, advances in gas-electricity system integration, including modeling, scheduling, planning and market design, are reviewed respectively. Then, the potential for cross-system fault propagation is highlighted, and practical methods for safety analysis and control are proposed. Finally, several possible research directions are introduced, aiming to ensure efficient renewable integration and reliable operation.
comment: Accepted by CSEE Journal of Power and Energy Systems in Oct. 2025
Systems and Control (EESS)
Nodal Capacity Expansion Planning with Flexible Large-Scale Load Siting
We propose explicitly incorporating large-scale load siting into a stochastic nodal power system capacity expansion planning model that concurrently co-optimizes generation, transmission and storage expansion. The potential operational flexibility of some of these large loads is also taken into account by considering them as consisting of a set of tranches with different reliability requirements, which are modeled as a constraint on expected served energy across operational scenarios. We implement our model as a two-stage stochastic mixed-integer optimization problem with cross-scenario expectation constraints. To overcome the challenge of scalability, we build upon existing work to implement this model on a high performance computing platform and exploit scenario parallelization using an augmented Progressive Hedging Algorithm. The algorithm is implemented using the bounding features of mpisppy, which have shown to provide satisfactory provable optimality gaps despite the absence of theoretical guarantees of convergence. We test our approach to assess the value of this proactive planning framework on total system cost and reliability metrics using realistic testcases geographically assigned to San Diego and South Carolina, with datacenter and direct air capture facilities as large loads.
Bridging Earth and Space: A Survey on HAPS for Non-Terrestrial Networks
HAPS are emerging as key enablers in the evolution of 6G wireless networks, bridging terrestrial and non-terrestrial infrastructures. Operating in the stratosphere, HAPS can provide wide-area coverage, low-latency, energy-efficient broadband communications with flexible deployment options for diverse applications. This survey delivers a comprehensive overview of HAPS use cases, technologies, and integration strategies within the 6G ecosystem. The roles of HAPS in extending connectivity to underserved regions, supporting dynamic backhauling, enabling massive IoT, and delivering reliable low-latency communications for autonomous and immersive services are discussed. The paper reviews state-of-the-art architectures for terrestrial and non-terrestrial network integration, highlights recent field trials. Furthermore, key enabling technologies such as channel modeling, AI-driven resource allocation, interference control, mobility management, and energy-efficient communications are examined. The paper also outlines open research challenges. By addressing existing gaps in the literature, this survey positions HAPS as a foundational component of globally integrated, resilient, and sustainable 6G networks.
comment: 30 pages. This work has been submitted to IEEE Communications Surveys & Tutorials (under review)
Optimal Kron-based Reduction of Networks (Opti-KRON) for Three-phase Distribution Feeders
This paper presents a novel structure-preserving, Kron-based reduction framework for unbalanced distribution feeders. The method aggregates electrically similar nodes within a mixed-integer optimization (MIP) problem to produce reduced networks that optimally reproduce the voltage profiles of the original full network. To overcome computational bottlenecks of MIP formulations, we propose an exhaustive-search formulation to identify optimal aggregation decisions while enforcing voltage margin limits. The proposed exhaustive network reduction algorithm is parallelizable on GPUs, which enables scalable network reduction. The resulting reduced networks approximate the full system's voltage profiles with low errors and are suitable for steady-state analysis and optimal power flow studies. The framework is validated on two real utility distribution feeders with 5,991 and 8,381 nodes. The reduced models achieve up to 90% and 80% network reduction, respectively, while the maximum voltage-magnitude error remains below 0.003 p.u. Furthermore, on a 1000-node version of the network, the GPU-accelerated reduction algorithm runs up to 15x faster than its CPU-based counterpart.
Control Barrier Functions for the Full Class of Signal Temporal Logic Tasks using Spatiotemporal Tubes
This paper introduces a new framework for synthesizing time-varying control barrier functions (TV-CBFs) for general Signal Temporal Logic (STL) specifications using spatiotemporal tubes (STT). We first formulate the STT synthesis as a robust optimization problem (ROP) and solve it through a scenario optimization problem (SOP), providing formal guarantees that the resulting tubes capture the given STL specifications. These STTs are then used to construct TV-CBFs, ensuring that under any control law rendering them invariant, the system satisfies the STL tasks. We demonstrate the framework through case studies on a differential-drive mobile robot and a quadrotor, and provide a comparative analysis showing improved efficiency over existing approaches.
Multi-UAV Flood Monitoring via CVT with Gaussian Mixture of Density Functions for Coverage Control
This study presents a control strategy for coordinating multiple unmanned aerial vehicles (UAVs) to monitor unknown flood regions and estimate the extent of inundation. The proposed method adopts a density-driven coverage framework based on Centroidal Voronoi Tessellation (CVT), in which the density function is modeled using a Gaussian Mixture of Density Functions (GMDF). This formulation provides a more accurate characterization of inundated areas compared to conventional axis-aligned Gaussian models. The performance of the two density modeling approaches is systematically evaluated under different UAV fleet sizes (16, 20, and 24), with multiple simulation trials conducted in the ROS/Gazebo environment. The results show that the GMDF-based formulation consistently achieves higher coverage rates, demonstrating its effectiveness in enhancing flood monitoring and improving UAV spatial distribution.
comment: 9 pages,6 figures
Risk Assessment of an Autonomous Underwater Snake Robot in Confined Operations
The growing interest in ocean discovery imposes a need for inspection and intervention in confined and demanding environments. Eely's slender shape, in addition to its ability to change its body configurations, makes articulated underwater robots an adequate option for such environments. However, operation of Eely in such environments imposes demanding requirements on the system, as it must deal with uncertain and unstructured environments, extreme environmental conditions, and reduced navigational capabilities. This paper proposes a Bayesian approach to assess the risks of losing Eely during two mission scenarios. The goal of this work is to improve Eely's performance and the likelihood of mission success. Sensitivity analysis results are presented in order to demonstrate the causes having the highest impact on losing Eely.
comment: 9 pages, 6 figures, Accepted for publication in OCEANS 2023 - Limerick
Managing Charging Induced Grid Stress and Battery Degradation in Electric Taxi Fleets
Operating fleets of electric vehicles (EVs) introduces several challenges, some of which are borne by the fleet operator, and some of which are borne by the power grid. To maximize short-term profit a fleet operator could always charge EVs at the maximum rate to ensure vehicles are ready to service ride demand. However, due to the stochastic nature of electricity demand, charging EVs at their maximum rate may potentially increase the grid stress and lead to overall instability. Furthermore, high-rate charging of EVs can accelerate battery degradation, thereby reducing the service lifespan of the fleet. This study aims to reconcile the conflicting incentives of fleet longevity, short-term profitability, and grid stability by simulating a taxi fleet throughout its lifespan in relation to its charging policies and service conditions. We develop an EV fleet simulator to evaluate the battery degradation due to unpredictable charging and ride demand. Consequently, the impact on the power grid through the charging infrastructure is assessed due to these activities. This simulation utilizes publicly accessible real-world travel data from the NYC taxi dataset. We compare a baseline 80-20 fleet charging policy with a reinforcement learning-based policy designed to prolong the fleet's service life and alleviate grid stress. We monitor grid stress, battery degradation, and profitability over five years and find that our learned policy outperforms the baseline. This simulator enables fleet operators to assess the impact of different charging policies on these indicators to make informed decisions in the future.
comment: 7 pages, 8 figures, to be published in the proceedings of 2025 IEEE Innovative Smart Grid Technologies - Asia (ISGT-Asia)
Magnetic field estimation using Gaussian process regression for interactive wireless power system design
Wireless power transfer (WPT) with coupled resonators offers a promising solution for the seamless powering of electronic devices. Interactive design approaches that visualize the magnetic field and power transfer efficiency based on system geometry adjustments can facilitate the understanding and exploration of the behavior of these systems for dynamic applications. However, typical electromagnetic field simulation methods, such as the Method of Moments (MoM), require significant computational resources, limiting the rate at which computation can be performed for acceptable interactivity. Furthermore, the system's sensitivity to positional and geometrical changes necessitates a large number of simulations, and structures such as ferromagnetic shields further complicate these simulations. Here, we introduce a machine learning approach using Gaussian Process Regression (GPR), demonstrating for the first time the rapid estimation of the entire magnetic field and power transfer efficiency for near-field coupled systems. To achieve quick and accurate estimation, we develop 3D adaptive grid systems and an active learning strategy to effectively capture the nonlinear interactions between complex system geometries and magnetic fields. By training a regression model, our approach achieves magnetic field computation with sub-second latency and with an average error of less than 6% when validated against independent electromagnetic simulation results.
comment: 29 pages, 8 figures, 1 table
Spatiotemporal Tubes based Control of Unknown Multi-Agent Systems for Temporal Reach-Avoid-Stay Tasks
The paper focuses on designing a controller for unknown dynamical multi-agent systems to achieve temporal reach-avoid-stay tasks for each agent while preventing inter-agent collisions. The main objective is to generate a spatiotemporal tube (STT) for each agent and thereby devise a closed-form, approximation-free, and decentralized control strategy that ensures the system trajectory reaches the target within a specific time while avoiding time-varying unsafe sets and collisions with other agents. In order to achieve this, the requirements of STTs are formulated as a robust optimization problem (ROP) and solved using a sampling-based scenario optimization problem (SOP) to address the issue of infeasibility caused by the infinite number of constraints in ROP. The STTs are generated by solving the SOP, and the corresponding closed-form control is designed to fulfill the specified task. Finally, the effectiveness of our approach is demonstrated through two case studies, one involving omnidirectional robots and the other involving multiple drones modelled as Euler-Lagrange systems.
Query-Efficient Zeroth-Order Algorithms for Nonconvex Optimization
Zeroth-order optimization (ZO) has been a powerful framework for solving black-box problems, which estimates gradients using zeroth-order data to update variables iteratively. The practical applicability of ZO critically depends on the efficiency of single-step gradient estimation and the overall query complexity. However, existing ZO algorithms cannot achieve efficiency on both simultaneously. In this work, we consider a general constrained optimization model with black-box objective and constraint functions. To solve it, we propose novel algorithms that can achieve the state-of-the-art overall query complexity bound of $\mathcal{O}(d/\epsilon^4)$ to find an $\epsilon$-stationary solution ($d$ is the dimension of variable space), while reducing the queries for estimating a single-step gradient from $\mathcal{O}(d)$ to $\mathcal{O}(1)$. Specifically, we integrate block updates with gradient descent ascent and a block gradient estimator, which leads to two algorithms, ZOB-GDA and ZOB-SGDA, respectively. Instead of constructing full gradients, they estimate only partial gradients along random blocks of dimensions, where the adjustable block sizes enable high single-step efficiency without sacrificing convergence guarantees. Our theoretical results establish the finite-sample convergence of the proposed algorithms for nonconvex optimization. Finally, numerical experiments on a practical problem demonstrate that our algorithms require over ten times fewer queries than existing methods.
comment: 34 pages, 4 figures
Policy Gradient Method for LQG Control via Input-Output-History Representation: Convergence to $O(ε)$-Stationary Points
We study the policy gradient method (PGM) for the linear quadratic Gaussian (LQG) dynamic output-feedback control problem using an input-output-history (IOH) representation of the closed-loop system. First, we show that any dynamic output-feedback controller is equivalent to a static partial-state feedback gain for a new system representation characterized by a finite-length IOH. Leveraging this equivalence, we reformulate the search for an optimal dynamic output feedback controller as an optimization problem over the corresponding partial-state feedback gain. Next, we introduce a relaxed version of the IOH-based LQG problem by incorporating a small process noise with covariance $\epsilon I$ into the new system to ensure coerciveness, a key condition for establishing gradient-based convergence guarantees. Consequently, we show that a vanilla PGM for the relaxed problem converges to an $\mathcal{O}(\epsilon)$-stationary point, i.e., $\overline{K}$ satisfying $\|\nabla J(\overline{K})\|_F \leq \mathcal{O}(\epsilon)$, where $J$ denotes the original LQG cost. Numerical experiments empirically indicate convergence to the vicinity of the globally optimal LQG controller.
Modeling and Simulation of an Active Car Suspension with a Robust LQR Controller under Road Disturbance, Parameter Uncertainty and White Noise
Vehicle suspension is important for passengers to travel comfortably and to be less exposed to effects such as vibration and shock. A good suspension system increases the road holding of vehicles, allows them to take turns safely, and reduces the risk of traffic accidents. A passive suspension system is the most widely used suspension system in vehicles due to its simple structure and low cost. Passive suspension systems do not have an actuator and therefore do not have a controller. Active suspension systems have an actuator and a controller. Although their structures are more complex and costly, they are safer. PID controller is widely used in active suspension systems due to its simple structure, reasonable cost, and easy adjustment of coefficients. In this study, a more robust LQR-controlled active suspension was designed than a passive suspension and a PID-controlled active suspension. Robustness analyses were performed for passive suspension, PID-controlled active suspension, and LQR-controlled active suspension. Suspension travel, sprung mass acceleration, and sprung mass motion simulations were performed for all three suspensions under road disturbance, under simultaneous road disturbance and parameter uncertainty and under road disturbance with white noise. A comparative analysis was performed by obtaining the rise time, overshoot, and settling time data of the suspensions under different conditions. It was observed that the LQR-controlled active suspension showed the fastest rise time, the least overshoot and had the shortest settling time. In this case, it was proven that the LQR controlled active suspension provided a more comfortable and safe ride compared to the other two suspension systems.
comment: 20 pages, 19 figures
AttentionSwarm: Reinforcement Learning with Attention Control Barier Function for Crazyflie Drones in Dynamic Environments
We introduce AttentionSwarm, a novel benchmark designed to evaluate safe and efficient swarm control in a dynamic drone racing scenario. Central to our approach is the Attention Model-Based Control Barrier Function (CBF) framework, which integrates attention mechanisms with safety-critical control theory to enable real-time collision avoidance and trajectory optimization. This framework dynamically prioritizes critical obstacles and agents in the swarm's vicinity using attention weights, while CBFs formally guarantee safety by enforcing collision-free constraints. The AttentionSwarm algorithm was developed and evaluated using a swarm of Crazyflie 2.1 micro quadrotors, which were tested indoors with the Vicon motion capture system to ensure precise localization and control. Experimental results show that our system achieves a 95-100% collision-free navigation rate in a dynamic multi-agent drone racing environment, underscoring its effectiveness and robustness in real-world scenarios. This work offers a promising foundation for safe, high-speed multi-robot applications in logistics, inspection, and racing.
comment: 6 pages, 6 figures
A Data-Driven Method to Identify Major Contributors to Low-Frequency Oscillations
We present a purely data-driven method to pinpoint generation plants that significantly contribute to poorly damped oscillations as part of post-event analysis. First, Extended Dynamic Mode Decomposition (EDMD) is applied on PMU data from the point of interconnection (POI) of the plants to obtain the finite-dimensional Koopman operator. Then, modal analysis is performed on a reduced-order Koopman operator to extract spatio-temporal patterns. The data-driven eigenvalues and eigenvectors quantify each plant's contribution to critical oscillatory modes without requiring any system model information. We demonstrate the effectiveness of this method through simulated case studies on modified IEEE 39-bus and WECC 179-bus test systems by benchmarking the data-driven results against ground-truth models. Its performance is further validated using PMU data from real oscillation events in the ISO-New England system. This data-driven method offers a practical tool for both planning-stage simulations and post-event analysis of real oscillation events, enabling effective mitigation.
comment: 10 pages, 11 figures, Journal paper.Submitted to IEEE Transactions on Power System
Planning of Off-Grid Renewable Power to Ammonia Systems with Heterogeneous Flexibility: A Multistakeholder Equilibrium Perspective
Off-grid renewable power to ammonia (ReP2A) systems present a promising pathway toward carbon neutrality in both the energy and chemical industries. However, due to chemical safety requirements, the limited flexibility of ammonia synthesis poses a challenge when attempting to align with the variable hydrogen flow produced from renewable power. This necessitates the optimal sizing of equipment capacity for effective and coordinated production across the system. Additionally, an ReP2A system may involve multiple stakeholders with varying degrees of operational flexibility, complicating the planning problem. This paper first examines the multistakeholder sizing equilibrium (MSSE) of the ReP2A system. First, we propose an MSSE model that accounts for individual planning decisions and the competing economic interests of the stakeholders of power generation, hydrogen production, and ammonia synthesis. We then construct an equivalent optimization problem based on Karush-Kuhn-Tucker (KKT) conditions to determine the equilibrium. Following this, we decompose the problem in the temporal dimension and solve it via multicut generalized Benders decomposition (GBD) to address long-term balancing issues. Case studies based on a realistic project reveal that the equilibrium does not naturally balance the interests of all stakeholders due to their heterogeneous characteristics. Our findings suggest that benefit transfer or re-arrangement ensure mutual benefits and the successful implementation of ReP2A projects.
comment: Accepted in IEEE Transactions on Power Systems \copyright2025 IEEE
Optimal Investment Portfolio of Thyristor- and IGBT-based Electrolysis Rectifiers in Utility-scale Renewable P2H Systems
Renewable power-to-hydrogen (ReP2H) systems require rectifiers to supply power to electrolyzers (ELZs). Two main types of rectifiers, insulated-gate bipolar transistor rectifiers (IGBT-Rs) and thyristor rectifiers (TRs), offer distinct tradeoffs. IGBT-Rs provide flexible reactive power control but are costly, whereas TRs are more affordable with lower power loss but consume a large amount of uncontrollable reactive power. A mixed configuration of rectifiers in utility-scale ReP2H systems could achieve a decent tradeoff and increase overall profitability. To explore this potential, this paper proposes an optimal investment portfolio model. First, we model and compare the active and reactive power characteristics of ELZs powered by TRs and IGBT-Rs. Second, we consider the investment of ELZs, rectifiers, and var resources and coordinate the operation of renewables, energy storage, var resources, and the on-off switching and load allocation of multiple ELZs. Subsequently, a two-stage stochastic programming (SP) model based on weighted information gap decision theory (W-IGDT) is developed to address the uncertainties of the renewable power and hydrogen price, and we apply the progressive hedging (PH) algorithm to accelerate its solution. Case studies demonstrate that optimal rectifier configurations increase revenue by at most 13.78% compared with configurations using only TRs or IGBT-Rs, existing project setups, or intuitive designs. Under the optimal portfolio, reactive power compensation investment is nearly eliminated, with a preferred TR-to-IGBT-R ratio of 3:1.
comment: Accepted in IEEE Transactions on Sustainable Energy \copyright 2025 IEEE
QPPG: Quantum-Preconditioned Policy Gradient for Link Adaptation in Rayleigh Fading Channels
Reliable link adaptation is critical for efficient wireless communications in dynamic fading environments. However, reinforcement learning (RL) solutions often suffer from unstable convergence due to poorly conditioned policy gradients, hindering their practical application. We propose the quantum-preconditioned policy gradient (QPPG) algorithm, which leverages Fisher-information-based preconditioning to stabilise and accelerate policy updates. Evaluations in Rayleigh fading scenarios show that QPPG achieves faster convergence, a 28.6% increase in average throughput, and a 43.8% decrease in average transmit power compared to classical methods. This work introduces quantum-geometric conditioning to link adaptation, marking a significant advance in developing robust, quantum-inspired reinforcement learning for future 6G networks, thereby enhancing communication reliability and energy efficiency.
comment: Submitted to IEEE Wireless Communications Letters
Time-causal and time-recursive wavelets
When to apply wavelet analysis to real-time temporal signals, where the future cannot be accessed, it is essential to base all the steps in the signal processing pipeline on computational mechanisms that are truly time-causal. This paper describes how a time-causal wavelet analysis can be performed based on concepts developed in the area of temporal scale-space theory, originating from a complete classification of temporal smoothing kernels that guarantee non-creation of new structures from finer to coarser temporal scale levels. By necessity, convolution with truncated exponential kernels in cascade constitutes the only permissable class of kernels, as well as their temporal derivatives as a natural complement to fulfil the admissibility conditions of wavelet representations. For a particular way of choosing the time constants in the resulting infinite convolution of truncated exponential kernels, to ensure temporal scale covariance and thus self-similarity over temporal scales, we describe how mother wavelets can be chosen as temporal derivatives of the resulting time-causal limit kernel. By developing connections between wavelet theory and scale-space theory, we characterize and quantify how the continuous scaling properties transfer to the discrete implementation, demonstrating how the proposed time-causal wavelet representation can reflect the duration of locally dominant temporal structures in the input signals. We propose that this notion of time-causal wavelet analysis could be a valuable tool for signal processing tasks, where streams of signals are to be processed in real time, specifically for signals that may contain local variations over a rich span of temporal scales, or more generally for analysing physical or biophysical temporal phenomena, where a fully time-causal analysis is called for to be physically realistic.
comment: 25 pages, 10 figures
Physics-Informed Reinforcement Learning for Large-Scale EV Smart Charging Considering Distribution Network Voltage Constraints
Electric Vehicles (EVs) offer substantial flexibility for grid services, yet large-scale, uncoordinated charging can threaten voltage stability in distribution networks. Existing Reinforcement Learning (RL) approaches for smart charging often disregard physical grid constraints or have limited performance for complex large-scale tasks, limiting their scalability and real-world applicability. This paper introduces a physics-informed (PI) RL algorithm that integrates a differentiable power flow model and voltage-based reward design into the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm, enabling EVs to deliver real-time voltage support while meeting user demands. The resulting PI-TD3 algorithm achieves faster convergence, improved sample efficiency, and reliable voltage magnitude regulation under uncertain and overloaded conditions. Benchmarks on the IEEE 34-bus and 123-bus networks show that the proposed PI-TD3 outperforms both model-free RL and optimization-based baselines in grid constraint management, user satisfaction, and economic metrics, even as the system scales to hundreds of EVs. These advances enable robust, scalable, and practical EV charging strategies that enhance grid resilience and support distribution networks operation.
Optimization and Control Technologies for Renewable-Dominated Hydrogen-Blended Integrated Gas-Electricity System: A Review
The growing coupling among electricity, gas, and hydrogen systems is driven by green hydrogen blending into existing natural gas pipelines, paving the way toward a renewable-dominated energy future. However, the integration poses significant challenges, particularly ensuring efficient and safe operation under varying hydrogen penetration and infrastructure adaptability. This paper reviews progress in optimization and control technologies for hydrogen-blended integrated gas-electricity system. First, key technologies and international demonstration projects are introduced to provide an overview of current developments. Besides, advances in gas-electricity system integration, including modeling, scheduling, planning and market design, are reviewed respectively. Then, the potential for cross-system fault propagation is highlighted, and practical methods for safety analysis and control are proposed. Finally, several possible research directions are introduced, aiming to ensure efficient renewable integration and reliable operation.
comment: Accepted by CSEE Journal of Power and Energy Systems in Oct. 2025
Robotics
MADR: MPC-guided Adversarial DeepReach
Hamilton-Jacobi (HJ) Reachability offers a framework for generating safe value functions and policies in the face of adversarial disturbance, but is limited by the curse of dimensionality. Physics-informed deep learning is able to overcome this infeasibility, but itself suffers from slow and inaccurate convergence, primarily due to weak PDE gradients and the complexity of self-supervised learning. A few works, recently, have demonstrated that enriching the self-supervision process with regular supervision (based on the nature of the optimal control problem), greatly accelerates convergence and solution quality, however, these have been limited to single player problems and simple games. In this work, we introduce MADR: MPC-guided Adversarial DeepReach, a general framework to robustly approximate the two-player, zero-sum differential game value function. In doing so, MADR yields the corresponding optimal strategies for both players in zero-sum games as well as safe policies for worst-case robustness. We test MADR on a multitude of high-dimensional simulated and real robotic agents with varying dynamics and games, finding that our approach significantly out-performs state-of-the-art baselines in simulation and produces impressive results in hardware.
comment: 8 pages, under review
Online Object-Level Semantic Mapping for Quadrupeds in Real-World Environments
We present an online semantic object mapping system for a quadruped robot operating in real indoor environments, turning sensor detections into named objects in a global map. During a run, the mapper integrates range geometry with camera detections, merges co-located detections within a frame, and associates repeated detections into persistent object instances across frames. Objects remain in the map when they are out of view, and repeated sightings update the same instance rather than creating duplicates. The output is a compact object layer that can be queried (class, pose, and confidence), is integrated with the occupancy map and readable by a planner. In on-robot tests, the layer remained stable across viewpoint changes.
comment: Published at the Italian Conference on Robotics and Intelligent Machines (I-RIM) 3D, 2025
Sharing the Load: Distributed Model-Predictive Control for Precise Multi-Rover Cargo Transport
For autonomous cargo transportation, teams of mobile robots can provide more operational flexibility than a single large robot. In these scenarios, precision in both inter-vehicle distance and path tracking is key. With this motivation, we develop a distributed model-predictive controller (MPC) for multi-vehicle cargo operations that builds on the precise path-tracking of lidar teach and repeat. To carry cargo, a following vehicle must maintain a Euclidean distance offset from a lead vehicle regardless of the path curvature. Our approach uses a shared map to localize the robots relative to each other without GNSS or direct observations. We compare our approach to a centralized MPC and a baseline approach that directly measures the inter-vehicle distance. The distributed MPC shows equivalent nominal performance to the more complex centralized MPC. Using a direct measurement of the relative distance between the leader and follower shows improved tracking performance in close-range scenarios but struggles with long-range offsets. The operational flexibility provided by distributing the computation makes it well suited for real deployments. We evaluate four types of convoyed path trackers with over 10 km of driving in a coupled convoy. With convoys of two and three rovers, the proposed distributed MPC method works in real-time to allow map-based convoying to maintain maximum spacing within 20 cm of the target in various conditions.
comment: 8 pages, 4 figures
Event-Grounding Graph: Unified Spatio-Temporal Scene Graph from Robotic Observations
A fundamental aspect for building intelligent autonomous robots that can assist humans in their daily lives is the construction of rich environmental representations. While advances in semantic scene representations have enriched robotic scene understanding, current approaches lack a connection between spatial features and dynamic events; e.g., connecting the blue mug to the event washing a mug. In this work, we introduce the event-grounding graph (EGG), a framework grounding event interactions to spatial features of a scene. This representation allows robots to perceive, reason, and respond to complex spatio-temporal queries. Experiments using real robotic data demonstrate EGG's capability to retrieve relevant information and respond accurately to human inquiries concerning the environment and events within. Furthermore, the EGG framework's source code and evaluation dataset are released as open-source at: https://github.com/aalto-intelligent-robotics/EGG.
comment: Submitted to RA-L
Towards An Adaptive Locomotion Strategy For Quadruped Rovers: Quantifying When To Slide Or Walk On Planetary Slopes
Legged rovers provide enhanced mobility compared to wheeled platforms, enabling navigation on steep and irregular planetary terrains. However, traditional legged locomotion might be energetically inefficient and potentially dangerous to the rover on loose and inclined surfaces, such as crater walls and cave slopes. This paper introduces a preliminary study that compares the Cost of Transport (CoT) of walking and torso-based sliding locomotion for quadruped robots across different slopes, friction conditions and speed levels. By identifying intersections between walking and sliding CoT curves, we aim to define threshold conditions that may trigger transitions between the two strategies. The methodology combines physics-based simulations in Isaac Sim with particle interaction validation in ANSYS-Rocky. Our results represent an initial step towards adaptive locomotion strategies for planetary legged rovers.
comment: Published at the 18th Symposium on Advanced Space Technologies in Robotics and Automation (ASTRA 2025)
Least Restrictive Hyperplane Control Barrier Functions
Control Barrier Functions (CBFs) can provide provable safety guarantees for dynamic systems. However, finding a valid CBF for a system of interest is often non-trivial, especially if the shape of the unsafe region is complex and the CBFs are of higher order. A common solution to this problem is to make a conservative approximation of the unsafe region in the form of a line/hyperplane, and use the corresponding conservative Hyperplane-CBF when deciding on safe control actions. In this letter, we note that conservative constraints are only a problem if they prevent us from doing what we want. Thus, instead of first choosing a CBF and then choosing a safe control with respect to the CBF, we optimize over a combination of CBFs and safe controls to get as close as possible to our desired control, while still having the safety guarantee provided by the CBF. We call the corresponding CBF the least restrictive Hyperplane-CBF. Finally, we also provide a way of creating a smooth parameterization of the CBF-family for the optimization, and illustrate the approach on a double integrator dynamical system with acceleration constraints, moving through a group of arbitrarily shaped static and moving obstacles.
C-SWAP: Explainability-Aware Structured Pruning for Efficient Neural Networks Compression BMVC2025
Neural network compression has gained increasing attention in recent years, particularly in computer vision applications, where the need for model reduction is crucial for overcoming deployment constraints. Pruning is a widely used technique that prompts sparsity in model structures, e.g. weights, neurons, and layers, reducing size and inference costs. Structured pruning is especially important as it allows for the removal of entire structures, which further accelerates inference time and reduces memory overhead. However, it can be computationally expensive, requiring iterative retraining and optimization. To overcome this problem, recent methods considered one-shot setting, which applies pruning directly at post-training. Unfortunately, they often lead to a considerable drop in performance. In this paper, we focus on this issue by proposing a novel one-shot pruning framework that relies on explainable deep learning. First, we introduce a causal-aware pruning approach that leverages cause-effect relations between model predictions and structures in a progressive pruning process. It allows us to efficiently reduce the size of the network, ensuring that the removed structures do not deter the performance of the model. Then, through experiments conducted on convolution neural network and vision transformer baselines, pre-trained on classification tasks, we demonstrate that our method consistently achieves substantial reductions in model size, with minimal impact on performance, and without the need for fine-tuning. Overall, our approach outperforms its counterparts, offering the best trade-off. Our code is available on GitHub.
comment: 10 pages, BMVC2025
A Compositional Paradigm for Foundation Models: Towards Smarter Robotic Agents
The birth of Foundation Models brought unprecedented results in a wide range of tasks, from language to vision, to robotic control. These models are able to process huge quantities of data, and can extract and develop rich representations, which can be employed across different domains and modalities. However, they still have issues in adapting to dynamic, real-world scenarios without retraining the entire model from scratch. In this work, we propose the application of Continual Learning and Compositionality principles to foster the development of more flexible, efficient and smart AI solutions.
Quadrupeds for Planetary Exploration: Field Testing Control Algorithms on an Active Volcano
Missions such as the Ingenuity helicopter have shown the advantages of using novel locomotion modes to increase the scientific return of planetary exploration missions. Legged robots can further expand the reach and capability of future planetary missions by traversing more difficult terrain than wheeled rovers, such as jumping over cracks on the ground or traversing rugged terrain with boulders. To develop and test algorithms for using quadruped robots, the AAPLE project was carried out at DFKI. As part of the project, we conducted a series of field experiments on the Volcano on the Aeolian island of Vulcano, an active stratovolcano near Sicily, Italy. The experiments focused on validating newly developed state-of-the-art adaptive optimal control algorithms for quadrupedal locomotion in a high-fidelity analog environment for Lunar and Martian surfaces. This paper presents the technical approach, test plan, software architecture, field deployment strategy, and evaluation results from the Vulcano campaign.
comment: Presented at 18th Symposium on Advanced Space Technologies in Robotics and Automation (ASTRA)
Flexbee: A Grasping and Perching UAV Based on Soft Vector-Propulsion Nozzle
The aim of this paper is to design a new type of grasping and perching unmanned aerial vehicle (UAV), called Flexbee, which features a soft vector-propulsion nozzle (SVPN). Compared to previous UAVs, Flexbee integrates flight, grasping, and perching functionalities into the four SVPNs. This integration offers advantages including decoupled position and attitude control, high structural reuse, and strong adaptability strong adaptability for grasping and perching. A dynamics model of Flexbee has been developed, and the nonlinear coupling issue of the moment has been resolved through linearization of the equivalent moment model. A hierarchical control strategy was used to design controllers for the two operational modes of Flexbee. Finally, flight, grasping, and perching experiments were conducted to validate Flexbee's kinematic capabilities and the effectiveness of the control strategy.
comment: 11 pages, 17 figures
EfficientNav: Towards On-Device Object-Goal Navigation with Navigation Map Caching and Retrieval NeurIPS 2025
Object-goal navigation (ObjNav) tasks an agent with navigating to the location of a specific object in an unseen environment. Embodied agents equipped with large language models (LLMs) and online constructed navigation maps can perform ObjNav in a zero-shot manner. However, existing agents heavily rely on giant LLMs on the cloud, e.g., GPT-4, while directly switching to small LLMs, e.g., LLaMA3.2-11b, suffer from significant success rate drops due to limited model capacity for understanding complex navigation maps, which prevents deploying ObjNav on local devices. At the same time, the long prompt introduced by the navigation map description will cause high planning latency on local devices. In this paper, we propose EfficientNav to enable on-device efficient LLM-based zero-shot ObjNav. To help the smaller LLMs better understand the environment, we propose semantics-aware memory retrieval to prune redundant information in navigation maps. To reduce planning latency, we propose discrete memory caching and attention-based memory clustering to efficiently save and re-use the KV cache. Extensive experimental results demonstrate that EfficientNav achieves 11.1% improvement in success rate on HM3D benchmark over GPT-4-based baselines, and demonstrates 6.7x real-time latency reduction and 4.7x end-to-end latency reduction over GPT-4 planner. Our code will be released soon.
comment: NeurIPS 2025
Efficient Model-Based Reinforcement Learning for Robot Control via Online Learning
We present an online model-based reinforcement learning algorithm suitable for controlling complex robotic systems directly in the real world. Unlike prevailing sim-to-real pipelines that rely on extensive offline simulation and model-free policy optimization, our method builds a dynamics model from real-time interaction data and performs policy updates guided by the learned dynamics model. This efficient model-based reinforcement learning scheme significantly reduces the number of samples to train control policies, enabling direct training on real-world rollout data. This significantly reduces the influence of bias in the simulated data, and facilitates the search for high-performance control policies. We adopt online learning analysis to derive sublinear regret bounds under standard stochastic online optimization assumptions, providing formal guarantees on performance improvement as more interaction data are collected. Experimental evaluations were performed on a hydraulic excavator arm and a soft robot arm, where the algorithm demonstrates strong sample efficiency compared to model-free reinforcement learning methods, reaching comparable performance within hours. Robust adaptation to shifting dynamics was also observed when the payload condition was randomized. Our approach paves the way toward efficient and reliable on-robot learning for a broad class of challenging control tasks.
MPC-based motion planning for non-holonomic systems in non-convex domains
Motivated by the application of using model predictive control (MPC) for motion planning of autonomous mobile robots, a form of output tracking MPC for non-holonomic systems and with non-convex constraints is studied. Although the advantages of using MPC for motion planning have been demonstrated in several papers, in most of the available fundamental literature on output tracking MPC it is assumed, often implicitly, that the model is holonomic and generally the state or output constraints must be convex. Thus, in application-oriented publications, empirical results dominate and the topic of proving completeness, in particular under which assumptions the target is always reached, has received comparatively little attention. To address this gap, we present a novel MPC formulation that guarantees convergence to the desired target under realistic assumptions, which can be verified in relevant real-world scenarios.
comment: Preprint of ECC 2025 submission
Biomechanically consistent real-time action recognition for human-robot interaction
This paper presents a novel framework for real-time human action recognition in industrial contexts, using standard 2D cameras. We introduce a complete pipeline for robust and real-time estimation of human joint kinematics, input to a temporally smoothed Transformer-based network, for action recognition. We rely on a new dataset including 11 subjects performing various actions, to evaluate our approach. Unlike most of the literature that relies on joint center positions (JCP) and is offline, ours uses biomechanical prior, eg. joint angles, for fast and robust real-time recognition. Besides, joint angles make the proposed method agnostic to sensor and subject poses as well as to anthropometric differences, and ensure robustness across environments and subjects. Our proposed learning model outperforms the best baseline model, running also in real-time, along various metrics. It achieves 88% accuracy and shows great generalization ability, for subjects not facing the cameras. Finally, we demonstrate the robustness and usefulness of our technique, through an online interaction experiment, with a simulated robot controlled in real-time via the recognized actions.
MMRHP: A Miniature Mixed-Reality HIL Platform for Auditable Closed-Loop Evaluation
Validation of autonomous driving systems requires a trade-off between test fidelity, cost, and scalability. While miniaturized hardware-in-the-loop (HIL) platforms have emerged as a promising solution, a systematic framework supporting rigorous quantitative analysis is generally lacking, limiting their value as scientific evaluation tools. To address this challenge, we propose MMRHP, a miniature mixed-reality HIL platform that elevates miniaturized testing from functional demonstration to rigorous, reproducible quantitative analysis. The core contributions are threefold. First, we propose a systematic three-phase testing process oriented toward the Safety of the Intended Functionality(SOTIF)standard, providing actionable guidance for identifying the performance limits and triggering conditions of otherwise correctly functioning systems. Second, we design and implement a HIL platform centered around a unified spatiotemporal measurement core to support this process, ensuring consistent and traceable quantification of physical motion and system timing. Finally, we demonstrate the effectiveness of this solution through comprehensive experiments. The platform itself was first validated, achieving a spatial accuracy of 10.27 mm RMSE and a stable closed-loop latency baseline of approximately 45 ms. Subsequently, an in-depth Autoware case study leveraged this validated platform to quantify its performance baseline and identify a critical performance cliff at an injected latency of 40 ms. This work shows that a structured process, combined with a platform offering a unified spatio-temporal benchmark, enables reproducible, interpretable, and quantitative closed-loop evaluation of autonomous driving systems.
PGTT: Phase-Guided Terrain Traversal for Perceptive Legged Locomotion
State-of-the-art perceptive Reinforcement Learning controllers for legged robots either (i) impose oscillator or IK-based gait priors that constrain the action space, add bias to the policy optimization and reduce adaptability across robot morphologies, or (ii) operate "blind", which struggle to anticipate hind-leg terrain, and are brittle to noise. In this paper, we propose Phase-Guided Terrain Traversal (PGTT), a perception-aware deep-RL approach that overcomes these limitations by enforcing gait structure purely through reward shaping, thereby reducing inductive bias in policy learning compared to oscillator/IK-conditioned action priors. PGTT encodes per-leg phase as a cubic Hermite spline that adapts swing height to local heightmap statistics and adds a swing-phase contact penalty, while the policy acts directly in joint space supporting morphology-agnostic deployment. Trained in MuJoCo (MJX) on procedurally generated stair-like terrains with curriculum and domain randomization, PGTT achieves the highest success under push disturbances (median +7.5% vs. the next best method) and on discrete obstacles (+9%), with comparable velocity tracking, and converging to an effective policy roughly 2x faster than strong end-to-end baselines. We validate PGTT on a Unitree Go2 using a real-time LiDAR elevation-to-heightmap pipeline, and we report preliminary results on ANYmal-C obtained with the same hyperparameters. These findings indicate that terrain-adaptive, phase-guided reward shaping is a simple and general mechanism for robust perceptive locomotion across platforms.
comment: 9 pages, 9 figures, 2 tables
Coverage-Recon: Coordinated Multi-Drone Image Sampling with Online Map Feedback
This article addresses collaborative 3D map reconstruction using multiple drones. Achieving high-quality reconstruction requires capturing images of keypoints within the target scene from diverse viewing angles, and coverage control offers an effective framework to meet this requirement. Meanwhile, recent advances in real-time 3D reconstruction algorithms make it possible to render an evolving map during flight, enabling immediate feedback to guide drone motion. Building on this, we present Coverage-Recon, a novel coordinated image sampling algorithm that integrates online map feedback to improve reconstruction quality on-the-fly. In Coverage-Recon, the coordinated motion of drones is governed by a Quadratic Programming (QP)-based angle-aware coverage controller, which ensures multi-viewpoint image capture while enforcing safety constraints. The captured images are processed in real time by the NeuralRecon algorithm to generate an evolving 3D mesh. Mesh changes across the scene are interpreted as indicators of reconstruction uncertainty and serve as feedback to update the importance index of the coverage control as the map evolves. The effectiveness of Coverage-Recon is validated through simulation and experiments, demonstrating both qualitatively and quantitatively that incorporating online map feedback yields more complete and accurate 3D reconstructions than conventional methods. Project page: https://htnk-lab.github.io/coverage-recon/
comment: Submitted to IEEE Transactions on Control Systems Technology (under review). Project page: https://htnk-lab.github.io/coverage-recon/
MoTVLA: A Vision-Language-Action Model with Unified Fast-Slow Reasoning
Integrating visual-language instructions into visuomotor policies is gaining momentum in robot learning for enhancing open-world generalization. Despite promising advances, existing approaches face two challenges: limited language steerability when no generated reasoning is used as a condition, or significant inference latency when reasoning is incorporated.In this work, we introduce MoTVLA, a mixture-of-transformers (MoT)-based vision-language-action (VLA) model that integrates fast-slow unified reasoning with behavior policy learning. MoTVLA preserves the general intelligence of pre-trained VLMs (serving as the generalist) for tasks such as perception, scene understanding, and semantic planning, while incorporating a domain expert, a second transformer that shares knowledge with the pretrained VLM, to generate domain-specific fast reasoning (e.g., robot motion decomposition), thereby improving policy execution efficiency. By conditioning the action expert on decomposed motion instructions, MoTVLA can learn diverse behaviors and substantially improve language steerability. Extensive evaluations across natural language processing benchmarks, robotic simulation environments, and real-world experiments confirm the superiority of MoTVLA in both fast-slow reasoning and manipulation task performance.
MoMaGen: Generating Demonstrations under Soft and Hard Constraints for Multi-Step Bimanual Mobile Manipulation
Imitation learning from large-scale, diverse human demonstrations has proven effective for training robots, but collecting such data is costly and time-consuming. This challenge is amplified for multi-step bimanual mobile manipulation, where humans must teleoperate both a mobile base and two high-degree-of-freedom arms. Prior automated data generation frameworks have addressed static bimanual manipulation by augmenting a few human demonstrations in simulation, but they fall short for mobile settings due to two key challenges: (1) determining base placement to ensure reachability, and (2) positioning the camera to provide sufficient visibility for visuomotor policies. To address these issues, we introduce MoMaGen, which formulates data generation as a constrained optimization problem that enforces hard constraints (e.g., reachability) while balancing soft constraints (e.g., visibility during navigation). This formulation generalizes prior approaches and provides a principled foundation for future methods. We evaluate MoMaGen on four multi-step bimanual mobile manipulation tasks and show that it generates significantly more diverse datasets than existing methods. Leveraging this diversity, MoMaGen can train successful imitation learning policies from a single source demonstration, and these policies can be fine-tuned with as few as 40 real-world demonstrations to achieve deployment on physical robotic hardware. More details are available at our project page: momagen.github.io.
comment: Project website: momagen.github.io. The first four authors contribute equally
A Cross-Environment and Cross-Embodiment Path Planning Framework via a Conditional Diffusion Model
Path planning for a robotic system in high-dimensional cluttered environments needs to be efficient, safe, and adaptable for different environments and hardware. Conventional methods face high computation time and require extensive parameter tuning, while prior learning-based methods still fail to generalize effectively. The primary goal of this research is to develop a path planning framework capable of generalizing to unseen environments and new robotic manipulators without the need for retraining. We present GADGET (Generalizable and Adaptive Diffusion-Guided Environment-aware Trajectory generation), a diffusion-based planning model that generates joint-space trajectories conditioned on voxelized scene representations as well as start and goal configurations. A key innovation is GADGET's hybrid dual-conditioning mechanism that combines classifier-free guidance via learned scene encoding with classifier-guided Control Barrier Function (CBF) safety shaping, integrating environment awareness with real-time collision avoidance directly in the denoising process. This design supports zero-shot transfer to new environments and robotic embodiments without retraining. Experimental results show that GADGET achieves high success rates with low collision intensity in spherical-obstacle, bin-picking, and shelf environments, with CBF guidance further improving safety. Moreover, comparative evaluations indicate strong performance relative to both sampling-based and learning-based baselines. Furthermore, GADGET provides transferability across Franka Panda, Kinova Gen3 (6/7-DoF), and UR5 robots, and physical execution on a Kinova Gen3 demonstrates its ability to generate safe, collision-free trajectories in real-world settings.
comment: 20 pages, 9 figures
Safe Active Navigation and Exploration for Planetary Environments Using Proprioceptive Measurements
Legged robots can sense terrain through force interactions during locomotion, offering more reliable traversability estimates than remote sensing and serving as scouts for guiding wheeled rovers in challenging environments. However, even legged scouts face challenges when traversing highly deformable or unstable terrain. We present Safe Active Exploration for Granular Terrain (SAEGT), a navigation framework that enables legged robots to safely explore unknown granular environments using proprioceptive sensing, particularly where visual input fails to capture terrain deformability. SAEGT estimates the safe region and frontier region online from leg-terrain interactions using Gaussian Process regression for traversability assessment, with a reactive controller for real-time safe exploration and navigation. SAEGT demonstrated its ability to safely explore and navigate toward a specified goal using only proprioceptively estimated traversability in simulation.
Kinematic Analysis and Integration of Vision Algorithms for a Mobile Manipulator Employed Inside a Self-Driving Laboratory
Recent advances in robotics and autonomous systems have broadened the use of robots in laboratory settings, including automated synthesis, scalable reaction workflows, and collaborative tasks in self-driving laboratories (SDLs). This paper presents a comprehensive development of a mobile manipulator designed to assist human operators in such autonomous lab environments. Kinematic modeling of the manipulator is carried out based on the Denavit Hartenberg (DH) convention and inverse kinematics solution is determined to enable precise and adaptive manipulation capabilities. A key focus of this research is enhancing the manipulator ability to reliably grasp textured objects as a critical component of autonomous handling tasks. Advanced vision-based algorithms are implemented to perform real-time object detection and pose estimation, guiding the manipulator in dynamic grasping and following tasks. In this work, we integrate a vision method that combines feature-based detection with homography-driven pose estimation, leveraging depth information to represent an object pose as a $2$D planar projection within $3$D space. This adaptive capability enables the system to accommodate variations in object orientation and supports robust autonomous manipulation across diverse environments. By enabling autonomous experimentation and human-robot collaboration, this work contributes to the scalability and reproducibility of next-generation chemical laboratories
comment: International Journal of Intelligent Robotics and Applications 2025
Sample-Based Hybrid Mode Control: Asymptotically Optimal Switching of Algorithmic and Non-Differentiable Control Modes
This paper investigates a sample-based solution to the hybrid mode control problem across non-differentiable and algorithmic hybrid modes. Our approach reasons about a set of hybrid control modes as an integer-based optimization problem where we select what mode to apply, when to switch to another mode, and the duration for which we are in a given control mode. A sample-based variation is derived to efficiently search the integer domain for optimal solutions. We find our formulation yields strong performance guarantees that can be applied to a number of robotics-related tasks. In addition, our approach is able to synthesize complex algorithms and policies to compound behaviors and achieve challenging tasks. Last, we demonstrate the effectiveness of our approach in real-world robotic examples that require reactive switching between long-term planning and high-frequency control.
Local Guidance for Configuration-Based Multi-Agent Pathfinding
Guidance is an emerging concept that improves the empirical performance of real-time, sub-optimal multi-agent pathfinding (MAPF) methods. It offers additional information to MAPF algorithms to mitigate congestion on a global scale by considering the collective behavior of all agents across the entire workspace. This global perspective helps reduce agents' waiting times, thereby improving overall coordination efficiency. In contrast, this study explores an alternative approach: providing local guidance in the vicinity of each agent. While such localized methods involve recomputation as agents move and may appear computationally demanding, we empirically demonstrate that supplying informative spatiotemporal cues to the planner can significantly improve solution quality without exceeding a moderate time budget. When applied to LaCAM, a leading configuration-based solver, this form of guidance establishes a new performance frontier for MAPF.
comment: 10 pages
A Learning-based Model Reference Adaptive Controller Implemented on a Prosthetic Hand Wrist
The functionality and natural motion of prosthetic hands remain limited by the challenges in controlling compliant wrist mechanisms. Current control strategies often lack adaptability and incur high computational costs, which impedes real-time deployment in assistive robotics. To address this gap, this study presents a computationally efficient Neural Network (NN)-based Model Reference Adaptive Controller (MRAC) for a tendon-driven soft continuum wrist integrated with a prosthetic hand. The dynamic modeling of the wrist is formulated using Timoshenko beam theory, capturing both shear and bending deformations. The proposed NN-MRAC estimates the required tendon forces from deflection errors and minimizes deviation from a reference model through online adaptation. Simulation results demonstrate improved precision with a root mean square error (RMSE) of $6.14 \times 10^{-4}$ m and a settling time of $3.2$s. Experimental validations confirm real-time applicability, with an average RMSE of $5.66 \times 10^{-3}$ m, steady-state error of $8.05 \times 10^{-3}$ m, and settling time of $1.58$ s. These results highlight the potential of the controller to enhance motion accuracy and responsiveness in soft prosthetic systems, thereby advancing the integration of adaptive intelligent control in wearable assistive devices.
comment: International Conference on Social Robotics + AI
Convex Maneuver Planning for Spacecraft Collision Avoidance
Conjunction analysis and maneuver planning for spacecraft collision avoidance remains a manual and time-consuming process, typically involving repeated forward simulations of hand-designed maneuvers. With the growing density of satellites in low-Earth orbit (LEO), autonomy is becoming essential for efficiently evaluating and mitigating collisions. In this work, we present an algorithm to design low-thrust collision-avoidance maneuvers for short-term conjunction events. We first formulate the problem as a nonconvex quadratically-constrained quadratic program (QCQP), which we then relax into a convex semidefinite program (SDP) using Shor's relaxation. We demonstrate empirically that the relaxation is tight, which enables the recovery of globally optimal solutions to the original nonconvex problem. Our formulation produces a minimum-energy solution while ensuring a desired probability of collision at the time of closest approach. Finally, if the desired probability of collision cannot be satisfied, we relax this constraint into a penalty, yielding a minimum-risk solution. We validate our algorithm with a high-fidelity simulation of a satellite conjunction in low-Earth orbit with a simulated conjunction data message (CDM), demonstrating its effectiveness in reducing collision risk.
comment: 8 pages, 6 figures, Accepted to International Space Robotics Conference
Macroscopic EEG Reveals Discriminative Low-Frequency Oscillations in Plan-to-Grasp Visuomotor Tasks
The vision-based grasping brain network integrates visual perception with cognitive and motor processes for visuomotor tasks. While invasive recordings have successfully decoded localized neural activity related to grasp type planning and execution, macroscopic neural activation patterns captured by noninvasive electroencephalography (EEG) remain far less understood. We introduce a novel vision-based grasping platform to investigate grasp-type-specific (precision, power, no-grasp) neural activity across large-scale brain networks using EEG neuroimaging. The platform isolates grasp-specific planning from its associated execution phases in naturalistic visuomotor tasks, where the Filter-Bank Common Spatial Pattern (FBCSP) technique was designed to extract discriminative frequency-specific features within each phase. Support vector machine (SVM) classification discriminated binary (precision vs. power, grasp vs. no-grasp) and multiclass (precision vs. power vs. no-grasp) scenarios for each phase, and were compared against traditional Movement-Related Cortical Potential (MRCP) methods. Low-frequency oscillations (0.5-8 Hz) carry grasp-related information established during planning and maintained throughout execution, with consistent classification performance across both phases (75.3-77.8\%) for precision vs. power discrimination, compared to 61.1\% using MRCP. Higher-frequency activity (12-40 Hz) showed phase-dependent results with 93.3\% accuracy for grasp vs. no-grasp classification but 61.2\% for precision vs. power discrimination. Feature importance using SVM coefficients identified discriminative features within frontoparietal networks during planning and motor networks during execution. This work demonstrated the role of low-frequency oscillations in decoding grasp type during planning using noninvasive EEG.
comment: 12 pages, 8 figures, 1 table
Motion Planning and Control of an Overactuated 4-Wheel Drive with Constrained Independent Steering
This paper addresses motion planning and con- trol of an overactuated 4-wheel drive train with independent steering (4WIS) where mechanical constraints prevent the wheels from executing full 360-degree rotations (swerve). The configuration space of such a robot is constrained and contains discontinuities that affect the smoothness of the robot motion. We introduce a mathematical formulation of the steering constraints and derive discontinuity planes that partition the velocity space into regions of smooth and efficient motion. We further design the motion planner for path tracking and ob- stacle avoidance that explicitly accounts for swerve constraints and the velocity transition smoothness. The motion controller uses local feedback to generate actuation from the desired velocity, while properly handling the discontinuity crossing by temporarily stopping the motion and repositioning the wheels. We implement the proposed motion planner as an extension to ROS Navigation package and evaluate the system in simulation and on a physical robot.
comment: 7 pages, 5 figures, 3 tables, video available at https://youtu.be/8l9s2Wb_vec, To appear at IEEE 2025 International Conference on Advanced Robotics
Robust Driving QA through Metadata-Grounded Context and Task-Specific Prompts
We present a two-phase vision-language QA system for autonomous driving that answers high-level perception, prediction, and planning questions. In Phase-1, a large multimodal LLM (Qwen2.5-VL-32B) is conditioned on six-camera inputs, a short temporal window of history, and a chain-of-thought prompt with few-shot exemplars. A self-consistency ensemble (multiple sampled reasoning chains) further improves answer reliability. In Phase-2, we augment the prompt with nuScenes scene metadata (object annotations, ego-vehicle state, etc.) and category-specific question instructions (separate prompts for perception, prediction, planning tasks). In experiments on a driving QA benchmark, our approach significantly outperforms the baseline Qwen2.5 models. For example, using 5 history frames and 10-shot prompting in Phase-1 yields 65.1% overall accuracy (vs.62.61% with zero-shot); applying self-consistency raises this to 66.85%. Phase-2 achieves 67.37% overall. Notably, the system maintains 96% accuracy under severe visual corruption. These results demonstrate that carefully engineered prompts and contextual grounding can greatly enhance high-level driving QA with pretrained vision-language models.
$\nabla$-SDF: Learning Euclidean Signed Distance Functions Online with Gradient-Augmented Octree Interpolation and Neural Residual
Estimation of signed distance functions (SDFs) from point cloud data has been shown to benefit many robot autonomy capabilities, including localization, mapping, motion planning, and control. Methods that support online and large-scale SDF reconstruction tend to rely on discrete volumetric data structures, which affect the continuity and differentiability of the SDF estimates. Recently, using implicit features, neural network methods have demonstrated high-fidelity and differentiable SDF reconstruction but they tend to be less efficient, can experience catastrophic forgetting and memory limitations in large environments, and are often restricted to truncated SDFs. This work proposes $\nabla$-SDF, a hybrid method that combines an explicit prior obtained from gradient-augmented octree interpolation with an implicit neural residual. Our method achieves non-truncated (Euclidean) SDF reconstruction with computational and memory efficiency comparable to volumetric methods and differentiability and accuracy comparable to neural network methods. Extensive experiments demonstrate that \methodname{} outperforms the state of the art in terms of accuracy and efficiency, providing a scalable solution for downstream tasks in robotics and computer vision.
SHRUMS: Sensor Hallucination for Real-time Underwater Motion Planning with a Compact 3D Sonar
Autonomous navigation in 3D is a fundamental problem for autonomy. Despite major advancements in terrestrial and aerial settings due to improved range sensors including LiDAR, compact sensors with similar capabilities for underwater robots have only recently become available, in the form of 3D sonars. This paper introduces a novel underwater 3D navigation pipeline, called SHRUMS (Sensor Hallucination for Robust Underwater Motion planning with 3D Sonar). To the best of the authors' knowledge, SHRUMS is the first underwater autonomous navigation stack to integrate a 3D sonar. The proposed pipeline exhibits strong robustness while operating in complex 3D environments in spite of extremely poor visibility conditions. To accommodate the intricacies of the novel sensor data stream while achieving real-time locally optimal performance, SHRUMS introduces the concept of hallucinating sensor measurements from non-existent sensors with convenient arbitrary parameters, tailored to application specific requirements. The proposed concepts are validated with real 3D sonar sensor data, utilizing real inputs in challenging settings and local maps constructed in real-time. Field deployments validating the proposed approach in full are planned in the very near future.
comment: 8 pages, 5 figures
Underwater Dense Mapping with the First Compact 3D Sonar
In the past decade, the adoption of compact 3D range sensors, such as LiDARs, has driven the developments of robust state-estimation pipelines, making them a standard sensor for aerial, ground, and space autonomy. Unfortunately, poor propagation of electromagnetic waves underwater, has limited the visibility-independent sensing options of underwater state-estimation to acoustic range sensors, which provide 2D information including, at-best, spatially ambiguous information. This paper, to the best of our knowledge, is the first study examining the performance, capacity, and opportunities arising from the recent introduction of the first compact 3D sonar. Towards that purpose, we introduce calibration procedures for extracting the extrinsics between the 3D sonar and a camera and we provide a study on acoustic response in different surfaces and materials. Moreover, we provide novel mapping and SLAM pipelines tested in deployments in underwater cave systems and other geometrically and acoustically challenging underwater environments. Our assessment showcases the unique capacity of 3D sonars to capture consistent spatial information allowing for detailed reconstructions and localization in datasets expanding to hundreds of meters. At the same time it highlights remaining challenges related to acoustic propagation, as found also in other acoustic sensors. Datasets collected for our evaluations would be released and shared with the community to enable further research advancements.
comment: 8 pages, 12 figures
Towards Proprioceptive Terrain Mapping with Quadruped Robots for Exploration in Planetary Permanently Shadowed Regions
Permanently Shadowed Regions (PSRs) near the lunar poles are of interest for future exploration due to their potential to contain water ice and preserve geological records. Their complex, uneven terrain favors the use of legged robots, which can traverse challenging surfaces while collecting in-situ data, and have proven effective in Earth analogs, including dark caves, when equipped with onboard lighting. While exteroceptive sensors like cameras and lidars can capture terrain geometry and even semantic information, they cannot quantify its physical interaction with the robot, a capability provided by proprioceptive sensing. We propose a terrain mapping framework for quadruped robots, which estimates elevation, foot slippage, energy cost, and stability margins from internal sensing during locomotion. These metrics are incrementally integrated into a multi-layer 2.5D gridmap that reflects terrain interaction from the robot's perspective. The system is evaluated in a simulator that mimics a lunar environment, using the 21 kg quadruped robot Aliengo, showing consistent mapping performance under lunar gravity and terrain conditions.
comment: Published in the Proceedings of the International Conference on Space Robotics (iSpaRo 2025)
Actor-Free Continuous Control via Structurally Maximizable Q-Functions NeurIPS 2025
Value-based algorithms are a cornerstone of off-policy reinforcement learning due to their simplicity and training stability. However, their use has traditionally been restricted to discrete action spaces, as they rely on estimating Q-values for individual state-action pairs. In continuous action spaces, evaluating the Q-value over the entire action space becomes computationally infeasible. To address this, actor-critic methods are typically employed, where a critic is trained on off-policy data to estimate Q-values, and an actor is trained to maximize the critic's output. Despite their popularity, these methods often suffer from instability during training. In this work, we propose a purely value-based framework for continuous control that revisits structural maximization of Q-functions, introducing a set of key architectural and algorithmic choices to enable efficient and stable learning. We evaluate the proposed actor-free Q-learning approach on a range of standard simulation tasks, demonstrating performance and sample efficiency on par with state-of-the-art baselines, without the cost of learning a separate actor. Particularly, in environments with constrained action spaces, where the value functions are typically non-smooth, our method with structural maximization outperforms traditional actor-critic methods with gradient-based maximization. We have released our code at https://github.com/USC-Lira/Q3C.
comment: 39th Conference on Neural Information Processing Systems (NeurIPS 2025)
DDBot: Differentiable Physics-based Digging Robot for Unknown Granular Materials
Automating the manipulation of granular materials poses significant challenges due to complex contact dynamics, unpredictable material properties, and intricate system states. Existing approaches often fail to achieve efficiency and accuracy in such tasks. To fill the research gap, this paper studies the small-scale and high-precision granular material digging task with unknown physical properties. A new framework, named differentiable digging robot (DDBot), is proposed to manipulate granular materials, including sand and soil. Specifically, we equip DDBot with a differentiable physics-based simulator, tailored for granular material manipulation, powered by GPU-accelerated parallel computing and automatic differentiation. DDBot can perform efficient differentiable system identification and high-precision digging skill optimisation for unknown granular materials, which is enabled by a differentiable skill-to-action mapping, a task-oriented demonstration method, gradient clipping and line search-based gradient descent. Experimental results show that DDBot can efficiently (converge within 5 to 20 minutes) identify unknown granular material dynamics and optimise digging skills, with high-precision results in zero-shot real-world deployments, highlighting its practicality. Benchmark results against state-of-the-art baselines also confirm the robustness and efficiency of DDBot in such digging tasks.
comment: Accepted as a regular paper by the IEEE Transactions on Robotics
DiffVLA++: Bridging Cognitive Reasoning and End-to-End Driving through Metric-Guided Alignment
Conventional end-to-end (E2E) driving models are effective at generating physically plausible trajectories, but often fail to generalize to long-tail scenarios due to the lack of essential world knowledge to understand and reason about surrounding environments. In contrast, Vision-Language-Action (VLA) models leverage world knowledge to handle challenging cases, but their limited 3D reasoning capability can lead to physically infeasible actions. In this work we introduce DiffVLA++, an enhanced autonomous driving framework that explicitly bridges cognitive reasoning and E2E planning through metric-guided alignment. First, we build a VLA module directly generating semantically grounded driving trajectories. Second, we design an E2E module with a dense trajectory vocabulary that ensures physical feasibility. Third, and most critically, we introduce a metric-guided trajectory scorer that guides and aligns the outputs of the VLA and E2E modules, thereby integrating their complementary strengths. The experiment on the ICCV 2025 Autonomous Grand Challenge leaderboard shows that DiffVLA++ achieves EPDMS of 49.12.
RAPID Hand Prototype: Design of an Affordable, Fully-Actuated Biomimetic Hand for Dexterous Teleoperation IROS2025
This paper addresses the scarcity of affordable, fully-actuated five-fingered hands for dexterous teleoperation, which is crucial for collecting large-scale real-robot data within the "Learning from Demonstrations" paradigm. We introduce the prototype version of the RAPID Hand, the first low-cost, 20-degree-of-actuation (DoA) dexterous hand that integrates a novel anthropomorphic actuation and transmission scheme with an optimized motor layout and structural design to enhance dexterity. Specifically, the RAPID Hand features a universal phalangeal transmission scheme for the non-thumb fingers and an omnidirectional thumb actuation mechanism. Prioritizing affordability, the hand employs 3D-printed parts combined with custom gears for easier replacement and repair. We assess the RAPID Hand's performance through quantitative metrics and qualitative testing in a dexterous teleoperation system, which is evaluated on three challenging tasks: multi-finger retrieval, ladle handling, and human-like piano playing. The results indicate that the RAPID Hand's fully actuated 20-DoF design holds significant promise for dexterous teleoperation.
comment: Accepted by IROS2025
Learn2Decompose: Learning Problem Decomposition for Efficient Sequential Multi-object Manipulation Planning
We present an efficient task and motion replanning approach for sequential multi-object manipulation in dynamic environments. Conventional Task And Motion Planning (TAMP) solvers experience an exponential increase in planning time as the planning horizon and number of objects grow, limiting their applicability in real-world scenarios. To address this, we propose learning problem decompositions from demonstrations to accelerate TAMP solvers. Our approach consists of three key components: goal decomposition learning, computational distance learning, and object reduction. Goal decomposition identifies the necessary sequences of states that the system must pass through before reaching the final goal, treating them as subgoal sequences. Computational distance learning predicts the computational complexity between two states, enabling the system to identify the temporally closest subgoal from a disturbed state. Object reduction minimizes the set of active objects considered during replanning, further improving efficiency. We evaluate our approach on three benchmarks, demonstrating its effectiveness in improving replanning efficiency for sequential multi-object manipulation tasks in dynamic environments.
comment: Extension of RAL version: added PR2 Whole-body kitchen task and detailed discussion on limitations in main text; added pseudocode and robustness analysis of our approach, and formal analysis on why and when task goals are decomposable in appendix
NEBULA: Do We Evaluate Vision-Language-Action Agents Correctly?
The evaluation of Vision-Language-Action (VLA) agents is hindered by the coarse, end-task success metric that fails to provide precise skill diagnosis or measure robustness to real-world perturbations. This challenge is exacerbated by a fragmented data landscape that impedes reproducible research and the development of generalist models. To address these limitations, we introduce NEBULA, a unified ecosystem for single-arm manipulation that enables diagnostic and reproducible evaluation. NEBULA features a novel dual-axis evaluation protocol that combines fine-grained capability tests for precise skill diagnosis with systematic stress tests that measure robustness. A standardized API and a large-scale, aggregated dataset are provided to reduce fragmentation and support cross-dataset training and fair comparison. Using NEBULA, we demonstrate that top-performing VLAs struggle with key capabilities such as spatial reasoning and dynamic adaptation, which are consistently obscured by conventional end-task success metrics. By measuring both what an agent can do and when it does so reliably, NEBULA provides a practical foundation for robust, general-purpose embodied agents.
comment: Homepage: https://vulab-ai.github.io/NEBULA-Alpha/
Wonder Wins Ways: Curiosity-Driven Exploration through Multi-Agent Contextual Calibration
Autonomous exploration in complex multi-agent reinforcement learning (MARL) with sparse rewards critically depends on providing agents with effective intrinsic motivation. While artificial curiosity offers a powerful self-supervised signal, it often confuses environmental stochasticity with meaningful novelty. Moreover, existing curiosity mechanisms exhibit a uniform novelty bias, treating all unexpected observations equally. However, peer behavior novelty, which encode latent task dynamics, are often overlooked, resulting in suboptimal exploration in decentralized, communication-free MARL settings. To this end, inspired by how human children adaptively calibrate their own exploratory behaviors via observing peers, we propose a novel approach to enhance multi-agent exploration. We introduce CERMIC, a principled framework that empowers agents to robustly filter noisy surprise signals and guide exploration by dynamically calibrating their intrinsic curiosity with inferred multi-agent context. Additionally, CERMIC generates theoretically-grounded intrinsic rewards, encouraging agents to explore state transitions with high information gain. We evaluate CERMIC on benchmark suites including VMAS, Meltingpot, and SMACv2. Empirical results demonstrate that exploration with CERMIC significantly outperforms SoTA algorithms in sparse-reward environments.
Seeing through Uncertainty: Robust Task-Oriented Optimization in Visual Navigation
Visual navigation is a fundamental problem in embodied AI, yet practical deployments demand long-horizon planning capabilities to address multi-objective tasks. A major bottleneck is data scarcity: policies learned from limited data often overfit and fail to generalize OOD. Existing neural network-based agents typically increase architectural complexity that paradoxically become counterproductive in the small-sample regime. This paper introduce NeuRO, a integrated learning-to-optimize framework that tightly couples perception networks with downstream task-level robust optimization. Specifically, NeuRO addresses core difficulties in this integration: (i) it transforms noisy visual predictions under data scarcity into convex uncertainty sets using Partially Input Convex Neural Networks (PICNNs) with conformal calibration, which directly parameterize the optimization constraints; and (ii) it reformulates planning under partial observability as a robust optimization problem, enabling uncertainty-aware policies that transfer across environments. Extensive experiments on both unordered and sequential multi-object navigation tasks demonstrate that NeuRO establishes SoTA performance, particularly in generalization to unseen environments. Our work thus presents a significant advancement for developing robust, generalizable autonomous agents.
Rethink Repeatable Measures of Robot Performance with Statistical Query
For a general standardized testing algorithm designed to evaluate a specific aspect of a robot's performance, several key expectations are commonly imposed. Beyond accuracy (i.e., closeness to a typically unknown ground-truth reference) and efficiency (i.e., feasibility within acceptable testing costs and equipment constraints), one particularly important attribute is repeatability. Repeatability refers to the ability to consistently obtain the same testing outcome when similar testing algorithms are executed on the same subject robot by different stakeholders, across different times or locations. However, achieving repeatable testing has become increasingly challenging as the components involved grow more complex, intelligent, diverse, and, most importantly, stochastic. While related efforts have addressed repeatability at ethical, hardware, and procedural levels, this study focuses specifically on repeatable testing at the algorithmic level. Specifically, we target the well-adopted class of testing algorithms in standardized evaluation: statistical query (SQ) algorithms (i.e., algorithms that estimate the expected value of a bounded function over a distribution using sampled data). We propose a lightweight, parameterized, and adaptive modification applicable to any SQ routine, whether based on Monte Carlo sampling, importance sampling, or adaptive importance sampling, that makes it provably repeatable, with guaranteed bounds on both accuracy and efficiency. We demonstrate the effectiveness of the proposed approach across three representative scenarios: (i) established and widely adopted standardized testing of manipulators, (ii) emerging intelligent testing algorithms for operational risk assessment in automated vehicles, and (iii) developing use cases involving command tracking performance evaluation of humanoid robots in locomotion tasks.
Towards Versatile Humanoid Table Tennis: Unified Reinforcement Learning with Prediction Augmentation
Humanoid table tennis (TT) demands rapid perception, proactive whole-body motion, and agile footwork under strict timing -- capabilities that remain difficult for unified controllers. We propose a reinforcement learning framework that maps ball-position observations directly to whole-body joint commands for both arm striking and leg locomotion, strengthened by predictive signals and dense, physics-guided rewards. A lightweight learned predictor, fed with recent ball positions, estimates future ball states and augments the policy's observations for proactive decision-making. During training, a physics-based predictor supplies precise future states to construct dense, informative rewards that lead to effective exploration. The resulting policy attains strong performance across varied serve ranges (hit rate $\geq$ 96% and success rate $\geq$ 92%) in simulations. Ablation studies confirm that both the learned predictor and the predictive reward design are critical for end-to-end learning. Deployed zero-shot on a physical Booster T1 humanoid with 23 revolute joints, the policy produces coordinated lateral and forward-backward footwork with accurate, fast returns, suggesting a practical path toward versatile, competitive humanoid TT.
Interpretable Decision-Making for End-to-End Autonomous Driving ICCV 2025
Trustworthy AI is mandatory for the broad deployment of autonomous vehicles. Although end-to-end approaches derive control commands directly from raw data, interpreting these decisions remains challenging, especially in complex urban scenarios. This is mainly attributed to very deep neural networks with non-linear decision boundaries, making it challenging to grasp the logic behind AI-driven decisions. This paper presents a method to enhance interpretability while optimizing control commands in autonomous driving. To address this, we propose loss functions that promote the interpretability of our model by generating sparse and localized feature maps. The feature activations allow us to explain which image regions contribute to the predicted control command. We conduct comprehensive ablation studies on the feature extraction step and validate our method on the CARLA benchmarks. We also demonstrate that our approach improves interpretability, which correlates with reducing infractions, yielding a safer, high-performance driving model. Notably, our monocular, non-ensemble model surpasses the top-performing approaches from the CARLA Leaderboard by achieving lower infraction scores and the highest route completion rate, all while ensuring interpretability.
comment: Accepted to the ICCV 2025 2nd Workshop on the Challenge Of Out-of-Label Hazards in Autonomous Driving (2COOOL)
Learning to See and Act: Task-Aware View Planning for Robotic Manipulation
Recent vision-language-action (VLA) models for multi-task robotic manipulation commonly rely on static viewpoints and shared visual encoders, which limit 3D perception and cause task interference, hindering robustness and generalization. In this work, we propose Task-Aware View Planning (TAVP), a framework designed to overcome these challenges by integrating active view planning with task-specific representation learning. TAVP employs an efficient exploration policy, accelerated by a novel pseudo-environment, to actively acquire informative views. Furthermore, we introduce a Mixture-of-Experts (MoE) visual encoder to disentangle features across different tasks, boosting both representation fidelity and task generalization. By learning to see the world in a task-aware way, TAVP generates more complete and discriminative visual representations, demonstrating significantly enhanced action prediction across a wide array of manipulation challenges. Extensive experiments on RLBench tasks show that our proposed TAVP model achieves superior performance over state-of-the-art fixed-view approaches. Visual results and code are provided at: https://hcplab-sysu.github.io/TAVP.
comment: 14 pages, 8 figures, project page: https://hcplab-sysu.github.io/TAVP
Dynamic object goal pushing with mobile manipulators through model-free constrained reinforcement learning ICRA 2025
Non-prehensile pushing to move and reorient objects to a goal is a versatile loco-manipulation skill. In the real world, the object's physical properties and friction with the floor contain significant uncertainties, which makes the task challenging for a mobile manipulator. In this paper, we develop a learning-based controller for a mobile manipulator to move an unknown object to a desired position and yaw orientation through a sequence of pushing actions. The proposed controller for the robotic arm and the mobile base motion is trained using a constrained Reinforcement Learning (RL) formulation. We demonstrate its capability in experiments with a quadrupedal robot equipped with an arm. The learned policy achieves a success rate of 91.35% in simulation and at least 80% on hardware in challenging scenarios. Through our extensive hardware experiments, we show that the approach demonstrates high robustness against unknown objects of different masses, materials, sizes, and shapes. It reactively discovers the pushing location and direction, thus achieving contact-rich behavior while observing only the pose of the object. Additionally, we demonstrate the adaptive behavior of the learned policy towards preventing the object from toppling.
comment: presented at ICRA 2025, Video: https://youtu.be/wGAdPGVf9Ws?si=pi83ONWofHHqbFG0
Generation of Uncertainty-Aware Emergent Concepts in Factorized 3D Scene Graphs via Graph Neural Networks
Enabling robots to autonomously discover emergent spatial concepts (e.g., rooms) from primitive geometric observations (e.g., planar surfaces) within 3D Scene Graphs is essential for robust indoor navigation and mapping. These graphs provide a hierarchical metric-semantic representation in which such concepts are organized. To further enhance graph-SLAM performance, Factorized 3D Scene Graphs incorporate these concepts as optimization factors that constrain relative geometry and enforce global consistency. However, both stages of this process remain largely manual: concepts are typically derived using hand-crafted, concept-specific heuristics, while factors and their covariances are likewise manually designed. This reliance on manual specification limits generalization across diverse environments and scalability to new concept classes. This paper presents, for the first time, a learning-based method to generate online spatial emergent concepts as optimizable factors within a SLAM backend, reducing the need to handcraft both concept generation and the definition of their corresponding factors and covariances. In both simulated and real indoor scenarios, our approach improves complex concept detection by 20.7% and 5.3%, trajectory estimation by 19.2%, and map reconstruction by 12.3% and 3.8%, respectively, highlighting the benefits of this integration for robust and adaptive spatial understanding.
comment: Submitted to IEEE Robotics and Automation Letters (RA-L)
From Watch to Imagine: Steering Long-horizon Manipulation via Human Demonstration and Future Envisionment
Generalizing to long-horizon manipulation tasks in a zero-shot setting remains a central challenge in robotics. Current multimodal foundation based approaches, despite their capabilities, typically fail to decompose high-level commands into executable action sequences from static visual input alone. To address this challenge, we introduce Super-Mimic, a hierarchical framework that enables zero-shot robotic imitation by directly inferring procedural intent from unscripted human demonstration videos. Our framework is composed of two sequential modules. First, a Human Intent Translator (HIT) parses the input video using multimodal reasoning to produce a sequence of language-grounded subtasks. These subtasks then condition a Future Dynamics Predictor (FDP), which employs a generative model that synthesizes a physically plausible video rollout for each step. The resulting visual trajectories are dynamics-aware, explicitly modeling crucial object interactions and contact points to guide the low-level controller. We validate this approach through extensive experiments on a suite of long-horizon manipulation tasks, where Super-Mimic significantly outperforms state-of-the-art zero-shot methods by over 20%. These results establish that coupling video-driven intent parsing with prospective dynamics modeling is a highly effective strategy for developing general-purpose robotic systems.
comment: More details and videos can be found at: https://yipko.com/super-mimic
VIKI-R: Coordinating Embodied Multi-Agent Cooperation via Reinforcement Learning
Coordinating multiple embodied agents in dynamic environments remains a core challenge in artificial intelligence, requiring both perception-driven reasoning and scalable cooperation strategies. While recent works have leveraged large language models (LLMs) for multi-agent planning, a few have begun to explore vision-language models (VLMs) for visual reasoning. However, these VLM-based approaches remain limited in their support for diverse embodiment types. In this work, we introduce VIKI-Bench, the first hierarchical benchmark tailored for embodied multi-agent cooperation, featuring three structured levels: agent activation, task planning, and trajectory perception. VIKI-Bench includes diverse robot embodiments, multi-view visual observations, and structured supervision signals to evaluate reasoning grounded in visual inputs. To demonstrate the utility of VIKI-Bench, we propose VIKI-R, a two-stage framework that fine-tunes a pretrained vision-language model (VLM) using Chain-of-Thought annotated demonstrations, followed by reinforcement learning under multi-level reward signals. Our extensive experiments show that VIKI-R significantly outperforms baselines method across all task levels. Furthermore, we show that reinforcement learning enables the emergence of compositional cooperation patterns among heterogeneous agents. Together, VIKI-Bench and VIKI-R offer a unified testbed and method for advancing multi-agent, visual-driven cooperation in embodied AI systems.
comment: Project page: https://faceong.github.io/VIKI-R/
Neural 3D Object Reconstruction with Small-Scale Unmanned Aerial Vehicles
Small Unmanned Aerial Vehicles (UAVs) exhibit immense potential for navigating indoor and hard-to-reach areas, yet their significant constraints in payload and autonomy have largely prevented their use for complex tasks like high-quality 3-Dimensional (3D) reconstruction. To overcome this challenge, we introduce a novel system architecture that enables fully autonomous, high-fidelity 3D scanning of static objects using UAVs weighing under 100 grams. Our core innovation lies in a dual-reconstruction pipeline that creates a real-time feedback loop between data capture and flight control. A near-real-time (near-RT) process uses Structure from Motion (SfM) to generate an instantaneous pointcloud of the object. The system analyzes the model quality on the fly and dynamically adapts the UAV's trajectory to intelligently capture new images of poorly covered areas. This ensures comprehensive data acquisition. For the final, detailed output, a non-real-time (non-RT) pipeline employs a Neural Radiance Fields (NeRF)-based Neural 3D Reconstruction (N3DR) approach, fusing SfM-derived camera poses with precise Ultra Wide-Band (UWB) location data to achieve superior accuracy. We implemented and validated this architecture using Crazyflie 2.1 UAVs. Our experiments, conducted in both single- and multi-UAV configurations, conclusively show that dynamic trajectory adaptation consistently improves reconstruction quality over static flight paths. This work demonstrates a scalable and autonomous solution that unlocks the potential of miniaturized UAVs for fine-grained 3D reconstruction in constrained environments, a capability previously limited to much larger platforms.
comment: 13 pages, 16 figures, 3 tables, 45 references
VLA-Cache: Efficient Vision-Language-Action Manipulation via Adaptive Token Caching NeurIPS 2025
Vision-Language-Action (VLA) models have demonstrated strong multi-modal reasoning capabilities, enabling direct action generation from visual perception and language instructions in an end-to-end manner. However, their substantial computational cost poses a challenge for real-time robotic control, where rapid decision-making is essential. This paper introduces VLA-Cache, a training-free inference acceleration method that reduces computational overhead by adaptively caching and reusing static visual tokens across frames. Exploiting the temporal continuity in robotic manipulation, VLA-Cache identifies minimally changed tokens between adjacent frames and reuses their cached key-value representations, thereby circumventing redundant computations. Additionally, to maintain action precision, VLA-Cache selectively re-computes task-relevant tokens that are environmentally sensitive, ensuring the fidelity of critical visual information. To further optimize efficiency, we introduce a layer adaptive token reusing strategy that dynamically adjusts the reuse ratio based on attention concentration across decoder layers, prioritizing critical tokens for recomputation. Extensive experiments on two simulation platforms (LIBERO and SIMPLER) and a real-world robotic system demonstrate that VLA-Cache achieves up to 1.7x speedup in CUDA latency and a 15% increase in control frequency, with negligible loss on task success rate. The code and videos can be found at our project page: https://vla-cache.github.io.
comment: Accepted to NeurIPS 2025
Time Reversal Symmetry for Efficient Robotic Manipulations in Deep Reinforcement Learning NeurIPS 2025
Symmetry is pervasive in robotics and has been widely exploited to improve sample efficiency in deep reinforcement learning (DRL). However, existing approaches primarily focus on spatial symmetries, such as reflection, rotation, and translation, while largely neglecting temporal symmetries. To address this gap, we explore time reversal symmetry, a form of temporal symmetry commonly found in robotics tasks such as door opening and closing. We propose Time Reversal symmetry enhanced Deep Reinforcement Learning (TR-DRL), a framework that combines trajectory reversal augmentation and time reversal guided reward shaping to efficiently solve temporally symmetric tasks. Our method generates reversed transitions from fully reversible transitions, identified by a proposed dynamics-consistent filter, to augment the training data. For partially reversible transitions, we apply reward shaping to guide learning, according to successful trajectories from the reversed task. Extensive experiments on the Robosuite and MetaWorld benchmarks demonstrate that TR-DRL is effective in both single-task and multi-task settings, achieving higher sample efficiency and stronger final performance compared to baseline methods.
comment: Accepted in NeurIPS 2025
IR2: Implicit Rendezvous for Robotic Exploration Teams under Sparse Intermittent Connectivity
Information sharing is critical in time-sensitive and realistic multi-robot exploration, especially for smaller robotic teams in large-scale environments where connectivity may be sparse and intermittent. Existing methods often overlook such communication constraints by assuming unrealistic global connectivity. Other works account for communication constraints (by maintaining close proximity or line of sight during information exchange), but are often inefficient. For instance, preplanned rendezvous approaches typically involve unnecessary detours resulting from poorly timed rendezvous, while pursuit-based approaches often result in short-sighted decisions due to their greedy nature. We present IR2, a deep reinforcement learning approach to information sharing for multi-robot exploration. Leveraging attention-based neural networks trained via reinforcement and curriculum learning, IR2 allows robots to effectively reason about the longer-term trade-offs between disconnecting for solo exploration and reconnecting for information sharing. In addition, we propose a hierarchical graph formulation to maintain a sparse yet informative graph, enabling our approach to scale to large-scale environments. We present simulation results in three large-scale Gazebo environments, which show that our approach yields 6.6-34.1% shorter exploration paths when compared to state-of-the-art baselines, and lastly deploy our learned policy on hardware. Our simulation training and testing code is available at https://ir2-explore.github.io.
comment: \c{opyright} 20XX IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models
As large language models (LLMs) evolve, their integration with 3D spatial data (3D-LLMs) has seen rapid progress, offering unprecedented capabilities for understanding and interacting with physical spaces. This survey provides a comprehensive overview of the methodologies enabling LLMs to process, understand, and generate 3D data. Highlighting the unique advantages of LLMs, such as in-context learning, step-by-step reasoning, open-vocabulary capabilities, and extensive world knowledge, we underscore their potential to significantly advance spatial comprehension and interaction within embodied Artificial Intelligence (AI) systems. Our investigation spans various 3D data representations, from point clouds to Neural Radiance Fields (NeRFs). It examines their integration with LLMs for tasks such as 3D scene understanding, captioning, question-answering, and dialogue, as well as LLM-based agents for spatial reasoning, planning, and navigation. The paper also includes a brief review of other methods that integrate 3D and language. The meta-analysis presented in this paper reveals significant progress yet underscores the necessity for novel approaches to harness the full potential of 3D-LLMs. Hence, with this paper, we aim to chart a course for future research that explores and expands the capabilities of 3D-LLMs in understanding and interacting with the complex 3D world. To support this survey, we have established a project page where papers related to our topic are organized and listed: https://github.com/ActiveVisionLab/Awesome-LLM-3D.
comment: 2nd version update to Jun.2025
Distilling LLM Prior to Flow Model for Generalizable Agent's Imagination in Object Goal Navigation
The Object Goal Navigation (ObjectNav) task challenges agents to locate a specified object in an unseen environment by imagining unobserved regions of the scene. Prior approaches rely on deterministic and discriminative models to complete semantic maps, overlooking the inherent uncertainty in indoor layouts and limiting their ability to generalize to unseen environments. In this work, we propose GOAL, a generative flow-based framework that models the semantic distribution of indoor environments by bridging observed regions with LLM-enriched full-scene semantic maps. During training, spatial priors inferred from large language models (LLMs) are encoded as two-dimensional Gaussian fields and injected into target maps, distilling rich contextual knowledge into the flow model and enabling more generalizable completions. Extensive experiments demonstrate that GOAL achieves state-of-the-art performance on MP3D and Gibson, and shows strong generalization in transfer settings to HM3D. Codes and pretrained models are available at https://github.com/Badi-Li/GOAL.
RAD: Training an End-to-End Driving Policy via Large-Scale 3DGS-based Reinforcement Learning
Existing end-to-end autonomous driving (AD) algorithms typically follow the Imitation Learning (IL) paradigm, which faces challenges such as causal confusion and an open-loop gap. In this work, we propose RAD, a 3DGS-based closed-loop Reinforcement Learning (RL) framework for end-to-end Autonomous Driving. By leveraging 3DGS techniques, we construct a photorealistic digital replica of the real physical world, enabling the AD policy to extensively explore the state space and learn to handle out-of-distribution scenarios through large-scale trial and error. To enhance safety, we design specialized rewards to guide the policy in effectively responding to safety-critical events and understanding real-world causal relationships. To better align with human driving behavior, we incorporate IL into RL training as a regularization term. We introduce a closed-loop evaluation benchmark consisting of diverse, previously unseen 3DGS environments. Compared to IL-based methods, RAD achieves stronger performance in most closed-loop metrics, particularly exhibiting a 3x lower collision rate. Abundant closed-loop results are presented in the supplementary material. Code is available at https://github.com/hustvl/RAD for facilitating future research.
comment: Code: https://github.com/hustvl/RAD
SAMPO:Scale-wise Autoregression with Motion PrOmpt for generative world models
World models allow agents to simulate the consequences of actions in imagined environments for planning, control, and long-horizon decision-making. However, existing autoregressive world models struggle with visually coherent predictions due to disrupted spatial structure, inefficient decoding, and inadequate motion modeling. In response, we propose \textbf{S}cale-wise \textbf{A}utoregression with \textbf{M}otion \textbf{P}r\textbf{O}mpt (\textbf{SAMPO}), a hybrid framework that combines visual autoregressive modeling for intra-frame generation with causal modeling for next-frame generation. Specifically, SAMPO integrates temporal causal decoding with bidirectional spatial attention, which preserves spatial locality and supports parallel decoding within each scale. This design significantly enhances both temporal consistency and rollout efficiency. To further improve dynamic scene understanding, we devise an asymmetric multi-scale tokenizer that preserves spatial details in observed frames and extracts compact dynamic representations for future frames, optimizing both memory usage and model performance. Additionally, we introduce a trajectory-aware motion prompt module that injects spatiotemporal cues about object and robot trajectories, focusing attention on dynamic regions and improving temporal consistency and physical realism. Extensive experiments show that SAMPO achieves competitive performance in action-conditioned video prediction and model-based control, improving generation quality with 4.4$\times$ faster inference. We also evaluate SAMPO's zero-shot generalization and scaling behavior, demonstrating its ability to generalize to unseen tasks and benefit from larger model sizes.
comment: 22 pages,15 figures
LLM-RG: Referential Grounding in Outdoor Scenarios using Large Language Models IROS 2025
Referential grounding in outdoor driving scenes is challenging due to large scene variability, many visually similar objects, and dynamic elements that complicate resolving natural-language references (e.g., "the black car on the right"). We propose LLM-RG, a hybrid pipeline that combines off-the-shelf vision-language models for fine-grained attribute extraction with large language models for symbolic reasoning. LLM-RG processes an image and a free-form referring expression by using an LLM to extract relevant object types and attributes, detecting candidate regions, generating rich visual descriptors with a VLM, and then combining these descriptors with spatial metadata into natural-language prompts that are input to an LLM for chain-of-thought reasoning to identify the referent's bounding box. Evaluated on the Talk2Car benchmark, LLM-RG yields substantial gains over both LLM and VLM-based baselines. Additionally, our ablations show that adding 3D spatial cues further improves grounding. Our results demonstrate the complementary strengths of VLMs and LLMs, applied in a zero-shot manner, for robust outdoor referential grounding.
comment: Human-aware Embodied AI Workshop @ IROS 2025
Towards foundational LiDAR world models with efficient latent flow matching NeurIPS 2025
LiDAR-based world models offer more structured and geometry-aware representations than their image-based counterparts. However, existing LiDAR world models are narrowly trained; each model excels only in the domain for which it was built. Can we develop LiDAR world models that exhibit strong transferability across multiple domains? We conduct the first systematic domain transfer study across three demanding scenarios: (i) outdoor to indoor generalization, (ii) sparse-beam & dense-beam adaptation, and (iii) non-semantic to semantic transfer. Given different amounts of fine-tuning data, our experiments show that a single pre-trained model can achieve up to 11% absolute improvement (83% relative) over training from scratch and outperforms training from scratch in 30/36 of our comparisons. This transferability of dynamic learning significantly reduces the reliance on manually annotated data for semantic occupancy forecasting: our method exceed the previous semantic occupancy forecasting models with only 5% of the labeled training data required by prior models. We also observed inefficiencies of current LiDAR world models, mainly through their under-compression of LiDAR data and inefficient training objectives. To address this, we propose a latent conditional flow matching (CFM)-based frameworks that achieves state-of-the-art reconstruction accuracy using only half the training data and a compression ratio 6 times higher than that of prior methods. Our model achieves SOTA performance on future-trajectory-conditioned semantic occupancy forecasting while being 23x more computationally efficient (a 28x FPS speedup); and achieves SOTA performance on semantic occupancy forecasting while being 2x more computationally efficient (a 1.1x FPS speedup).
comment: Accepted to the Thirty-Ninth Conference on Neural Information Processing Systems (NeurIPS 2025), 25 pages, 13 figures
Improving planning and MBRL with temporally-extended actions NeurIPS 2025
Continuous time systems are often modeled using discrete time dynamics but this requires a small simulation step to maintain accuracy. In turn, this requires a large planning horizon which leads to computationally demanding planning problems and reduced performance. Previous work in model-free reinforcement learning has partially addressed this issue using action repeats where a policy is learned to determine a discrete action duration. Instead we propose to control the continuous decision timescale directly by using temporally-extended actions and letting the planner treat the duration of the action as an additional optimization variable along with the standard action variables. This additional structure has multiple advantages. It speeds up simulation time of trajectories and, importantly, it allows for deep horizon search in terms of primitive actions while using a shallow search depth in the planner. In addition, in the model-based reinforcement learning (MBRL) setting, it reduces compounding errors from model learning and improves training time for models. We show that this idea is effective and that the range for action durations can be automatically selected using a multi-armed bandit formulation and integrated into the MBRL framework. An extensive experimental evaluation both in planning and in MBRL, shows that our approach yields faster planning, better solutions, and that it enables solutions to problems that are not solved in the standard formulation.
comment: NeurIPS 2025. For project website, see https://pecey.github.io/MBRL-with-TEA/
SPiDR: A Simple Approach for Zero-Shot Safety in Sim-to-Real Transfer
Deploying reinforcement learning (RL) safely in the real world is challenging, as policies trained in simulators must face the inevitable sim-to-real gap. Robust safe RL techniques are provably safe, however difficult to scale, while domain randomization is more practical yet prone to unsafe behaviors. We address this gap by proposing SPiDR, short for Sim-to-real via Pessimistic Domain Randomization -- a scalable algorithm with provable guarantees for safe sim-to-real transfer. SPiDR uses domain randomization to incorporate the uncertainty about the sim-to-real gap into the safety constraints, making it versatile and highly compatible with existing training pipelines. Through extensive experiments on sim-to-sim benchmarks and two distinct real-world robotic platforms, we demonstrate that SPiDR effectively ensures safety despite the sim-to-real gap while maintaining strong performance.
Multiagent Systems
LightMem: Lightweight and Efficient Memory-Augmented Generation
Despite their remarkable capabilities, Large Language Models (LLMs) struggle to effectively leverage historical interaction information in dynamic and complex environments. Memory systems enable LLMs to move beyond stateless interactions by introducing persistent information storage, retrieval, and utilization mechanisms. However, existing memory systems often introduce substantial time and computational overhead. To this end, we introduce a new memory system called LightMem, which strikes a balance between the performance and efficiency of memory systems. Inspired by the Atkinson-Shiffrin model of human memory, LightMem organizes memory into three complementary stages. First, cognition-inspired sensory memory rapidly filters irrelevant information through lightweight compression and groups information according to their topics. Next, topic-aware short-term memory consolidates these topic-based groups, organizing and summarizing content for more structured access. Finally, long-term memory with sleep-time update employs an offline procedure that decouples consolidation from online inference. Experiments on LongMemEval with GPT and Qwen backbones show that LightMem outperforms strong baselines in accuracy (up to 10.9% gains) while reducing token usage by up to 117x, API calls by up to 159x, and runtime by over 12x. The code is available at https://github.com/zjunlp/LightMem.
comment: Work in progress
Computational Foundations for Strategic Coopetition: Formalizing Interdependence and Complementarity
Modern socio-technical systems are characterized by strategic coopetition where actors simultaneously cooperate to create value and compete to capture it. While conceptual modeling languages like i* provide rich qualitative representations of strategic dependencies, they lack mechanisms for quantitative analysis of dynamic trade-offs. Conversely, classical game theory offers mathematical rigor but strips away contextual richness. This technical report bridges this gap by developing computational foundations that formalize two critical dimensions of coopetition: interdependence and complementarity. We ground interdependence in i* structural dependency analysis, translating depender-dependee-dependum relationships into quantitative interdependence coefficients through a structured translation framework. We formalize complementarity following Brandenburger and Nalebuff's Added Value concept, modeling synergistic value creation with validated parameterization. We integrate structural dependencies with bargaining power in value appropriation and introduce a game-theoretic formulation where Nash Equilibrium incorporates structural interdependence. Validation combines comprehensive experimental testing across power and logarithmic value function specifications, demonstrating functional form robustness, with empirical application to the Samsung-Sony S-LCD joint venture (2004-2011), where logarithmic specifications achieve superior empirical fit (validation score 45/60) while power functions provide theoretical tractability. This technical report serves as the foundational reference for a coordinated research program examining strategic coopetition in requirements engineering and multi-agent systems, with companion work addressing trust dynamics, team production, and reciprocity mechanisms.
comment: 36 pages, 7 figures
Fetch.ai: An Architecture for Modern Multi-Agent Systems
Recent surges in LLM-driven intelligent systems largely overlook decades of foundational multi-agent systems (MAS) research, resulting in frameworks with critical limitations such as centralization and inadequate trust and communication protocols. This paper introduces the Fetch.ai architecture, an industrial-strength platform designed to bridge this gap by facilitating the integration of classical MAS principles with modern AI capabilities. We present a novel, multi-layered solution built on a decentralized foundation of on-chain blockchain services for verifiable identity, discovery, and transactions. This is complemented by a comprehensive development framework for creating secure, interoperable agents, a cloud-based platform for deployment, and an intelligent orchestration layer where an agent-native LLM translates high-level human goals into complex, multi-agent workflows. We demonstrate the deployed nature of this system through a decentralized logistics use case where autonomous agents dynamically discover, negotiate, and transact with one another securely. Ultimately, the Fetch.ai stack provides a principled architecture for moving beyond current agent implementations towards open, collaborative, and economically sustainable multi-agent ecosystems.
comment: 26 pages, figures, code examples
Socialized Learning and Emergent Behaviors in Multi-Agent Systems based on Multimodal Large Language Models
This search introduces the Multimodal Socialized Learning Framework (M-S2L), designed to foster emergent social intelligence in AI agents by integrating Multimodal Large Language Models (M-LLMs) with social learning mechanisms. The framework equips agents with multimodal perception (vision and text) and structured action capabilities, enabling physical manipulation and grounded multimodal communication (e.g., text with visual pointers). M-S2L combines direct reinforcement learning with two novel social learning pathways: multimodal observational learning and communication-driven learning from feedback, augmented by an episodic memory system for long-term social context. We evaluate M-S2L in a Collaborative Assembly Environment (CAE), where agent teams must construct complex devices from ambiguous blueprints under informational asymmetry. Across tasks of increasing complexity, M-S2L agents consistently outperform Text-Only and No-Social-Learning baselines in Task Completion Rate and Time to Completion, particularly in dynamic problem-solving scenarios. Ablation studies confirm the necessity of both multimodality and socialized learning. Our analysis reveals the emergence of efficient communication protocols integrating visual pointers with concise text, alongside rapid role specialization leading to stable labor division. Qualitative case studies demonstrate agents' abilities for shared awareness, dynamic re-planning, and adaptive problem-solving, suggesting a nascent form of machine social cognition. These findings indicate that integrating multimodal perception with explicit social learning is critical for developing human-like collaborative intelligence in multi-agent systems.
Food4All: A Multi-Agent Framework for Real-time Free Food Discovery with Integrated Nutritional Metadata
Food insecurity remains a persistent public health emergency in the United States, tightly interwoven with chronic disease, mental illness, and opioid misuse. Yet despite the existence of thousands of food banks and pantries, access remains fragmented: 1) current retrieval systems depend on static directories or generic search engines, which provide incomplete and geographically irrelevant results; 2) LLM-based chatbots offer only vague nutritional suggestions and fail to adapt to real-world constraints such as time, mobility, and transportation; and 3) existing food recommendation systems optimize for culinary diversity but overlook survival-critical needs of food-insecure populations, including immediate proximity, verified availability, and contextual barriers. These limitations risk leaving the most vulnerable individuals, those experiencing homelessness, addiction, or digital illiteracy, unable to access urgently needed resources. To address this, we introduce Food4All, the first multi-agent framework explicitly designed for real-time, context-aware free food retrieval. Food4All unifies three innovations: 1) heterogeneous data aggregation across official databases, community platforms, and social media to provide a continuously updated pool of food resources; 2) a lightweight reinforcement learning algorithm trained on curated cases to optimize for both geographic accessibility and nutritional correctness; and 3) an online feedback loop that dynamically adapts retrieval policies to evolving user needs. By bridging information acquisition, semantic analysis, and decision support, Food4All delivers nutritionally annotated and guidance at the point of need. This framework establishes an urgent step toward scalable, equitable, and intelligent systems that directly support populations facing food insecurity and its compounding health risks.
Distributed Allocation and Resource Scheduling Algorithms Resilient to Link Failure
Distributed resource allocation (DRA) is fundamental to modern networked systems, spanning applications from economic dispatch in smart grids to CPU scheduling in data centers. Conventional DRA approaches require reliable communication, yet real-world networks frequently suffer from link failures, packet drops, and communication delays due to environmental conditions, network congestion, and security threats. We introduce a novel resilient DRA algorithm that addresses these critical challenges, and our main contributions are as follows: (1) guaranteed constraint feasibility at all times, ensuring resource-demand balance even during algorithm termination or network disruption; (2) robust convergence despite sector-bound nonlinearities at nodes/links, accommodating practical constraints like quantization and saturation; and (3) optimal performance under merely uniformly-connected networks, eliminating the need for continuous connectivity. Unlike existing approaches that require persistent network connectivity and provide only asymptotic feasibility, our graph-theoretic solution leverages network percolation theory to maintain performance during intermittent disconnections. This makes it particularly valuable for mobile multi-agent systems where nodes frequently move out of communication range. Theoretical analysis and simulations demonstrate that our algorithm converges to optimal solutions despite heterogeneous time delays and substantial link failures, significantly advancing the reliability of distributed resource allocation in practical network environments.
comment: European Journal of Control
From Agent Simulation to Social Simulator: A Comprehensive Review (Part 1)
This is the first part of the comprehensive review, focusing on the historical development of Agent-Based Modeling (ABM) and its classic cases. It begins by discussing the development history and design principles of Agent-Based Modeling (ABM), helping readers understand the significant challenges that traditional physical simulation methods face in the social domain. Then, it provides a detailed introduction to foundational models for simulating social systems, including individual models, environmental models, and rule-based models. Finally, it presents classic cases of social simulation, covering three types: thought experiments, mechanism exploration, and parallel optimization.
The Emergence of Complex Behavior in Large-Scale Ecological Environments
We explore how physical scale and population size shape the emergence of complex behaviors in open-ended ecological environments. In our setting, agents are unsupervised and have no explicit rewards or learning objectives but instead evolve over time according to reproduction, mutation, and natural selection. As they act, agents also shape their environment and the population around them in an ongoing dynamic ecology. Our goal is not to optimize a single high-performance policy, but instead to examine how behaviors emerge and evolve across large populations due to natural competition and environmental pressures. In an effort to discover how complex behaviors naturally emerge, we conduct experiments in large-scale worlds that reach populations of more than 60,000 individual agents, each with their own evolved neural network policy. We identify various emergent behaviors such as long-range resource extraction, vision-based foraging, and predation that arise under competitive and survival pressures. We examine how sensing modalities and environmental scale affect the emergence of these behaviors, finding that some appear only in sufficiently large environments and populations, with larger scales increasing behavioral stability and consistency. While there is a rich history of research in evolutionary settings, our scaling results provide promising new directions to explore ecology as an instrument of machine learning in an era of abundant computational resources. Experimental code is available at https://github.com/jbejjani2022/ecological-emergent-behavior.
comment: 18 pages, 11 figures, 6 tables, experiment code available at https://github.com/jbejjani2022/ecological-emergent-behavior
Adaptive Coopetition: Leveraging Coarse Verifier Signals for Resilient Multi-Agent LLM Reasoning
Inference-time computation is a critical yet challenging paradigm for enhancing the reasoning performance of large language models (LLMs). While existing strategies improve reasoning stability and consistency, they suffer from notable limitations: self-correction often reinforces the model's initial biases, and Multi-Agent Collaboration (MAC) often fails due to the lack of efficient coordination mechanisms, leading to collective errors. Although high-performing verifiers can detect reasoning errors, making them reliable requires substantial training. To address these challenges, we introduce a novel inference-time framework, Adaptive Coopetition (AdCo), in which LLM agents utilize an adaptive, UCB-based "coopetition" mechanism. At each round, agents leverage coarse verifier signals to determine whether to collaborate or compete, and iteratively refine their reasoning based on peer feedback. Without relying on high-performance verifiers, our adaptive strategy achieves significant performance gains on mathematical reasoning benchmarks, yielding a 20% relative improvement over baselines on the more challenging dataset. Our approach remains robust and consistent in terms of accuracy under different sample sizes and configurations. This adaptive, signal-guided "coopetition" framework enhances reasoning robustness by leveraging both model knowledge diversity and reasoning trace measures, while also promoting uncertainty-driven exploration, especially when participants have comparable capabilities. From this perspective, our work offers a fresh lens on inference-time computation and paves the way for more resilient multi-agent LLM systems. Our code is available at: https://github.com/AdCo-Research/adaptive-coopetition.
comment: 13 pages, 8 figures
Plural Voices, Single Agent: Towards Inclusive AI in Multi-User Domestic Spaces
Domestic AI agents faces ethical, autonomy, and inclusion challenges, particularly for overlooked groups like children, elderly, and Neurodivergent users. We present the Plural Voices Model (PVM), a novel single-agent framework that dynamically negotiates multi-user needs through real-time value alignment, leveraging diverse public datasets on mental health, eldercare, education, and moral reasoning. Using human+synthetic curriculum design with fairness-aware scenarios and ethical enhancements, PVM identifies core values, conflicts, and accessibility requirements to inform inclusive principles. Our privacy-focused prototype features adaptive safety scaffolds, tailored interactions (e.g., step-by-step guidance for Neurodivergent users, simple wording for children), and equitable conflict resolution. In preliminary evaluations, PVM outperforms multi-agent baselines in compliance (76% vs. 70%), fairness (90% vs. 85%), safety-violation rate (0% vs. 7%), and latency. Design innovations, including video guidance, autonomy sliders, family hubs, and adaptive safety dashboards, demonstrate new directions for ethical and inclusive domestic AI, for building user-centered agentic systems in plural domestic contexts. Our Codes and Model are been open sourced, available for reproduction: https://github.com/zade90/Agora
Sync or Sink: Bounds on Algorithmic Collective Action with Noise and Multiple Groups NeurIPS
Collective action against algorithmic systems, which enables groups to promote their own interests, is poised to grow. Hence, there will be growth in the size and the number of distinct collectives. Currently, there is no formal analysis of how coordination challenges within a collective can impact downstream outcomes, or how multiple collectives may affect each other's success. In this work, we aim to provide guarantees on the success of collective action in the presence of both coordination noise and multiple groups. Our insight is that data generated by either multiple collectives or by coordination noise can be viewed as originating from multiple data distributions. Using this framing, we derive bounds on the success of collective action. We conduct experiments to study the effects of noise on collective action. We find that sufficiently high levels of noise can reduce the success of collective action. In certain scenarios, large noise can sink a collective success rate from $100\%$ to just under $60\%$. We identify potential trade-offs between collective size and coordination noise; for example, a collective that is twice as big but with four times more noise experiencing worse outcomes than the smaller, more coordinated one. This work highlights the importance of understanding nuanced dynamics of strategic behavior in algorithmic systems.
comment: Full Version of NeurIPS workshop paper
ATL*AS: An Automata-Theoretic Approach and Tool for the Verification of Strategic Abilities in Multi-Agent Systems
We present two novel symbolic algorithms for model checking the Alternating-time Temporal Logic ATL*, over both the infinite-trace and the finite-trace semantics. In particular, for infinite traces we design a novel symbolic reduction to parity games. We implement both methods in the ATL*AS model checker and evaluate it using synthetic benchmarks as well as a cybersecurity scenario. Our results demonstrate that the symbolic approach significantly outperforms the explicit-state representation and we find that our parity-game-based algorithm offers a more scalable and efficient solution for infinite-trace verification, outperforming previously available tools. Our results also confirm that finite-trace model checking yields substantial performance benefits over infinite-trace verification. As such, we provide a comprehensive toolset for verifying multiagent systems against specifications in ATL*.
Disaster Management in the Era of Agentic AI Systems: A Vision for Collective Human-Machine Intelligence for Augmented Resilience
The escalating frequency and severity of disasters routinely overwhelm traditional response capabilities, exposing critical vulnerability in disaster management. Current practices are hindered by fragmented data streams, siloed technologies, resource constraints, and the erosion of institutional memory, which collectively impede timely and effective decision making. This study introduces Disaster Copilot, a vision for a multi-agent artificial intelligence system designed to overcome these systemic challenges by unifying specialized AI tools within a collaborative framework. The proposed architecture utilizes a central orchestrator to coordinate diverse sub-agents, each specializing in critical domains such as predictive risk analytics, situational awareness, and impact assessment. By integrating multi-modal data, the system delivers a holistic, real-time operational picture and serve as the essential AI backbone required to advance Disaster Digital Twins from passive models to active, intelligent environments. Furthermore, it ensures functionality in resource-limited environments through on-device orchestration and incorporates mechanisms to capture institutional knowledge, mitigating the impact of staff turnover. We detail the system architecture and propose a three-phased roadmap emphasizing the parallel growth of technology, organizational capacity, and human-AI teaming. Disaster Copilot offers a transformative vision, fostering collective human-machine intelligence to build more adaptive, data-driven and resilient communities.
Stop Reducing Responsibility in LLM-Powered Multi-Agent Systems to Local Alignment
LLM-powered Multi-Agent Systems (LLM-MAS) unlock new potentials in distributed reasoning, collaboration, and task generalization but also introduce additional risks due to unguaranteed agreement, cascading uncertainty, and adversarial vulnerabilities. We argue that ensuring responsible behavior in such systems requires a paradigm shift: from local, superficial agent-level alignment to global, systemic agreement. We conceptualize responsibility not as a static constraint but as a lifecycle-wide property encompassing agreement, uncertainty, and security, each requiring the complementary integration of subjective human-centered values and objective verifiability. Furthermore, a dual-perspective governance framework that combines interdisciplinary design with human-AI collaborative oversight is essential for tracing and ensuring responsibility throughout the lifecycle of LLM-MAS. Our position views LLM-MAS not as loose collections of agents, but as unified, dynamic socio-technical systems that demand principled mechanisms to support each dimension of responsibility and enable ethically aligned, verifiably coherent, and resilient behavior for sustained, system-wide agreement.
comment: Updated manuscript of our previous version (arXiv:2502.01714). Under review
Static Sandboxes Are Inadequate: Modeling Societal Complexity Requires Open-Ended Co-Evolution in LLM-Based Multi-Agent Simulations
What if artificial agents could not just communicate, but also evolve, adapt, and reshape their worlds in ways we cannot fully predict? With llm now powering multi-agent systems and social simulations, we are witnessing new possibilities for modeling open-ended, ever-changing environments. Yet, most current simulations remain constrained within static sandboxes, characterized by predefined tasks, limited dynamics, and rigid evaluation criteria. These limitations prevent them from capturing the complexity of real-world societies. In this paper, we argue that static, task-specific benchmarks are fundamentally inadequate and must be rethought. We critically review emerging architectures that blend llm with multi-agent dynamics, highlight key hurdles such as balancing stability and diversity, evaluating unexpected behaviors, and scaling to greater complexity, and introduce a fresh taxonomy for this rapidly evolving field. Finally, we present a research roadmap centered on open-endedness, continuous co-evolution, and the development of resilient, socially aligned AI ecosystems. We call on the community to move beyond static paradigms and help shape the next generation of adaptive, socially-aware multi-agent simulations.
comment: Preprint; feedback welcome
Counterfactual Effect Decomposition in Multi-Agent Sequential Decision Making ICML 2025
We address the challenge of explaining counterfactual outcomes in multi-agent Markov decision processes. In particular, we aim to explain the total counterfactual effect of an agent's action on the outcome of a realized scenario through its influence on the environment dynamics and the agents' behavior. To achieve this, we introduce a novel causal explanation formula that decomposes the counterfactual effect by attributing to each agent and state variable a score reflecting their respective contributions to the effect. First, we show that the total counterfactual effect of an agent's action can be decomposed into two components: one measuring the effect that propagates through all subsequent agents' actions and another related to the effect that propagates through the state transitions. Building on recent advancements in causal contribution analysis, we further decompose these two effects as follows. For the former, we consider agent-specific effects -- a causal concept that quantifies the counterfactual effect of an agent's action that propagates through a subset of agents. Based on this notion, we use Shapley value to attribute the effect to individual agents. For the latter, we consider the concept of structure-preserving interventions and attribute the effect to state variables based on their "intrinsic" contributions. Through extensive experimentation, we demonstrate the interpretability of our approach in a Gridworld environment with LLM-assisted agents and a sepsis management simulator.
comment: ICML 2025
Multi-Agent Collaboration via Evolving Orchestration NeurIPS 2025
Large language models (LLMs) have achieved remarkable results across diverse downstream tasks, but their monolithic nature restricts scalability and efficiency in complex problem-solving. While recent research explores multi-agent collaboration among LLMs, most approaches rely on static organizational structures that struggle to adapt as task complexity and agent numbers grow, resulting in coordination overhead and inefficiencies. To this end, we propose a puppeteer-style paradigm for LLM-based multi-agent collaboration, where a centralized orchestrator ("puppeteer") dynamically directs agents ("puppets") in response to evolving task states. This orchestrator is trained via reinforcement learning to adaptively sequence and prioritize agents, enabling flexible and evolvable collective reasoning. Experiments on closed- and open-domain scenarios show that this method achieves superior performance with reduced computational costs. Analyses further reveal that the key improvements consistently stem from the emergence of more compact, cyclic reasoning structures under the orchestrator's evolution. Our code is available at https://github.com/OpenBMB/ChatDev/tree/puppeteer.
comment: accepted at NeurIPS 2025
DrunkAgent: Stealthy Memory Corruption in LLM-Powered Recommender Agents
Large language model (LLM)-powered agents are increasingly used in recommender systems (RSs) to achieve personalized behavior modeling, where the memory mechanism plays a pivotal role in enabling the agents to autonomously explore, learn and self-evolve from real-world interactions. However, this very mechanism, serving as a contextual repository, inherently exposes an attack surface for potential adversarial manipulations. Despite its central role, the robustness of agentic RSs in the face of such threats remains largely underexplored. Previous works suffer from semantic mismatches or rely on static embeddings or pre-defined prompts, all of which are not designed for dynamic systems, especially for dynamic memory states of LLM agents. This challenge is exacerbated by the black-box nature of commercial recommenders. To tackle the above problems, in this paper, we present the first systematic investigation of memory-based vulnerabilities in LLM-powered recommender agents, revealing their security limitations and guiding efforts to strengthen system resilience and trustworthiness. Specifically, we propose a novel black-box attack framework named DrunkAgent. DrunkAgent crafts semantically meaningful adversarial textual triggers for target item promotions and introduces a series of strategies to maximize the trigger effect by corrupting the memory updates during the interactions. The triggers and strategies are optimized on a surrogate model, enabling DrunkAgent transferable and stealthy. Extensive experiments on real-world datasets across diverse agentic RSs, including collaborative filtering, retrieval augmentation and sequential recommendations, demonstrate the generalizability, transferability and stealthiness of DrunkAgent.
Balancing Act: Prioritization Strategies for LLM-Designed Restless Bandit Rewards
LLMs are increasingly used to design reward functions based on human preferences in Reinforcement Learning (RL). We focus on LLM-designed rewards for Restless Multi-Armed Bandits, a framework for allocating limited resources among agents. In applications such as public health, this approach empowers grassroots health workers to tailor automated allocation decisions to community needs. In the presence of multiple agents, altering the reward function based on human preferences can impact subpopulations very differently, leading to complex tradeoffs and a multi-objective resource allocation problem. We are the first to present a principled method termed Social Choice Language Model for dealing with these tradeoffs for LLM-designed rewards for multiagent planners in general and restless bandits in particular. The novel part of our model is a transparent and configurable selection component, called an adjudicator, external to the LLM that controls complex tradeoffs via a user-selected social welfare function. Our experiments demonstrate that our model reliably selects more effective, aligned, and balanced reward functions compared to purely LLM-based approaches.
Systems and Control (CS)
Lyapunov-Aware Quantum-Inspired Reinforcement Learning for Continuous-Time Vehicle Control: A Feasibility Study
This paper presents a novel Lyapunov-Based Quantum Reinforcement Learning (LQRL) framework that integrates quantum policy optimization with Lyapunov stability analysis for continuous-time vehicle control. The proposed approach combines the representational power of variational quantum circuits (VQCs) with a stability-aware policy gradient mechanism to ensure asymptotic convergence and safe decision-making under dynamic environments. The vehicle longitudinal control problem was formulated as a continuous-state reinforcement learning task, where the quantum policy network generates control actions subject to Lyapunov stability constraints. Simulation experiments were conducted in a closed-loop adaptive cruise control scenario using a quantum-inspired policy trained under stability feedback. The results demonstrate that the LQRL framework successfully embeds Lyapunov stability verification into quantum policy learning, enabling interpretable and stability-aware control performance. Although transient overshoot and Lyapunov divergence were observed under aggressive acceleration, the system maintained bounded state evolution, validating the feasibility of integrating safety guarantees within quantum reinforcement learning architectures. The proposed framework provides a foundational step toward provably safe quantum control in autonomous systems and hybrid quantum-classical optimization domains.
comment: 7 pages, 4 figures, 20 equations, 3 appendices, 4 tables
MADR: MPC-guided Adversarial DeepReach
Hamilton-Jacobi (HJ) Reachability offers a framework for generating safe value functions and policies in the face of adversarial disturbance, but is limited by the curse of dimensionality. Physics-informed deep learning is able to overcome this infeasibility, but itself suffers from slow and inaccurate convergence, primarily due to weak PDE gradients and the complexity of self-supervised learning. A few works, recently, have demonstrated that enriching the self-supervision process with regular supervision (based on the nature of the optimal control problem), greatly accelerates convergence and solution quality, however, these have been limited to single player problems and simple games. In this work, we introduce MADR: MPC-guided Adversarial DeepReach, a general framework to robustly approximate the two-player, zero-sum differential game value function. In doing so, MADR yields the corresponding optimal strategies for both players in zero-sum games as well as safe policies for worst-case robustness. We test MADR on a multitude of high-dimensional simulated and real robotic agents with varying dynamics and games, finding that our approach significantly out-performs state-of-the-art baselines in simulation and produces impressive results in hardware.
comment: 8 pages, under review
$\ell_1$-Based Adaptive Identification under Quantized Observations with Applications
Quantized observations are ubiquitous in a wide range of applications across engineering and the social sciences, and algorithms based on the $\ell_1$-norm are well recognized for their robustness to outliers compared with their $\ell_2$-based counterparts. Nevertheless, adaptive identification methods that integrate quantized observations with $\ell_1$-optimization remain largely underexplored. Motivated by this gap, we develop a novel $\ell_1$-based adaptive identification algorithm specifically designed for quantized observations. Without relying on the traditional persistent excitation condition, we establish global convergence of the parameter estimates to their true values and show that the average regret asymptotically vanishes as the data size increases. Finally, we apply our new identification algorithm to a judicial sentencing problem using real-world data, which demonstrates its superior performance and practical significance.
A Note on Optimal Distributed State Estimation for Linear Time-Varying Systems
In this technical note, we prove that the ODEFTC algorithm constitutes the first optimal distributed state estimator for continuous-time linear time-varying systems subject to stochastic disturbances. Particularly, we formally show that it is able to asymptotically recover the performance, in terms of error covariance of the estimates at each node, of the centralized Kalman-Bucy filter, which is known to be the optimal filter for the considered class of systems. Moreover, we provide a simple sufficient value for the consensus gain to guarantee the stability of the distributed estimator.
comment: This work has been submitted to the IEEE for possible publication
Quantifying Security for Networked Control Systems: A Review
Networked Control Systems (NCSs) are integral in critical infrastructures such as power grids, transportation networks, and production systems. Ensuring the resilient operation of these large-scale NCSs against cyber-attacks is crucial for societal well-being. Over the past two decades, extensive research has been focused on developing metrics to quantify the vulnerabilities of NCSs against attacks. Once the vulnerabilities are quantified, mitigation strategies can be employed to enhance system resilience. This article provides a comprehensive overview of methods developed for assessing NCS vulnerabilities and the corresponding mitigation strategies. Furthermore, we emphasize the importance of probabilistic risk metrics to model vulnerabilities under adversaries with imperfect process knowledge. The article concludes by outlining promising directions for future research.
comment: Journal submission
PIRA: Pan-CDN Intra-video Resource Adaptation for Short Video Streaming
In large scale short video platforms, CDN resource selection plays a critical role in maintaining Quality of Experience (QoE) while controlling escalating traffic costs. To better understand this phenomenon, we conduct in the wild network measurements during video playback in a production short video system. The results reveal that CDNs delivering higher average QoE often come at greater financial cost, yet their connection quality fluctuates even within a single video underscoring a fundamental and dynamic trade off between QoE and cost. However, the problem of sustaining high QoE under cost constraints remains insufficiently investigated in the context of CDN selection for short video streaming. To address this, we propose PIRA, a dynamic resource selection algorithm that optimizes QoE and cost in real time during video playback. PIRA formally integrating QoE and cost by a mathematical model, and introduce a intra video control theoretic CDN resource selection approach which can balance QoE and cost under network dynamics. To reduce the computation overheads, PIRA employs state space pruning and adaptive parameter adjustment to efficiently solve the high dimensional optimization problem. In large scale production experiments involving 450,000 users over two weeks, PIRA outperforms the production baseline, achieving a 2.1% reduction in start up delay, 15.2% shorter rebuffering time, and 10% lower average unit traffic cost, demonstrating its effectiveness in balancing user experience and financial cost at scale.
Designing trajectories in the Earth-Moon system: a Levenberg-Marquardt approach
Trajectory design in cislunar space under a High-Fidelity Ephemeris Model (HFEM) is pursued through a nonlinear optimization perspective anchored on the transition of solutions from lower fidelity models, namely the Circular Restricted Three-Body Problem (CR3BP). The optimization problem is posed in the likeness of a multiple-shooting approach, aiming for segment-to-segment continuity while tracking proximity to the original CR3BP structures. The analysis of various formulations leads to the selection of an unconstrained least-squares problem for further investigation. The nonlinear optimization problem is convexified and the use of the Levenberg-Marquardt algorithm, as an alternative to the minimum-norm update equation found in most literature, is investigated for its control over the update step and inherent robustness. Additional techniques such as adaptive weighting are employed to further consolidate the behavior of the proposed algorithm in challenging scenarios. Numerical trials evaluate the adequacy of the methodology presented and compare it to the minimum-norm baseline over various application cases, including the generation of quasi-periodic trajectories and orbital transfers between them. The proposed approach is found to outperform the baseline in applications where the initial guess is poor and the ease of including proximity constraints provides benefits in control over the shape of the converged solution.
comment: Preprint submitted to Advances in Space Research
Sliding-Mode Control Strategies for PMSM speed control: A Comprehensive Review, Taxonomy and Research Gaps
Permanent Magnet Synchronous Motors (PMSMs) are widely employed in high-performance drive systems due to their high efficiency, power density, and precise dynamic behavior. However, nonlinearities, load disturbances, and parameter uncertainties present persistent challenges to control. Sliding-Mode Control (SMC) remains one of the most reliable strategies for high-performance PMSM drives. Yet, the rapid proliferation of adaptive, fractional-order, and intelligent variants has fragmented recent literature. This paper presents a comprehensive review and taxonomy of SMC-based PMSM speed-control methods published between 2020 and 2025. More than 200 studies are systematically analyzed and classified according to control order, surface design, disturbance-observer integration, optimization approach, and intelligent augmentation. Trends in publication activity, dominant hybrid structures, and application domains are quantitatively summarized. The review reveals a clear evolution from conventional discontinuous SMC toward adaptive, higher-order, and data-driven frameworks that mitigate chattering while preserving robustness. Persistent research gaps are identified in hardware validation, energy-efficiency assessment, and real-time tuning strategies. The taxonomy and critical synthesis provided herein establish a coherent reference for researchers and form the conceptual foundation for the companion paper (Part II), which delivers a unified benchmark and comparative simulation study of representative SMC designs.
comment: 30 Pages, 7 Fugures, 3 Tables
MPC-based motion planning for non-holonomic systems in non-convex domains
Motivated by the application of using model predictive control (MPC) for motion planning of autonomous mobile robots, a form of output tracking MPC for non-holonomic systems and with non-convex constraints is studied. Although the advantages of using MPC for motion planning have been demonstrated in several papers, in most of the available fundamental literature on output tracking MPC it is assumed, often implicitly, that the model is holonomic and generally the state or output constraints must be convex. Thus, in application-oriented publications, empirical results dominate and the topic of proving completeness, in particular under which assumptions the target is always reached, has received comparatively little attention. To address this gap, we present a novel MPC formulation that guarantees convergence to the desired target under realistic assumptions, which can be verified in relevant real-world scenarios.
comment: Preprint of ECC 2025 submission
MMRHP: A Miniature Mixed-Reality HIL Platform for Auditable Closed-Loop Evaluation
Validation of autonomous driving systems requires a trade-off between test fidelity, cost, and scalability. While miniaturized hardware-in-the-loop (HIL) platforms have emerged as a promising solution, a systematic framework supporting rigorous quantitative analysis is generally lacking, limiting their value as scientific evaluation tools. To address this challenge, we propose MMRHP, a miniature mixed-reality HIL platform that elevates miniaturized testing from functional demonstration to rigorous, reproducible quantitative analysis. The core contributions are threefold. First, we propose a systematic three-phase testing process oriented toward the Safety of the Intended Functionality(SOTIF)standard, providing actionable guidance for identifying the performance limits and triggering conditions of otherwise correctly functioning systems. Second, we design and implement a HIL platform centered around a unified spatiotemporal measurement core to support this process, ensuring consistent and traceable quantification of physical motion and system timing. Finally, we demonstrate the effectiveness of this solution through comprehensive experiments. The platform itself was first validated, achieving a spatial accuracy of 10.27 mm RMSE and a stable closed-loop latency baseline of approximately 45 ms. Subsequently, an in-depth Autoware case study leveraged this validated platform to quantify its performance baseline and identify a critical performance cliff at an injected latency of 40 ms. This work shows that a structured process, combined with a platform offering a unified spatio-temporal benchmark, enables reproducible, interpretable, and quantitative closed-loop evaluation of autonomous driving systems.
Coverage-Recon: Coordinated Multi-Drone Image Sampling with Online Map Feedback
This article addresses collaborative 3D map reconstruction using multiple drones. Achieving high-quality reconstruction requires capturing images of keypoints within the target scene from diverse viewing angles, and coverage control offers an effective framework to meet this requirement. Meanwhile, recent advances in real-time 3D reconstruction algorithms make it possible to render an evolving map during flight, enabling immediate feedback to guide drone motion. Building on this, we present Coverage-Recon, a novel coordinated image sampling algorithm that integrates online map feedback to improve reconstruction quality on-the-fly. In Coverage-Recon, the coordinated motion of drones is governed by a Quadratic Programming (QP)-based angle-aware coverage controller, which ensures multi-viewpoint image capture while enforcing safety constraints. The captured images are processed in real time by the NeuralRecon algorithm to generate an evolving 3D mesh. Mesh changes across the scene are interpreted as indicators of reconstruction uncertainty and serve as feedback to update the importance index of the coverage control as the map evolves. The effectiveness of Coverage-Recon is validated through simulation and experiments, demonstrating both qualitatively and quantitatively that incorporating online map feedback yields more complete and accurate 3D reconstructions than conventional methods. Project page: https://htnk-lab.github.io/coverage-recon/
comment: Submitted to IEEE Transactions on Control Systems Technology (under review). Project page: https://htnk-lab.github.io/coverage-recon/
Explicit Reformulation of Discrete Distributionally Robust Optimization Problems
Distributionally robust optimization (DRO) is an effective framework for controlling real-world systems with various uncertainties, typically modeled using distributional uncertainty balls. However, DRO problems often involve infinitely many inequality constraints, rendering exact solutions computationally expensive. In this study, we propose a discrete DRO (DDRO) method that significantly simplifies the problem by reducing it to a single trivial constraint. Specifically, the proposed method utilizes two types of distributional uncertainty balls to reformulate the DDRO problem into a single-layer smooth convex program, significantly improving tractability. Furthermore, we provide practical guidance for selecting the appropriate ball sizes. The original DDRO problem is further reformulated into two optimization problems: one minimizing the mean and standard deviation, and the other minimizing the conditional value at risk (CVaR). These formulations account for the choice of ball sizes, thereby enhancing the practical applicability of the method. The proposed method was applied to a distributionally robust patrol-agent design problem, identifying a Pareto front in which the mean and standard deviation of the mean hitting time varied by up to 3% and 14%, respectively, while achieving a CVaR reduction of up to 13%.
comment: 15 pages, 4 figures, This paper is submitted to a journal for possible publication
Distributed Allocation and Resource Scheduling Algorithms Resilient to Link Failure
Distributed resource allocation (DRA) is fundamental to modern networked systems, spanning applications from economic dispatch in smart grids to CPU scheduling in data centers. Conventional DRA approaches require reliable communication, yet real-world networks frequently suffer from link failures, packet drops, and communication delays due to environmental conditions, network congestion, and security threats. We introduce a novel resilient DRA algorithm that addresses these critical challenges, and our main contributions are as follows: (1) guaranteed constraint feasibility at all times, ensuring resource-demand balance even during algorithm termination or network disruption; (2) robust convergence despite sector-bound nonlinearities at nodes/links, accommodating practical constraints like quantization and saturation; and (3) optimal performance under merely uniformly-connected networks, eliminating the need for continuous connectivity. Unlike existing approaches that require persistent network connectivity and provide only asymptotic feasibility, our graph-theoretic solution leverages network percolation theory to maintain performance during intermittent disconnections. This makes it particularly valuable for mobile multi-agent systems where nodes frequently move out of communication range. Theoretical analysis and simulations demonstrate that our algorithm converges to optimal solutions despite heterogeneous time delays and substantial link failures, significantly advancing the reliability of distributed resource allocation in practical network environments.
comment: European Journal of Control
Brute-force search and Warshall algorithms for matrix-weighted graphs
Although research on the control of networked systems has grown considerably, graph-theoretic and algorithmic studies on matrix-weighted graphs remain limited. To bridge this gap in the literature, this work introduces two algorithms-the brute-force search and the Warshall algorithm-for determining connectedness and clustering in undirected matrix-weighted graphs. The proposed algorithms, which are derived from a sufficient condition for connectedness, emphasize a key distinction between matrix-weighted and scalar-weighted graphs. While the existence of a path between two vertices guarantees connectedness in scalar-weighted graphs, connectedness in matrix-weighted graphs is a collective contribution of all paths joining the two vertices. Proofs of correctness and numerical examples are provided to illustrate and demonstrate the effectiveness of the algorithms.
comment: 20 pages, 6 figures, preprint
Urban Air Mobility: A Review of Recent Advances in Communication, Management, and Sustainability
Urban Air Mobility (UAM) offers a transformative approach to addressing urban congestion, improving accessibility, and advancing environmental sustainability. Rapid progress has emerged in three tightly linked domains since 2020: (1) Communication, where dynamic spectrum allocation and low-altitude channel characterization support reliable air-ground data exchange; (2) UAM management, with novel air-traffic control concepts for dense, largely autonomous urban airspace; and (3) Sustainability, driven by energy-efficient propulsion, integrated charging infrastructure, and holistic environmental assessment. This paper reviews and synthesizes the latest research across these areas, compares the state-of-the-art solutions, and outlines the technological and infrastructural milestones that are critical to realizing a scalable, sustainable UAM ecosystem.
comment: This work has been accepted by the 2025 International Conference on Cyber-physical Social Intelligence (CPSI 2025)
Harmonic Cancellation in Multi-Electrolyzer P2H Plants via Phasor-Modulated Production Scheduling
Thyristor rectifiers (TRs) are cost-effective power supplies for hydrogen electrolyzers (ELZs) but introduce harmonic distortion that may violate grid codes. This letter proposes a self-governing harmonic mitigation strategy through coordinated operation of multiple ELZs in large power-to-hydrogen (P2H) plants. First, the harmonic model of TR-powered ELZs is derived, revealing a natural harmonic cancellation mechanism among them. Based on this, a system-level operation scheme based on phasor modulation is developed and integrated into plant scheduling. Case studies demonstrate that the proposed method reduces harmonic currents by 21.2%-39.7% and ensures grid-code compliance, with only a 0.25% loss in hydrogen output, while increasing total revenue by over 21\% compared to production-oriented strategies.
comment: This work has been submitted to the IEEE for possible publication
Wisdom of Crowds Effects under Antagonistic Interactions and Correlated Opinions
This paper investigates the wisdom of crowds of linear opinion dynamics models evolving on signed networks. Conditions are given under which models such as the DeGroot, Friedkin-Johnsen (FJ) and concatenated FJ models improve or undermine collective wisdom. The extension to dependent initial opinions is also presented, highlighting how the correlation structure influences the feasibility and geometry of the wisdom-improving regions.
A Configurable Simulation Framework for Safety Assessment of Vulnerable Road Users
Ensuring the safety of vulnerable road users (VRUs), including pedestrians, cyclists, electric scooter riders, and motorcyclists, remains a major challenge for advanced driver assistance systems (ADAS) and connected and automated vehicles (CAV) technologies. Real-world VRU tests are expensive and sometimes cannot capture or repeat rare and hazardous events. In this paper, we present a lightweight, configurable simulation framework that follows European New Car Assessment Program (Euro NCAP) VRU testing protocols. A rule-based finite-state machine (FSM) is developed as a motion planner to provide vehicle automation during the VRU interaction. We also integrate ego-vehicle perception and idealized Vehicle-to-Everything (V2X) awareness to demonstrate safety margins in different scenarios. This work provides an extensible platform for rapid and repeatable VRU safety validation, paving the way for broader case-study deployment in diverse, user-defined settings, which will be essential for building a more VRU-friendly and sustainable intelligent transportation system.
comment: This work has been accepted by the 2025 International Conference on Cyber-physical Social Intelligence (CPSI 2025)
Time Domain Differential Equation Based Fault Location Identification in Mixed Overhead-Underground Power Distribution Systems
This paper proposes a time-domain fault location identification method for mixed overhead-underground power distribution systems that can handle challenging fault scenarios such as sub-cycle faults, arcing faults and evolving faults. The proposed method is formulated based on differential equations of the system and accounts for the peculiarities of power distribution systems with distributed generations. It considers the presence of loads, multi-phase laterals and sub-laterals, heterogenous overhead and underground lines, and infeeds and remote-end fault current contributions of distributed generations. It utilizes data collected by limited number of measuring devices installed in modern power distribution systems to systematically eliminate possible multiple fault location estimations to provide a single correct estimation of the actual location of the fault. The performance of the proposed method is demonstrated using IEEE 34-node test system.
A Learning-based Model Reference Adaptive Controller Implemented on a Prosthetic Hand Wrist
The functionality and natural motion of prosthetic hands remain limited by the challenges in controlling compliant wrist mechanisms. Current control strategies often lack adaptability and incur high computational costs, which impedes real-time deployment in assistive robotics. To address this gap, this study presents a computationally efficient Neural Network (NN)-based Model Reference Adaptive Controller (MRAC) for a tendon-driven soft continuum wrist integrated with a prosthetic hand. The dynamic modeling of the wrist is formulated using Timoshenko beam theory, capturing both shear and bending deformations. The proposed NN-MRAC estimates the required tendon forces from deflection errors and minimizes deviation from a reference model through online adaptation. Simulation results demonstrate improved precision with a root mean square error (RMSE) of $6.14 \times 10^{-4}$ m and a settling time of $3.2$s. Experimental validations confirm real-time applicability, with an average RMSE of $5.66 \times 10^{-3}$ m, steady-state error of $8.05 \times 10^{-3}$ m, and settling time of $1.58$ s. These results highlight the potential of the controller to enhance motion accuracy and responsiveness in soft prosthetic systems, thereby advancing the integration of adaptive intelligent control in wearable assistive devices.
comment: International Conference on Social Robotics + AI
Graph Analysis to Fully Automate Fault Location Identification in Power Distribution Systems
This paper proposes graph analysis methods to fully automate the fault location identification task in power distribution systems. The proposed methods take basic unordered data from power distribution systems as input, including branch parameters, load values, and the location of measuring devices. The proposed data preparation and analysis methods automatically identify the system's topology and extract essential information, such as faulted paths, structures, loading of laterals and sublaterals, and estimate the fault location accordingly. The proposed graph analysis methods do not require complex node and branch numbering processes or renumbering following changes in the system topology. The proposed methods eliminate the need for human intervention at any step of the fault location identification process. They are scalable and applicable to systems of any size. The performance of the proposed algorithm is demonstrated using the IEEE 34-bus distribution test system.
Convex Maneuver Planning for Spacecraft Collision Avoidance
Conjunction analysis and maneuver planning for spacecraft collision avoidance remains a manual and time-consuming process, typically involving repeated forward simulations of hand-designed maneuvers. With the growing density of satellites in low-Earth orbit (LEO), autonomy is becoming essential for efficiently evaluating and mitigating collisions. In this work, we present an algorithm to design low-thrust collision-avoidance maneuvers for short-term conjunction events. We first formulate the problem as a nonconvex quadratically-constrained quadratic program (QCQP), which we then relax into a convex semidefinite program (SDP) using Shor's relaxation. We demonstrate empirically that the relaxation is tight, which enables the recovery of globally optimal solutions to the original nonconvex problem. Our formulation produces a minimum-energy solution while ensuring a desired probability of collision at the time of closest approach. Finally, if the desired probability of collision cannot be satisfied, we relax this constraint into a penalty, yielding a minimum-risk solution. We validate our algorithm with a high-fidelity simulation of a satellite conjunction in low-Earth orbit with a simulated conjunction data message (CDM), demonstrating its effectiveness in reducing collision risk.
comment: 8 pages, 6 figures, Accepted to International Space Robotics Conference
Motion Planning and Control of an Overactuated 4-Wheel Drive with Constrained Independent Steering
This paper addresses motion planning and con- trol of an overactuated 4-wheel drive train with independent steering (4WIS) where mechanical constraints prevent the wheels from executing full 360-degree rotations (swerve). The configuration space of such a robot is constrained and contains discontinuities that affect the smoothness of the robot motion. We introduce a mathematical formulation of the steering constraints and derive discontinuity planes that partition the velocity space into regions of smooth and efficient motion. We further design the motion planner for path tracking and ob- stacle avoidance that explicitly accounts for swerve constraints and the velocity transition smoothness. The motion controller uses local feedback to generate actuation from the desired velocity, while properly handling the discontinuity crossing by temporarily stopping the motion and repositioning the wheels. We implement the proposed motion planner as an extension to ROS Navigation package and evaluate the system in simulation and on a physical robot.
comment: 7 pages, 5 figures, 3 tables, video available at https://youtu.be/8l9s2Wb_vec, To appear at IEEE 2025 International Conference on Advanced Robotics
Extreme value distributions of peak loads for non-residential customer segments SC
Electrical grid congestion is a growing challenge in Europe, driving the need for accurate prediction of load, particularly of peak load. Non-time-resolved models of peak load offer the advantages of simplicity and compactness, and among them, Velander's formula (VF) is a traditional method that has been used for decades. Moreover, VF can be adapted into a quantile VF, which learns a truncated cumulative distribution function of peak load based on electricity consumption. This paper proposes a mathematical model based on extreme value theory to characterize the probability distribution of peak load for large non-residential customers. The model underpins the quantile VF as demonstrated through multiple quantile regression and reduces its representation to just four parameters without sacrificing predictive performance. Moreover, using maximum likelihood estimation and the likelihood ratio test, we validate that the probability distribution of peak load of analysed groups belongs to the heavy-tailed Fr\'echet class.
comment: Submitted to Power Systems Computation Conference (PSCC) 2026
Extending Resource Constrained Project Scheduling to Mega-Projects with Model-Based Systems Engineering & Hetero-functional Graph Theory
Within the project management context, project scheduling serves as an indispensable component, functioning as a fundamental tool for planning, monitoring, controlling, and managing projects more broadly. Although the resource-constrained project scheduling problem (RCPSP) lies at the core of project management activities, it remains largely disconnected from the broader literature on model-based systems engineering (MBSE), thereby limiting its integration into the design and management of complex systems. The original contribution of this paper is twofold. First, the paper seeks to reconcile the RCPSP with the broader literature and vocabulary of model-based systems engineering and hetero-functional graph theory (HFGT). A concrete translation pipeline from an activity-on-node network to a SysML activity diagram, and then to an operand net is constructed. Using this representation, it specializes the hetero-functional network minimum-cost flow (HFNMCF) formulation to the RCPSP context as a systematic means of HFGT for quantitative analysis and proves that the RCPSP is recoverable as a special case of a broader model. Secondly, on an illustrative instance with renewable and non-renewable operands, the specialized HFNMCF, while producing similar schedules, yields explicit explanations of the project states that enable richer monitoring and control. Overall, the framework preserves the strengths of the classical RCPSP while accommodating real-world constraints and enterprise-level decision processes encountered in large, complex megaprojects.
Active Cooling Device: A Flexible, Lab-Scale Experimental Unit to Develop Spatio-Temporal Temperature Control Strategies
We present an experimental unit that realizes the ``multi-input, multi-output manifold'' thermal management technology proposed by Lamarre & Raymond (2023). The proposed setup can be used for experiments aimed at controlling spatiotemporal temperature distribution. Temperature control is achieved by impinging coolant fluid jets, leveraging a manifold of channels targeted to the surface. The direction of the fluid is controlled by shifting the role of channels between inputs, outputs, or closing them. Files associated with this work include Computer-Aided Design (CAD) STEP files, Gerber files to manufacture a Printed Circuit Board (PCB), and a Graphical User Interface (GUI) written in Python. We provide a step-by-step guide to assemble the experimental setup. We also provide instructions to interact with the setup through the GUI, which allows for real-time tracking of sample temperature and flow rates per flow control device. Additionally, we provide examples of usage of the setup, including system characterization with step response, Proportional-Integral-Derivative performance tracking, and disturbance rejection in a coupled system. Extending the application is accessible through the files provided in the open repository associated with this work. The active cooling device presents a safe, flexible, and complete design, allowing for lab-scale assessment of the performance of custom temperature control strategies using enclosed impinging jets.
Through-the-Earth Magnetic Induction Communication and Networking: A Comprehensive Survey
Magnetic induction (MI) communication (MIC) has emerged as a promising candidate for underground communication networks due to its excellent penetration capabilities. Integration with Space-Air-Ground-Underground (SAGUI) networks in next-generation mobile communication systems requires a well-defined network architecture. A recent discovery in MIC research, MI fast fading, remains in its early stages and presents unique challenges. This paper provides a comprehensive survey on through-the-earth (TTE) MIC, covering MI applications, channel modeling, point-to-point MIC design, relay techniques, network frameworks, and emerging technologies. We compare various MIC applications to highlight TTE-specific challenges and review the principles of channel modeling, addressing both MI slow fading and MI fast fading, along with its potential impact on existing MIC theories. We conduct a fine-grained decomposition of MI channel power gain into four distinct physical parameters, and propose a novel geometric model to analyze MI fast fading. We also summarize MI relay techniques, examine crosstalk effects in relay and high-density networks, and explore key research tasks within the OSI framework for a holistic MI network protocol in SAGUI. To bridge the gaps identified, we propose a MIC framework that supports TCP/IP and Linux, enabling full implementation of existing and emerging MIC solutions. This framework empowers researchers to leverage Linux resources and deep learning platforms for accelerated development of MIC in SAGUI networks. Remaining research challenges, open issues, and promising novel techniques are further identified to advance MIC research.
comment: This work has been accepted by the IEEE Communications Surveys & Tutorials (COMST) for publication. The final published version will be available on IEEE Xplore
PowerChain: A Verifiable Agentic AI System for Automating Distribution Grid Analyses
Rapid electrification and decarbonization are increasing the complexity of distribution grid (DG) operation and planning, necessitating advanced computational analyses to ensure reliability and resilience. These analyses depend on disparate workflows comprising complex models, function calls, and data pipelines that require substantial expert knowledge and remain difficult to automate. Workforce and budget constraints further limit utilities' ability to apply such analyses at scale. To address this gap, we build an agentic system PowerChain, which is capable of autonomously performing complex grid analyses. Existing agentic AI systems are typically developed in a bottom-up manner with customized context for predefined analysis tasks; therefore, they do not generalize to tasks that the agent has never seen. In comparison, to generalize to unseen DG analysis tasks, PowerChain dynamically generates structured context by leveraging supervisory signals from self-contained power systems tools (e.g., GridLAB-D) and an optimized set of expert-annotated and verified reasoning trajectories. For complex DG tasks defined in natural language, empirical results on real utility data demonstrate that PowerChain achieves up to a 144/% improvement in performance over baselines.
Rethink Repeatable Measures of Robot Performance with Statistical Query
For a general standardized testing algorithm designed to evaluate a specific aspect of a robot's performance, several key expectations are commonly imposed. Beyond accuracy (i.e., closeness to a typically unknown ground-truth reference) and efficiency (i.e., feasibility within acceptable testing costs and equipment constraints), one particularly important attribute is repeatability. Repeatability refers to the ability to consistently obtain the same testing outcome when similar testing algorithms are executed on the same subject robot by different stakeholders, across different times or locations. However, achieving repeatable testing has become increasingly challenging as the components involved grow more complex, intelligent, diverse, and, most importantly, stochastic. While related efforts have addressed repeatability at ethical, hardware, and procedural levels, this study focuses specifically on repeatable testing at the algorithmic level. Specifically, we target the well-adopted class of testing algorithms in standardized evaluation: statistical query (SQ) algorithms (i.e., algorithms that estimate the expected value of a bounded function over a distribution using sampled data). We propose a lightweight, parameterized, and adaptive modification applicable to any SQ routine, whether based on Monte Carlo sampling, importance sampling, or adaptive importance sampling, that makes it provably repeatable, with guaranteed bounds on both accuracy and efficiency. We demonstrate the effectiveness of the proposed approach across three representative scenarios: (i) established and widely adopted standardized testing of manipulators, (ii) emerging intelligent testing algorithms for operational risk assessment in automated vehicles, and (iii) developing use cases involving command tracking performance evaluation of humanoid robots in locomotion tasks.
One-Pass Learning via Bridging Orthogonal Gradient Descent and Recursive Least-Squares
While large machine learning models have shown remarkable performance in various domains, their training typically requires iterating for many passes over the training data. However, due to computational and memory constraints and potential privacy concerns, storing and accessing all the data is impractical in many real-world scenarios where the data arrives in a stream. In this paper, we investigate the problem of one-pass learning, in which a model is trained on sequentially arriving data without retraining on previous datapoints. Motivated by the demonstrated effectiveness of overparameterized models and the phenomenon of benign overfitting, we propose Orthogonal Recursive Fitting (ORFit), an algorithm for one-pass learning which seeks to perfectly fit each new datapoint while minimally altering the predictions on previous datapoints. ORFit updates the parameters in a direction orthogonal to past gradients, similar to orthogonal gradient descent (OGD) in continual learning. We show that, interestingly, ORFit's update leads to an operation similar to the recursive least-squares (RLS) algorithm in adaptive filtering but with significantly improved memory and computational efficiency, i.e., linear, instead of quadratic, in the number of parameters. To further reduce memory usage, we leverage the structure of the streaming data via an incremental principal component analysis (IPCA). We show that using the principal components is minimax optimal, i.e., it minimizes the worst-case forgetting of previous predictions for unknown future updates. Further, we prove that, for overparameterized linear models, the parameter vector obtained by ORFit matches what the standard multi-pass stochastic gradient descent (SGD) would converge to. Finally, we extend our results to the nonlinear setting for highly overparameterized models, relevant for deep learning.
comment: Journal extension of v1: Y. Min, K, Ahn, N. Azizan, "One-Pass Learning via Bridging Orthogonal Gradient Descent and Recursive Least-Squares," IEEE Conference on Decision and Control, 2022
Nondeterminism-Aware Optimistic Verification for Floating-Point Neural Networks
Neural networks increasingly run on hardware outside the user's control (cloud GPUs, inference marketplaces). Yet ML-as-a-Service reveals little about what actually ran or whether returned outputs faithfully reflect the intended inputs. Users lack recourse against service downgrades (model swaps, quantization, graph rewrites, or discrepancies like altered ad embeddings). Verifying outputs is hard because floating-point(FP) execution on heterogeneous accelerators is inherently nondeterministic. Existing approaches are either impractical for real FP neural networks or reintroduce vendor trust. We present NAO: a Nondeterministic tolerance Aware Optimistic verification protocol that accepts outputs within principled operator-level acceptance regions rather than requiring bitwise equality. NAO combines two error models: (i) sound per-operator IEEE-754 worst-case bounds and (ii) tight empirical percentile profiles calibrated across hardware. Discrepancies trigger a Merkle-anchored, threshold-guided dispute game that recursively partitions the computation graph until one operator remains, where adjudication reduces to a lightweight theoretical-bound check or a small honest-majority vote against empirical thresholds. Unchallenged results finalize after a challenge window, without requiring trusted hardware or deterministic kernels. We implement NAO as a PyTorch-compatible runtime and a contract layer currently deployed on Ethereum Holesky testnet. The runtime instruments graphs, computes per-operator bounds, and runs unmodified vendor kernels in FP32 with negligible overhead (0.3% on Qwen3-8B). Across CNNs, Transformers and diffusion models on A100, H100, RTX6000, RTX4090, empirical thresholds are $10^2-10^3$ times tighter than theoretical bounds, and bound-aware adversarial attacks achieve 0% success. NAO reconciles scalability with verifiability for real-world heterogeneous ML compute.
comment: 17 pages, 7 figures
Unifying Direct and Indirect Learning for Safe Control of Linear Systems
This paper develops learning-enabled safe controllers for linear systems subject to system uncertainties and bounded disturbances. Given the disturbance zonotope, the databased closed-loop dynamics (CLDs) are first characterized using a matrix zonotope (MZ), and refined through several steps to yield a constrained matrix zonotope (CMZ). This refinement is achieved by introducing conformal equality constraints that eliminate incompatible disturbance realizations. More precisely, prior knowledge and observed data are used separately to construct CMZ representations of disturbance sequences that conform to both data and prior knowledge, and are intersected by the initial MZ of the disturbance sequence, producing a refined CMZ. This approach reduces conservatism. To further reduce the conservativeness, we unify open-loop learning with closed-loop learning by presenting a novel set-membership identification method that models open-loop dynamics as a CMZ. The prior knowledge serves as an initial feasible open-loop model set (FOLMS) of this CMZ, which is refined into a posterior set whenever new informative online data becomes available. This posterior FOLMS then adaptively replaces the prior knowledge set employed in the disturbance elimination of the closed-loop learning process. The resulting refined parameterized set of CLD is subsequently leveraged to directly and adaptively learn a controller that robustly enforces safety. Toward this goal, we formulate a linear programming problem that guarantees {\lambda}contractiveness of a polyhedral safe set. A simulation example is provided to validate the effectiveness of the proposed approach and support the theoretical results.
comment: arXiv admin note: text overlap with arXiv:2502.04195
Dynamic object goal pushing with mobile manipulators through model-free constrained reinforcement learning ICRA 2025
Non-prehensile pushing to move and reorient objects to a goal is a versatile loco-manipulation skill. In the real world, the object's physical properties and friction with the floor contain significant uncertainties, which makes the task challenging for a mobile manipulator. In this paper, we develop a learning-based controller for a mobile manipulator to move an unknown object to a desired position and yaw orientation through a sequence of pushing actions. The proposed controller for the robotic arm and the mobile base motion is trained using a constrained Reinforcement Learning (RL) formulation. We demonstrate its capability in experiments with a quadrupedal robot equipped with an arm. The learned policy achieves a success rate of 91.35% in simulation and at least 80% on hardware in challenging scenarios. Through our extensive hardware experiments, we show that the approach demonstrates high robustness against unknown objects of different masses, materials, sizes, and shapes. It reactively discovers the pushing location and direction, thus achieving contact-rich behavior while observing only the pose of the object. Additionally, we demonstrate the adaptive behavior of the learned policy towards preventing the object from toppling.
comment: presented at ICRA 2025, Video: https://youtu.be/wGAdPGVf9Ws?si=pi83ONWofHHqbFG0
Neural 3D Object Reconstruction with Small-Scale Unmanned Aerial Vehicles
Small Unmanned Aerial Vehicles (UAVs) exhibit immense potential for navigating indoor and hard-to-reach areas, yet their significant constraints in payload and autonomy have largely prevented their use for complex tasks like high-quality 3-Dimensional (3D) reconstruction. To overcome this challenge, we introduce a novel system architecture that enables fully autonomous, high-fidelity 3D scanning of static objects using UAVs weighing under 100 grams. Our core innovation lies in a dual-reconstruction pipeline that creates a real-time feedback loop between data capture and flight control. A near-real-time (near-RT) process uses Structure from Motion (SfM) to generate an instantaneous pointcloud of the object. The system analyzes the model quality on the fly and dynamically adapts the UAV's trajectory to intelligently capture new images of poorly covered areas. This ensures comprehensive data acquisition. For the final, detailed output, a non-real-time (non-RT) pipeline employs a Neural Radiance Fields (NeRF)-based Neural 3D Reconstruction (N3DR) approach, fusing SfM-derived camera poses with precise Ultra Wide-Band (UWB) location data to achieve superior accuracy. We implemented and validated this architecture using Crazyflie 2.1 UAVs. Our experiments, conducted in both single- and multi-UAV configurations, conclusively show that dynamic trajectory adaptation consistently improves reconstruction quality over static flight paths. This work demonstrates a scalable and autonomous solution that unlocks the potential of miniaturized UAVs for fine-grained 3D reconstruction in constrained environments, a capability previously limited to much larger platforms.
comment: 13 pages, 16 figures, 3 tables, 45 references
A Flow-Based Model for Conditional and Probabilistic Electricity Consumption Profile Generation and Prediction
Residential Load Profile (RLP) generation and prediction are critical for the operation and planning of distribution networks, especially as diverse low-carbon technologies (e.g., photovoltaic and electric vehicles) are increasingly adopted. This paper introduces a novel flow-based generative model, termed Full Convolutional Profile Flow (FCPFlow), which is uniquely designed for both conditional and unconditional RLP generation, and for probabilistic load forecasting. By introducing two new layers--the invertible linear layer and the invertible normalization layer--the proposed FCPFlow architecture shows three main advantages compared to traditional statistical and contemporary deep generative models: 1) it is well-suited for RLP generation under continuous conditions, such as varying weather and annual electricity consumption, 2) it demonstrates superior scalability in different datasets compared to traditional statistical models, and 3) it also demonstrates better modeling capabilities in capturing the complex correlation of RLPs compared with deep generative models.
Asynchronous Federated Learning: A Scalable Approach for Decentralized Machine Learning
Federated Learning (FL) has emerged as a powerful paradigm for decentralized machine learning, enabling collaborative model training across diverse clients without sharing raw data. However, traditional FL approaches often face limitations in scalability and efficiency due to their reliance on synchronous client updates, which can result in significant delays and increased communication overhead, particularly in heterogeneous and dynamic environments. To address these challenges in this paper, we propose an Asynchronous Federated Learning (AFL) algorithm, which allows clients to update the global model independently and asynchronously. Our key contributions include a comprehensive convergence analysis of AFL in the presence of client delays and model staleness. By leveraging martingale difference sequence theory and variance bounds, we ensure robust convergence despite asynchronous updates. Assuming strongly convex local objective functions, we establish bounds on gradient variance under random client sampling and derive a recursion formula quantifying the impact of client delays on convergence. Furthermore, we demonstrate the practical applicability of the AFL algorithm by training decentralized linear regression and Support Vector Machine (SVM) based classifiers and compare its results with synchronous FL algorithm to effectively handling non-IID data distributed among clients. The proposed AFL algorithm addresses key limitations of traditional FL methods, such as inefficiency due to global synchronization and susceptibility to client drift. It enhances scalability, robustness, and efficiency in real-world settings with heterogeneous client populations and dynamic network conditions. Our results underscore the potential of AFL to drive advancements indistributed learning systems, particularly for large-scale, privacy-preserving applications in resource-constrained environments.
Finite-time Safety and Reach-avoid Verification of Stochastic Discrete-time Systems
This paper studies finite-time safety and reach-avoid verification for stochastic discrete-time dynamical systems. The aim is to ascertain lower and upper bounds of the probability that, within a predefined finite-time horizon, a system starting from an initial state in a safe set will either exit the safe set (safety verification) or reach a target set while remaining within the safe set until the first encounter with the target (reach-avoid verification). We introduce novel barrier-like sufficient conditions for characterizing these bounds, which either complement existing ones or fill gaps. Finally, we demonstrate the efficacy of these conditions on two examples.
comment: To appear in Information and Computation
Optimal state estimation: Turnpike analysis and performance results
In this paper, we introduce turnpike arguments in the context of optimal state estimation. In particular, we show that the optimal solution of the state estimation problem involving all available past data serves as turnpike for the solutions of truncated problems involving only a subset of the data. We mathematically formalize this phenomenon and derive a sufficient condition that relies on a decaying sensitivity property of the underlying nonlinear program. As second contribution, we show how a specific turnpike property can be used to establish performance guarantees when approximating the optimal solution of the full problem by a sequence of truncated problems, and we show that the resulting performance (both averaged and non-averaged) is approximately optimal with error terms that can be made arbitrarily small by an appropriate choice of the horizon length. In addition, we discuss interesting implications of these results for the practically relevant case of moving horizon estimation and illustrate our results with a numerical example.
comment: replaced with final version
Finite Sample Identification of Partially Observed Bilinear Dynamical Systems
We consider the problem of learning a realization of a partially observed bilinear dynamical system (BLDS) from noisy input-output data. Given a single trajectory of input-output samples, we provide a finite time analysis for learning the system's Markov-like parameters, from which a balanced realization of the bilinear system can be obtained. Our bilinear system identification algorithm learns the system's Markov-like parameters by regressing the outputs to highly correlated, nonlinear, and heavy-tailed covariates. Moreover, the stability of BLDS depends on the sequence of inputs used to excite the system. These properties, unique to partially observed bilinear dynamical systems, pose significant challenges to the analysis of our algorithm for learning the unknown dynamics. We address these challenges and provide high probability error bounds on our identification algorithm under a uniform stability assumption. Our analysis provides insights into system theoretic quantities that affect learning accuracy and sample complexity. Lastly, we perform numerical experiments with synthetic data to reinforce these insights.
Sub-optimality of the Separation Principle for Quadratic Control from Bilinear Observations
We consider the problem of controlling a linear dynamical system from bilinear observations with minimal quadratic cost. Despite the similarity of this problem to standard linear quadratic Gaussian (LQG) control, we show that when the observation model is bilinear, neither does the Separation Principle hold, nor is the optimal controller affine in the estimated state. Moreover, the cost-to-go is non-convex in the control input. Hence, finding an analytical expression for the optimal feedback controller is difficult in general. Under certain settings, we show that the standard LQG controller locally maximizes the cost instead of minimizing it. Furthermore, the optimal controllers (derived analytically) are not unique and are nonlinear in the estimated state. We also introduce a notion of input-dependent observability and derive conditions under which the Kalman filter covariance remains bounded. We illustrate our theoretical results through numerical experiments in multiple synthetic settings.
Iterated Invariant Extended Kalman Filter (IterIEKF)
We study the mathematical properties of the Invariant Extended Kalman Filter (IEKF) when iterating on the measurement update step, following the principles of the well-known Iterated Extended Kalman Filter. This iterative variant of the IEKF (IterIEKF) systematically improves its accuracy through Gauss-Newton-based relinearization, and exhibits additional theoretical properties, particularly in the low-noise regime, that resemble those of the linear Kalman filter. We apply the proposed approach to the problem of estimating the extended pose of a crane payload using an inertial measurement unit. Our results suggest that the IterIEKF significantly outperforms the IEKF when measurements are highly accurate.
Systems and Control (EESS)
Lyapunov-Aware Quantum-Inspired Reinforcement Learning for Continuous-Time Vehicle Control: A Feasibility Study
This paper presents a novel Lyapunov-Based Quantum Reinforcement Learning (LQRL) framework that integrates quantum policy optimization with Lyapunov stability analysis for continuous-time vehicle control. The proposed approach combines the representational power of variational quantum circuits (VQCs) with a stability-aware policy gradient mechanism to ensure asymptotic convergence and safe decision-making under dynamic environments. The vehicle longitudinal control problem was formulated as a continuous-state reinforcement learning task, where the quantum policy network generates control actions subject to Lyapunov stability constraints. Simulation experiments were conducted in a closed-loop adaptive cruise control scenario using a quantum-inspired policy trained under stability feedback. The results demonstrate that the LQRL framework successfully embeds Lyapunov stability verification into quantum policy learning, enabling interpretable and stability-aware control performance. Although transient overshoot and Lyapunov divergence were observed under aggressive acceleration, the system maintained bounded state evolution, validating the feasibility of integrating safety guarantees within quantum reinforcement learning architectures. The proposed framework provides a foundational step toward provably safe quantum control in autonomous systems and hybrid quantum-classical optimization domains.
comment: 7 pages, 4 figures, 20 equations, 3 appendices, 4 tables
MADR: MPC-guided Adversarial DeepReach
Hamilton-Jacobi (HJ) Reachability offers a framework for generating safe value functions and policies in the face of adversarial disturbance, but is limited by the curse of dimensionality. Physics-informed deep learning is able to overcome this infeasibility, but itself suffers from slow and inaccurate convergence, primarily due to weak PDE gradients and the complexity of self-supervised learning. A few works, recently, have demonstrated that enriching the self-supervision process with regular supervision (based on the nature of the optimal control problem), greatly accelerates convergence and solution quality, however, these have been limited to single player problems and simple games. In this work, we introduce MADR: MPC-guided Adversarial DeepReach, a general framework to robustly approximate the two-player, zero-sum differential game value function. In doing so, MADR yields the corresponding optimal strategies for both players in zero-sum games as well as safe policies for worst-case robustness. We test MADR on a multitude of high-dimensional simulated and real robotic agents with varying dynamics and games, finding that our approach significantly out-performs state-of-the-art baselines in simulation and produces impressive results in hardware.
comment: 8 pages, under review
$\ell_1$-Based Adaptive Identification under Quantized Observations with Applications
Quantized observations are ubiquitous in a wide range of applications across engineering and the social sciences, and algorithms based on the $\ell_1$-norm are well recognized for their robustness to outliers compared with their $\ell_2$-based counterparts. Nevertheless, adaptive identification methods that integrate quantized observations with $\ell_1$-optimization remain largely underexplored. Motivated by this gap, we develop a novel $\ell_1$-based adaptive identification algorithm specifically designed for quantized observations. Without relying on the traditional persistent excitation condition, we establish global convergence of the parameter estimates to their true values and show that the average regret asymptotically vanishes as the data size increases. Finally, we apply our new identification algorithm to a judicial sentencing problem using real-world data, which demonstrates its superior performance and practical significance.
A Note on Optimal Distributed State Estimation for Linear Time-Varying Systems
In this technical note, we prove that the ODEFTC algorithm constitutes the first optimal distributed state estimator for continuous-time linear time-varying systems subject to stochastic disturbances. Particularly, we formally show that it is able to asymptotically recover the performance, in terms of error covariance of the estimates at each node, of the centralized Kalman-Bucy filter, which is known to be the optimal filter for the considered class of systems. Moreover, we provide a simple sufficient value for the consensus gain to guarantee the stability of the distributed estimator.
comment: This work has been submitted to the IEEE for possible publication
Quantifying Security for Networked Control Systems: A Review
Networked Control Systems (NCSs) are integral in critical infrastructures such as power grids, transportation networks, and production systems. Ensuring the resilient operation of these large-scale NCSs against cyber-attacks is crucial for societal well-being. Over the past two decades, extensive research has been focused on developing metrics to quantify the vulnerabilities of NCSs against attacks. Once the vulnerabilities are quantified, mitigation strategies can be employed to enhance system resilience. This article provides a comprehensive overview of methods developed for assessing NCS vulnerabilities and the corresponding mitigation strategies. Furthermore, we emphasize the importance of probabilistic risk metrics to model vulnerabilities under adversaries with imperfect process knowledge. The article concludes by outlining promising directions for future research.
comment: Journal submission
PIRA: Pan-CDN Intra-video Resource Adaptation for Short Video Streaming
In large scale short video platforms, CDN resource selection plays a critical role in maintaining Quality of Experience (QoE) while controlling escalating traffic costs. To better understand this phenomenon, we conduct in the wild network measurements during video playback in a production short video system. The results reveal that CDNs delivering higher average QoE often come at greater financial cost, yet their connection quality fluctuates even within a single video underscoring a fundamental and dynamic trade off between QoE and cost. However, the problem of sustaining high QoE under cost constraints remains insufficiently investigated in the context of CDN selection for short video streaming. To address this, we propose PIRA, a dynamic resource selection algorithm that optimizes QoE and cost in real time during video playback. PIRA formally integrating QoE and cost by a mathematical model, and introduce a intra video control theoretic CDN resource selection approach which can balance QoE and cost under network dynamics. To reduce the computation overheads, PIRA employs state space pruning and adaptive parameter adjustment to efficiently solve the high dimensional optimization problem. In large scale production experiments involving 450,000 users over two weeks, PIRA outperforms the production baseline, achieving a 2.1% reduction in start up delay, 15.2% shorter rebuffering time, and 10% lower average unit traffic cost, demonstrating its effectiveness in balancing user experience and financial cost at scale.
Designing trajectories in the Earth-Moon system: a Levenberg-Marquardt approach
Trajectory design in cislunar space under a High-Fidelity Ephemeris Model (HFEM) is pursued through a nonlinear optimization perspective anchored on the transition of solutions from lower fidelity models, namely the Circular Restricted Three-Body Problem (CR3BP). The optimization problem is posed in the likeness of a multiple-shooting approach, aiming for segment-to-segment continuity while tracking proximity to the original CR3BP structures. The analysis of various formulations leads to the selection of an unconstrained least-squares problem for further investigation. The nonlinear optimization problem is convexified and the use of the Levenberg-Marquardt algorithm, as an alternative to the minimum-norm update equation found in most literature, is investigated for its control over the update step and inherent robustness. Additional techniques such as adaptive weighting are employed to further consolidate the behavior of the proposed algorithm in challenging scenarios. Numerical trials evaluate the adequacy of the methodology presented and compare it to the minimum-norm baseline over various application cases, including the generation of quasi-periodic trajectories and orbital transfers between them. The proposed approach is found to outperform the baseline in applications where the initial guess is poor and the ease of including proximity constraints provides benefits in control over the shape of the converged solution.
comment: Preprint submitted to Advances in Space Research
Sliding-Mode Control Strategies for PMSM speed control: A Comprehensive Review, Taxonomy and Research Gaps
Permanent Magnet Synchronous Motors (PMSMs) are widely employed in high-performance drive systems due to their high efficiency, power density, and precise dynamic behavior. However, nonlinearities, load disturbances, and parameter uncertainties present persistent challenges to control. Sliding-Mode Control (SMC) remains one of the most reliable strategies for high-performance PMSM drives. Yet, the rapid proliferation of adaptive, fractional-order, and intelligent variants has fragmented recent literature. This paper presents a comprehensive review and taxonomy of SMC-based PMSM speed-control methods published between 2020 and 2025. More than 200 studies are systematically analyzed and classified according to control order, surface design, disturbance-observer integration, optimization approach, and intelligent augmentation. Trends in publication activity, dominant hybrid structures, and application domains are quantitatively summarized. The review reveals a clear evolution from conventional discontinuous SMC toward adaptive, higher-order, and data-driven frameworks that mitigate chattering while preserving robustness. Persistent research gaps are identified in hardware validation, energy-efficiency assessment, and real-time tuning strategies. The taxonomy and critical synthesis provided herein establish a coherent reference for researchers and form the conceptual foundation for the companion paper (Part II), which delivers a unified benchmark and comparative simulation study of representative SMC designs.
comment: 30 Pages, 7 Fugures, 3 Tables
MPC-based motion planning for non-holonomic systems in non-convex domains
Motivated by the application of using model predictive control (MPC) for motion planning of autonomous mobile robots, a form of output tracking MPC for non-holonomic systems and with non-convex constraints is studied. Although the advantages of using MPC for motion planning have been demonstrated in several papers, in most of the available fundamental literature on output tracking MPC it is assumed, often implicitly, that the model is holonomic and generally the state or output constraints must be convex. Thus, in application-oriented publications, empirical results dominate and the topic of proving completeness, in particular under which assumptions the target is always reached, has received comparatively little attention. To address this gap, we present a novel MPC formulation that guarantees convergence to the desired target under realistic assumptions, which can be verified in relevant real-world scenarios.
comment: Preprint of ECC 2025 submission
MMRHP: A Miniature Mixed-Reality HIL Platform for Auditable Closed-Loop Evaluation
Validation of autonomous driving systems requires a trade-off between test fidelity, cost, and scalability. While miniaturized hardware-in-the-loop (HIL) platforms have emerged as a promising solution, a systematic framework supporting rigorous quantitative analysis is generally lacking, limiting their value as scientific evaluation tools. To address this challenge, we propose MMRHP, a miniature mixed-reality HIL platform that elevates miniaturized testing from functional demonstration to rigorous, reproducible quantitative analysis. The core contributions are threefold. First, we propose a systematic three-phase testing process oriented toward the Safety of the Intended Functionality(SOTIF)standard, providing actionable guidance for identifying the performance limits and triggering conditions of otherwise correctly functioning systems. Second, we design and implement a HIL platform centered around a unified spatiotemporal measurement core to support this process, ensuring consistent and traceable quantification of physical motion and system timing. Finally, we demonstrate the effectiveness of this solution through comprehensive experiments. The platform itself was first validated, achieving a spatial accuracy of 10.27 mm RMSE and a stable closed-loop latency baseline of approximately 45 ms. Subsequently, an in-depth Autoware case study leveraged this validated platform to quantify its performance baseline and identify a critical performance cliff at an injected latency of 40 ms. This work shows that a structured process, combined with a platform offering a unified spatio-temporal benchmark, enables reproducible, interpretable, and quantitative closed-loop evaluation of autonomous driving systems.
Coverage-Recon: Coordinated Multi-Drone Image Sampling with Online Map Feedback
This article addresses collaborative 3D map reconstruction using multiple drones. Achieving high-quality reconstruction requires capturing images of keypoints within the target scene from diverse viewing angles, and coverage control offers an effective framework to meet this requirement. Meanwhile, recent advances in real-time 3D reconstruction algorithms make it possible to render an evolving map during flight, enabling immediate feedback to guide drone motion. Building on this, we present Coverage-Recon, a novel coordinated image sampling algorithm that integrates online map feedback to improve reconstruction quality on-the-fly. In Coverage-Recon, the coordinated motion of drones is governed by a Quadratic Programming (QP)-based angle-aware coverage controller, which ensures multi-viewpoint image capture while enforcing safety constraints. The captured images are processed in real time by the NeuralRecon algorithm to generate an evolving 3D mesh. Mesh changes across the scene are interpreted as indicators of reconstruction uncertainty and serve as feedback to update the importance index of the coverage control as the map evolves. The effectiveness of Coverage-Recon is validated through simulation and experiments, demonstrating both qualitatively and quantitatively that incorporating online map feedback yields more complete and accurate 3D reconstructions than conventional methods. Project page: https://htnk-lab.github.io/coverage-recon/
comment: Submitted to IEEE Transactions on Control Systems Technology (under review). Project page: https://htnk-lab.github.io/coverage-recon/
Explicit Reformulation of Discrete Distributionally Robust Optimization Problems
Distributionally robust optimization (DRO) is an effective framework for controlling real-world systems with various uncertainties, typically modeled using distributional uncertainty balls. However, DRO problems often involve infinitely many inequality constraints, rendering exact solutions computationally expensive. In this study, we propose a discrete DRO (DDRO) method that significantly simplifies the problem by reducing it to a single trivial constraint. Specifically, the proposed method utilizes two types of distributional uncertainty balls to reformulate the DDRO problem into a single-layer smooth convex program, significantly improving tractability. Furthermore, we provide practical guidance for selecting the appropriate ball sizes. The original DDRO problem is further reformulated into two optimization problems: one minimizing the mean and standard deviation, and the other minimizing the conditional value at risk (CVaR). These formulations account for the choice of ball sizes, thereby enhancing the practical applicability of the method. The proposed method was applied to a distributionally robust patrol-agent design problem, identifying a Pareto front in which the mean and standard deviation of the mean hitting time varied by up to 3% and 14%, respectively, while achieving a CVaR reduction of up to 13%.
comment: 15 pages, 4 figures, This paper is submitted to a journal for possible publication
Distributed Allocation and Resource Scheduling Algorithms Resilient to Link Failure
Distributed resource allocation (DRA) is fundamental to modern networked systems, spanning applications from economic dispatch in smart grids to CPU scheduling in data centers. Conventional DRA approaches require reliable communication, yet real-world networks frequently suffer from link failures, packet drops, and communication delays due to environmental conditions, network congestion, and security threats. We introduce a novel resilient DRA algorithm that addresses these critical challenges, and our main contributions are as follows: (1) guaranteed constraint feasibility at all times, ensuring resource-demand balance even during algorithm termination or network disruption; (2) robust convergence despite sector-bound nonlinearities at nodes/links, accommodating practical constraints like quantization and saturation; and (3) optimal performance under merely uniformly-connected networks, eliminating the need for continuous connectivity. Unlike existing approaches that require persistent network connectivity and provide only asymptotic feasibility, our graph-theoretic solution leverages network percolation theory to maintain performance during intermittent disconnections. This makes it particularly valuable for mobile multi-agent systems where nodes frequently move out of communication range. Theoretical analysis and simulations demonstrate that our algorithm converges to optimal solutions despite heterogeneous time delays and substantial link failures, significantly advancing the reliability of distributed resource allocation in practical network environments.
comment: European Journal of Control
Brute-force search and Warshall algorithms for matrix-weighted graphs
Although research on the control of networked systems has grown considerably, graph-theoretic and algorithmic studies on matrix-weighted graphs remain limited. To bridge this gap in the literature, this work introduces two algorithms-the brute-force search and the Warshall algorithm-for determining connectedness and clustering in undirected matrix-weighted graphs. The proposed algorithms, which are derived from a sufficient condition for connectedness, emphasize a key distinction between matrix-weighted and scalar-weighted graphs. While the existence of a path between two vertices guarantees connectedness in scalar-weighted graphs, connectedness in matrix-weighted graphs is a collective contribution of all paths joining the two vertices. Proofs of correctness and numerical examples are provided to illustrate and demonstrate the effectiveness of the algorithms.
comment: 20 pages, 6 figures, preprint
Urban Air Mobility: A Review of Recent Advances in Communication, Management, and Sustainability
Urban Air Mobility (UAM) offers a transformative approach to addressing urban congestion, improving accessibility, and advancing environmental sustainability. Rapid progress has emerged in three tightly linked domains since 2020: (1) Communication, where dynamic spectrum allocation and low-altitude channel characterization support reliable air-ground data exchange; (2) UAM management, with novel air-traffic control concepts for dense, largely autonomous urban airspace; and (3) Sustainability, driven by energy-efficient propulsion, integrated charging infrastructure, and holistic environmental assessment. This paper reviews and synthesizes the latest research across these areas, compares the state-of-the-art solutions, and outlines the technological and infrastructural milestones that are critical to realizing a scalable, sustainable UAM ecosystem.
comment: This work has been accepted by the 2025 International Conference on Cyber-physical Social Intelligence (CPSI 2025)
Harmonic Cancellation in Multi-Electrolyzer P2H Plants via Phasor-Modulated Production Scheduling
Thyristor rectifiers (TRs) are cost-effective power supplies for hydrogen electrolyzers (ELZs) but introduce harmonic distortion that may violate grid codes. This letter proposes a self-governing harmonic mitigation strategy through coordinated operation of multiple ELZs in large power-to-hydrogen (P2H) plants. First, the harmonic model of TR-powered ELZs is derived, revealing a natural harmonic cancellation mechanism among them. Based on this, a system-level operation scheme based on phasor modulation is developed and integrated into plant scheduling. Case studies demonstrate that the proposed method reduces harmonic currents by 21.2%-39.7% and ensures grid-code compliance, with only a 0.25% loss in hydrogen output, while increasing total revenue by over 21\% compared to production-oriented strategies.
comment: This work has been submitted to the IEEE for possible publication
Wisdom of Crowds Effects under Antagonistic Interactions and Correlated Opinions
This paper investigates the wisdom of crowds of linear opinion dynamics models evolving on signed networks. Conditions are given under which models such as the DeGroot, Friedkin-Johnsen (FJ) and concatenated FJ models improve or undermine collective wisdom. The extension to dependent initial opinions is also presented, highlighting how the correlation structure influences the feasibility and geometry of the wisdom-improving regions.
A Configurable Simulation Framework for Safety Assessment of Vulnerable Road Users
Ensuring the safety of vulnerable road users (VRUs), including pedestrians, cyclists, electric scooter riders, and motorcyclists, remains a major challenge for advanced driver assistance systems (ADAS) and connected and automated vehicles (CAV) technologies. Real-world VRU tests are expensive and sometimes cannot capture or repeat rare and hazardous events. In this paper, we present a lightweight, configurable simulation framework that follows European New Car Assessment Program (Euro NCAP) VRU testing protocols. A rule-based finite-state machine (FSM) is developed as a motion planner to provide vehicle automation during the VRU interaction. We also integrate ego-vehicle perception and idealized Vehicle-to-Everything (V2X) awareness to demonstrate safety margins in different scenarios. This work provides an extensible platform for rapid and repeatable VRU safety validation, paving the way for broader case-study deployment in diverse, user-defined settings, which will be essential for building a more VRU-friendly and sustainable intelligent transportation system.
comment: This work has been accepted by the 2025 International Conference on Cyber-physical Social Intelligence (CPSI 2025)
Time Domain Differential Equation Based Fault Location Identification in Mixed Overhead-Underground Power Distribution Systems
This paper proposes a time-domain fault location identification method for mixed overhead-underground power distribution systems that can handle challenging fault scenarios such as sub-cycle faults, arcing faults and evolving faults. The proposed method is formulated based on differential equations of the system and accounts for the peculiarities of power distribution systems with distributed generations. It considers the presence of loads, multi-phase laterals and sub-laterals, heterogenous overhead and underground lines, and infeeds and remote-end fault current contributions of distributed generations. It utilizes data collected by limited number of measuring devices installed in modern power distribution systems to systematically eliminate possible multiple fault location estimations to provide a single correct estimation of the actual location of the fault. The performance of the proposed method is demonstrated using IEEE 34-node test system.
A Learning-based Model Reference Adaptive Controller Implemented on a Prosthetic Hand Wrist
The functionality and natural motion of prosthetic hands remain limited by the challenges in controlling compliant wrist mechanisms. Current control strategies often lack adaptability and incur high computational costs, which impedes real-time deployment in assistive robotics. To address this gap, this study presents a computationally efficient Neural Network (NN)-based Model Reference Adaptive Controller (MRAC) for a tendon-driven soft continuum wrist integrated with a prosthetic hand. The dynamic modeling of the wrist is formulated using Timoshenko beam theory, capturing both shear and bending deformations. The proposed NN-MRAC estimates the required tendon forces from deflection errors and minimizes deviation from a reference model through online adaptation. Simulation results demonstrate improved precision with a root mean square error (RMSE) of $6.14 \times 10^{-4}$ m and a settling time of $3.2$s. Experimental validations confirm real-time applicability, with an average RMSE of $5.66 \times 10^{-3}$ m, steady-state error of $8.05 \times 10^{-3}$ m, and settling time of $1.58$ s. These results highlight the potential of the controller to enhance motion accuracy and responsiveness in soft prosthetic systems, thereby advancing the integration of adaptive intelligent control in wearable assistive devices.
comment: International Conference on Social Robotics + AI
Graph Analysis to Fully Automate Fault Location Identification in Power Distribution Systems
This paper proposes graph analysis methods to fully automate the fault location identification task in power distribution systems. The proposed methods take basic unordered data from power distribution systems as input, including branch parameters, load values, and the location of measuring devices. The proposed data preparation and analysis methods automatically identify the system's topology and extract essential information, such as faulted paths, structures, loading of laterals and sublaterals, and estimate the fault location accordingly. The proposed graph analysis methods do not require complex node and branch numbering processes or renumbering following changes in the system topology. The proposed methods eliminate the need for human intervention at any step of the fault location identification process. They are scalable and applicable to systems of any size. The performance of the proposed algorithm is demonstrated using the IEEE 34-bus distribution test system.
Convex Maneuver Planning for Spacecraft Collision Avoidance
Conjunction analysis and maneuver planning for spacecraft collision avoidance remains a manual and time-consuming process, typically involving repeated forward simulations of hand-designed maneuvers. With the growing density of satellites in low-Earth orbit (LEO), autonomy is becoming essential for efficiently evaluating and mitigating collisions. In this work, we present an algorithm to design low-thrust collision-avoidance maneuvers for short-term conjunction events. We first formulate the problem as a nonconvex quadratically-constrained quadratic program (QCQP), which we then relax into a convex semidefinite program (SDP) using Shor's relaxation. We demonstrate empirically that the relaxation is tight, which enables the recovery of globally optimal solutions to the original nonconvex problem. Our formulation produces a minimum-energy solution while ensuring a desired probability of collision at the time of closest approach. Finally, if the desired probability of collision cannot be satisfied, we relax this constraint into a penalty, yielding a minimum-risk solution. We validate our algorithm with a high-fidelity simulation of a satellite conjunction in low-Earth orbit with a simulated conjunction data message (CDM), demonstrating its effectiveness in reducing collision risk.
comment: 8 pages, 6 figures, Accepted to International Space Robotics Conference
Motion Planning and Control of an Overactuated 4-Wheel Drive with Constrained Independent Steering
This paper addresses motion planning and con- trol of an overactuated 4-wheel drive train with independent steering (4WIS) where mechanical constraints prevent the wheels from executing full 360-degree rotations (swerve). The configuration space of such a robot is constrained and contains discontinuities that affect the smoothness of the robot motion. We introduce a mathematical formulation of the steering constraints and derive discontinuity planes that partition the velocity space into regions of smooth and efficient motion. We further design the motion planner for path tracking and ob- stacle avoidance that explicitly accounts for swerve constraints and the velocity transition smoothness. The motion controller uses local feedback to generate actuation from the desired velocity, while properly handling the discontinuity crossing by temporarily stopping the motion and repositioning the wheels. We implement the proposed motion planner as an extension to ROS Navigation package and evaluate the system in simulation and on a physical robot.
comment: 7 pages, 5 figures, 3 tables, video available at https://youtu.be/8l9s2Wb_vec, To appear at IEEE 2025 International Conference on Advanced Robotics
Extreme value distributions of peak loads for non-residential customer segments SC
Electrical grid congestion is a growing challenge in Europe, driving the need for accurate prediction of load, particularly of peak load. Non-time-resolved models of peak load offer the advantages of simplicity and compactness, and among them, Velander's formula (VF) is a traditional method that has been used for decades. Moreover, VF can be adapted into a quantile VF, which learns a truncated cumulative distribution function of peak load based on electricity consumption. This paper proposes a mathematical model based on extreme value theory to characterize the probability distribution of peak load for large non-residential customers. The model underpins the quantile VF as demonstrated through multiple quantile regression and reduces its representation to just four parameters without sacrificing predictive performance. Moreover, using maximum likelihood estimation and the likelihood ratio test, we validate that the probability distribution of peak load of analysed groups belongs to the heavy-tailed Fr\'echet class.
comment: Submitted to Power Systems Computation Conference (PSCC) 2026
Extending Resource Constrained Project Scheduling to Mega-Projects with Model-Based Systems Engineering & Hetero-functional Graph Theory
Within the project management context, project scheduling serves as an indispensable component, functioning as a fundamental tool for planning, monitoring, controlling, and managing projects more broadly. Although the resource-constrained project scheduling problem (RCPSP) lies at the core of project management activities, it remains largely disconnected from the broader literature on model-based systems engineering (MBSE), thereby limiting its integration into the design and management of complex systems. The original contribution of this paper is twofold. First, the paper seeks to reconcile the RCPSP with the broader literature and vocabulary of model-based systems engineering and hetero-functional graph theory (HFGT). A concrete translation pipeline from an activity-on-node network to a SysML activity diagram, and then to an operand net is constructed. Using this representation, it specializes the hetero-functional network minimum-cost flow (HFNMCF) formulation to the RCPSP context as a systematic means of HFGT for quantitative analysis and proves that the RCPSP is recoverable as a special case of a broader model. Secondly, on an illustrative instance with renewable and non-renewable operands, the specialized HFNMCF, while producing similar schedules, yields explicit explanations of the project states that enable richer monitoring and control. Overall, the framework preserves the strengths of the classical RCPSP while accommodating real-world constraints and enterprise-level decision processes encountered in large, complex megaprojects.
Active Cooling Device: A Flexible, Lab-Scale Experimental Unit to Develop Spatio-Temporal Temperature Control Strategies
We present an experimental unit that realizes the ``multi-input, multi-output manifold'' thermal management technology proposed by Lamarre & Raymond (2023). The proposed setup can be used for experiments aimed at controlling spatiotemporal temperature distribution. Temperature control is achieved by impinging coolant fluid jets, leveraging a manifold of channels targeted to the surface. The direction of the fluid is controlled by shifting the role of channels between inputs, outputs, or closing them. Files associated with this work include Computer-Aided Design (CAD) STEP files, Gerber files to manufacture a Printed Circuit Board (PCB), and a Graphical User Interface (GUI) written in Python. We provide a step-by-step guide to assemble the experimental setup. We also provide instructions to interact with the setup through the GUI, which allows for real-time tracking of sample temperature and flow rates per flow control device. Additionally, we provide examples of usage of the setup, including system characterization with step response, Proportional-Integral-Derivative performance tracking, and disturbance rejection in a coupled system. Extending the application is accessible through the files provided in the open repository associated with this work. The active cooling device presents a safe, flexible, and complete design, allowing for lab-scale assessment of the performance of custom temperature control strategies using enclosed impinging jets.
Through-the-Earth Magnetic Induction Communication and Networking: A Comprehensive Survey
Magnetic induction (MI) communication (MIC) has emerged as a promising candidate for underground communication networks due to its excellent penetration capabilities. Integration with Space-Air-Ground-Underground (SAGUI) networks in next-generation mobile communication systems requires a well-defined network architecture. A recent discovery in MIC research, MI fast fading, remains in its early stages and presents unique challenges. This paper provides a comprehensive survey on through-the-earth (TTE) MIC, covering MI applications, channel modeling, point-to-point MIC design, relay techniques, network frameworks, and emerging technologies. We compare various MIC applications to highlight TTE-specific challenges and review the principles of channel modeling, addressing both MI slow fading and MI fast fading, along with its potential impact on existing MIC theories. We conduct a fine-grained decomposition of MI channel power gain into four distinct physical parameters, and propose a novel geometric model to analyze MI fast fading. We also summarize MI relay techniques, examine crosstalk effects in relay and high-density networks, and explore key research tasks within the OSI framework for a holistic MI network protocol in SAGUI. To bridge the gaps identified, we propose a MIC framework that supports TCP/IP and Linux, enabling full implementation of existing and emerging MIC solutions. This framework empowers researchers to leverage Linux resources and deep learning platforms for accelerated development of MIC in SAGUI networks. Remaining research challenges, open issues, and promising novel techniques are further identified to advance MIC research.
comment: This work has been accepted by the IEEE Communications Surveys & Tutorials (COMST) for publication. The final published version will be available on IEEE Xplore
PowerChain: A Verifiable Agentic AI System for Automating Distribution Grid Analyses
Rapid electrification and decarbonization are increasing the complexity of distribution grid (DG) operation and planning, necessitating advanced computational analyses to ensure reliability and resilience. These analyses depend on disparate workflows comprising complex models, function calls, and data pipelines that require substantial expert knowledge and remain difficult to automate. Workforce and budget constraints further limit utilities' ability to apply such analyses at scale. To address this gap, we build an agentic system PowerChain, which is capable of autonomously performing complex grid analyses. Existing agentic AI systems are typically developed in a bottom-up manner with customized context for predefined analysis tasks; therefore, they do not generalize to tasks that the agent has never seen. In comparison, to generalize to unseen DG analysis tasks, PowerChain dynamically generates structured context by leveraging supervisory signals from self-contained power systems tools (e.g., GridLAB-D) and an optimized set of expert-annotated and verified reasoning trajectories. For complex DG tasks defined in natural language, empirical results on real utility data demonstrate that PowerChain achieves up to a 144/% improvement in performance over baselines.
Rethink Repeatable Measures of Robot Performance with Statistical Query
For a general standardized testing algorithm designed to evaluate a specific aspect of a robot's performance, several key expectations are commonly imposed. Beyond accuracy (i.e., closeness to a typically unknown ground-truth reference) and efficiency (i.e., feasibility within acceptable testing costs and equipment constraints), one particularly important attribute is repeatability. Repeatability refers to the ability to consistently obtain the same testing outcome when similar testing algorithms are executed on the same subject robot by different stakeholders, across different times or locations. However, achieving repeatable testing has become increasingly challenging as the components involved grow more complex, intelligent, diverse, and, most importantly, stochastic. While related efforts have addressed repeatability at ethical, hardware, and procedural levels, this study focuses specifically on repeatable testing at the algorithmic level. Specifically, we target the well-adopted class of testing algorithms in standardized evaluation: statistical query (SQ) algorithms (i.e., algorithms that estimate the expected value of a bounded function over a distribution using sampled data). We propose a lightweight, parameterized, and adaptive modification applicable to any SQ routine, whether based on Monte Carlo sampling, importance sampling, or adaptive importance sampling, that makes it provably repeatable, with guaranteed bounds on both accuracy and efficiency. We demonstrate the effectiveness of the proposed approach across three representative scenarios: (i) established and widely adopted standardized testing of manipulators, (ii) emerging intelligent testing algorithms for operational risk assessment in automated vehicles, and (iii) developing use cases involving command tracking performance evaluation of humanoid robots in locomotion tasks.
One-Pass Learning via Bridging Orthogonal Gradient Descent and Recursive Least-Squares
While large machine learning models have shown remarkable performance in various domains, their training typically requires iterating for many passes over the training data. However, due to computational and memory constraints and potential privacy concerns, storing and accessing all the data is impractical in many real-world scenarios where the data arrives in a stream. In this paper, we investigate the problem of one-pass learning, in which a model is trained on sequentially arriving data without retraining on previous datapoints. Motivated by the demonstrated effectiveness of overparameterized models and the phenomenon of benign overfitting, we propose Orthogonal Recursive Fitting (ORFit), an algorithm for one-pass learning which seeks to perfectly fit each new datapoint while minimally altering the predictions on previous datapoints. ORFit updates the parameters in a direction orthogonal to past gradients, similar to orthogonal gradient descent (OGD) in continual learning. We show that, interestingly, ORFit's update leads to an operation similar to the recursive least-squares (RLS) algorithm in adaptive filtering but with significantly improved memory and computational efficiency, i.e., linear, instead of quadratic, in the number of parameters. To further reduce memory usage, we leverage the structure of the streaming data via an incremental principal component analysis (IPCA). We show that using the principal components is minimax optimal, i.e., it minimizes the worst-case forgetting of previous predictions for unknown future updates. Further, we prove that, for overparameterized linear models, the parameter vector obtained by ORFit matches what the standard multi-pass stochastic gradient descent (SGD) would converge to. Finally, we extend our results to the nonlinear setting for highly overparameterized models, relevant for deep learning.
comment: Journal extension of v1: Y. Min, K, Ahn, N. Azizan, "One-Pass Learning via Bridging Orthogonal Gradient Descent and Recursive Least-Squares," IEEE Conference on Decision and Control, 2022
Nondeterminism-Aware Optimistic Verification for Floating-Point Neural Networks
Neural networks increasingly run on hardware outside the user's control (cloud GPUs, inference marketplaces). Yet ML-as-a-Service reveals little about what actually ran or whether returned outputs faithfully reflect the intended inputs. Users lack recourse against service downgrades (model swaps, quantization, graph rewrites, or discrepancies like altered ad embeddings). Verifying outputs is hard because floating-point(FP) execution on heterogeneous accelerators is inherently nondeterministic. Existing approaches are either impractical for real FP neural networks or reintroduce vendor trust. We present NAO: a Nondeterministic tolerance Aware Optimistic verification protocol that accepts outputs within principled operator-level acceptance regions rather than requiring bitwise equality. NAO combines two error models: (i) sound per-operator IEEE-754 worst-case bounds and (ii) tight empirical percentile profiles calibrated across hardware. Discrepancies trigger a Merkle-anchored, threshold-guided dispute game that recursively partitions the computation graph until one operator remains, where adjudication reduces to a lightweight theoretical-bound check or a small honest-majority vote against empirical thresholds. Unchallenged results finalize after a challenge window, without requiring trusted hardware or deterministic kernels. We implement NAO as a PyTorch-compatible runtime and a contract layer currently deployed on Ethereum Holesky testnet. The runtime instruments graphs, computes per-operator bounds, and runs unmodified vendor kernels in FP32 with negligible overhead (0.3% on Qwen3-8B). Across CNNs, Transformers and diffusion models on A100, H100, RTX6000, RTX4090, empirical thresholds are $10^2-10^3$ times tighter than theoretical bounds, and bound-aware adversarial attacks achieve 0% success. NAO reconciles scalability with verifiability for real-world heterogeneous ML compute.
comment: 17 pages, 7 figures
Unifying Direct and Indirect Learning for Safe Control of Linear Systems
This paper develops learning-enabled safe controllers for linear systems subject to system uncertainties and bounded disturbances. Given the disturbance zonotope, the databased closed-loop dynamics (CLDs) are first characterized using a matrix zonotope (MZ), and refined through several steps to yield a constrained matrix zonotope (CMZ). This refinement is achieved by introducing conformal equality constraints that eliminate incompatible disturbance realizations. More precisely, prior knowledge and observed data are used separately to construct CMZ representations of disturbance sequences that conform to both data and prior knowledge, and are intersected by the initial MZ of the disturbance sequence, producing a refined CMZ. This approach reduces conservatism. To further reduce the conservativeness, we unify open-loop learning with closed-loop learning by presenting a novel set-membership identification method that models open-loop dynamics as a CMZ. The prior knowledge serves as an initial feasible open-loop model set (FOLMS) of this CMZ, which is refined into a posterior set whenever new informative online data becomes available. This posterior FOLMS then adaptively replaces the prior knowledge set employed in the disturbance elimination of the closed-loop learning process. The resulting refined parameterized set of CLD is subsequently leveraged to directly and adaptively learn a controller that robustly enforces safety. Toward this goal, we formulate a linear programming problem that guarantees {\lambda}contractiveness of a polyhedral safe set. A simulation example is provided to validate the effectiveness of the proposed approach and support the theoretical results.
comment: arXiv admin note: text overlap with arXiv:2502.04195
Dynamic object goal pushing with mobile manipulators through model-free constrained reinforcement learning ICRA 2025
Non-prehensile pushing to move and reorient objects to a goal is a versatile loco-manipulation skill. In the real world, the object's physical properties and friction with the floor contain significant uncertainties, which makes the task challenging for a mobile manipulator. In this paper, we develop a learning-based controller for a mobile manipulator to move an unknown object to a desired position and yaw orientation through a sequence of pushing actions. The proposed controller for the robotic arm and the mobile base motion is trained using a constrained Reinforcement Learning (RL) formulation. We demonstrate its capability in experiments with a quadrupedal robot equipped with an arm. The learned policy achieves a success rate of 91.35% in simulation and at least 80% on hardware in challenging scenarios. Through our extensive hardware experiments, we show that the approach demonstrates high robustness against unknown objects of different masses, materials, sizes, and shapes. It reactively discovers the pushing location and direction, thus achieving contact-rich behavior while observing only the pose of the object. Additionally, we demonstrate the adaptive behavior of the learned policy towards preventing the object from toppling.
comment: presented at ICRA 2025, Video: https://youtu.be/wGAdPGVf9Ws?si=pi83ONWofHHqbFG0
Neural 3D Object Reconstruction with Small-Scale Unmanned Aerial Vehicles
Small Unmanned Aerial Vehicles (UAVs) exhibit immense potential for navigating indoor and hard-to-reach areas, yet their significant constraints in payload and autonomy have largely prevented their use for complex tasks like high-quality 3-Dimensional (3D) reconstruction. To overcome this challenge, we introduce a novel system architecture that enables fully autonomous, high-fidelity 3D scanning of static objects using UAVs weighing under 100 grams. Our core innovation lies in a dual-reconstruction pipeline that creates a real-time feedback loop between data capture and flight control. A near-real-time (near-RT) process uses Structure from Motion (SfM) to generate an instantaneous pointcloud of the object. The system analyzes the model quality on the fly and dynamically adapts the UAV's trajectory to intelligently capture new images of poorly covered areas. This ensures comprehensive data acquisition. For the final, detailed output, a non-real-time (non-RT) pipeline employs a Neural Radiance Fields (NeRF)-based Neural 3D Reconstruction (N3DR) approach, fusing SfM-derived camera poses with precise Ultra Wide-Band (UWB) location data to achieve superior accuracy. We implemented and validated this architecture using Crazyflie 2.1 UAVs. Our experiments, conducted in both single- and multi-UAV configurations, conclusively show that dynamic trajectory adaptation consistently improves reconstruction quality over static flight paths. This work demonstrates a scalable and autonomous solution that unlocks the potential of miniaturized UAVs for fine-grained 3D reconstruction in constrained environments, a capability previously limited to much larger platforms.
comment: 13 pages, 16 figures, 3 tables, 45 references
A Flow-Based Model for Conditional and Probabilistic Electricity Consumption Profile Generation and Prediction
Residential Load Profile (RLP) generation and prediction are critical for the operation and planning of distribution networks, especially as diverse low-carbon technologies (e.g., photovoltaic and electric vehicles) are increasingly adopted. This paper introduces a novel flow-based generative model, termed Full Convolutional Profile Flow (FCPFlow), which is uniquely designed for both conditional and unconditional RLP generation, and for probabilistic load forecasting. By introducing two new layers--the invertible linear layer and the invertible normalization layer--the proposed FCPFlow architecture shows three main advantages compared to traditional statistical and contemporary deep generative models: 1) it is well-suited for RLP generation under continuous conditions, such as varying weather and annual electricity consumption, 2) it demonstrates superior scalability in different datasets compared to traditional statistical models, and 3) it also demonstrates better modeling capabilities in capturing the complex correlation of RLPs compared with deep generative models.
Asynchronous Federated Learning: A Scalable Approach for Decentralized Machine Learning
Federated Learning (FL) has emerged as a powerful paradigm for decentralized machine learning, enabling collaborative model training across diverse clients without sharing raw data. However, traditional FL approaches often face limitations in scalability and efficiency due to their reliance on synchronous client updates, which can result in significant delays and increased communication overhead, particularly in heterogeneous and dynamic environments. To address these challenges in this paper, we propose an Asynchronous Federated Learning (AFL) algorithm, which allows clients to update the global model independently and asynchronously. Our key contributions include a comprehensive convergence analysis of AFL in the presence of client delays and model staleness. By leveraging martingale difference sequence theory and variance bounds, we ensure robust convergence despite asynchronous updates. Assuming strongly convex local objective functions, we establish bounds on gradient variance under random client sampling and derive a recursion formula quantifying the impact of client delays on convergence. Furthermore, we demonstrate the practical applicability of the AFL algorithm by training decentralized linear regression and Support Vector Machine (SVM) based classifiers and compare its results with synchronous FL algorithm to effectively handling non-IID data distributed among clients. The proposed AFL algorithm addresses key limitations of traditional FL methods, such as inefficiency due to global synchronization and susceptibility to client drift. It enhances scalability, robustness, and efficiency in real-world settings with heterogeneous client populations and dynamic network conditions. Our results underscore the potential of AFL to drive advancements indistributed learning systems, particularly for large-scale, privacy-preserving applications in resource-constrained environments.
Finite-time Safety and Reach-avoid Verification of Stochastic Discrete-time Systems
This paper studies finite-time safety and reach-avoid verification for stochastic discrete-time dynamical systems. The aim is to ascertain lower and upper bounds of the probability that, within a predefined finite-time horizon, a system starting from an initial state in a safe set will either exit the safe set (safety verification) or reach a target set while remaining within the safe set until the first encounter with the target (reach-avoid verification). We introduce novel barrier-like sufficient conditions for characterizing these bounds, which either complement existing ones or fill gaps. Finally, we demonstrate the efficacy of these conditions on two examples.
comment: To appear in Information and Computation
Optimal state estimation: Turnpike analysis and performance results
In this paper, we introduce turnpike arguments in the context of optimal state estimation. In particular, we show that the optimal solution of the state estimation problem involving all available past data serves as turnpike for the solutions of truncated problems involving only a subset of the data. We mathematically formalize this phenomenon and derive a sufficient condition that relies on a decaying sensitivity property of the underlying nonlinear program. As second contribution, we show how a specific turnpike property can be used to establish performance guarantees when approximating the optimal solution of the full problem by a sequence of truncated problems, and we show that the resulting performance (both averaged and non-averaged) is approximately optimal with error terms that can be made arbitrarily small by an appropriate choice of the horizon length. In addition, we discuss interesting implications of these results for the practically relevant case of moving horizon estimation and illustrate our results with a numerical example.
comment: replaced with final version
Finite Sample Identification of Partially Observed Bilinear Dynamical Systems
We consider the problem of learning a realization of a partially observed bilinear dynamical system (BLDS) from noisy input-output data. Given a single trajectory of input-output samples, we provide a finite time analysis for learning the system's Markov-like parameters, from which a balanced realization of the bilinear system can be obtained. Our bilinear system identification algorithm learns the system's Markov-like parameters by regressing the outputs to highly correlated, nonlinear, and heavy-tailed covariates. Moreover, the stability of BLDS depends on the sequence of inputs used to excite the system. These properties, unique to partially observed bilinear dynamical systems, pose significant challenges to the analysis of our algorithm for learning the unknown dynamics. We address these challenges and provide high probability error bounds on our identification algorithm under a uniform stability assumption. Our analysis provides insights into system theoretic quantities that affect learning accuracy and sample complexity. Lastly, we perform numerical experiments with synthetic data to reinforce these insights.
Sub-optimality of the Separation Principle for Quadratic Control from Bilinear Observations
We consider the problem of controlling a linear dynamical system from bilinear observations with minimal quadratic cost. Despite the similarity of this problem to standard linear quadratic Gaussian (LQG) control, we show that when the observation model is bilinear, neither does the Separation Principle hold, nor is the optimal controller affine in the estimated state. Moreover, the cost-to-go is non-convex in the control input. Hence, finding an analytical expression for the optimal feedback controller is difficult in general. Under certain settings, we show that the standard LQG controller locally maximizes the cost instead of minimizing it. Furthermore, the optimal controllers (derived analytically) are not unique and are nonlinear in the estimated state. We also introduce a notion of input-dependent observability and derive conditions under which the Kalman filter covariance remains bounded. We illustrate our theoretical results through numerical experiments in multiple synthetic settings.
Iterated Invariant Extended Kalman Filter (IterIEKF)
We study the mathematical properties of the Invariant Extended Kalman Filter (IEKF) when iterating on the measurement update step, following the principles of the well-known Iterated Extended Kalman Filter. This iterative variant of the IEKF (IterIEKF) systematically improves its accuracy through Gauss-Newton-based relinearization, and exhibits additional theoretical properties, particularly in the low-noise regime, that resemble those of the linear Kalman filter. We apply the proposed approach to the problem of estimating the extended pose of a crane payload using an inertial measurement unit. Our results suggest that the IterIEKF significantly outperforms the IEKF when measurements are highly accurate.
Robotics
Robobench: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models as Embodied Brain
Building robots that can perceive, reason, and act in dynamic, unstructured environments remains a core challenge. Recent embodied systems often adopt a dual-system paradigm, where System 2 handles high-level reasoning while System 1 executes low-level control. In this work, we refer to System 2 as the embodied brain, emphasizing its role as the cognitive core for reasoning and decision-making in manipulation tasks. Given this role, systematic evaluation of the embodied brain is essential. Yet existing benchmarks emphasize execution success, or when targeting high-level reasoning, suffer from incomplete dimensions and limited task realism, offering only a partial picture of cognitive capability. To bridge this gap, we introduce RoboBench, a benchmark that systematically evaluates multimodal large language models (MLLMs) as embodied brains. Motivated by the critical roles across the full manipulation pipeline, RoboBench defines five dimensions-instruction comprehension, perception reasoning, generalized planning, affordance prediction, and failure analysis-spanning 14 capabilities, 25 tasks, and 6092 QA pairs. To ensure realism, we curate datasets across diverse embodiments, attribute-rich objects, and multi-view scenes, drawing from large-scale real robotic data. For planning, RoboBench introduces an evaluation framework, MLLM-as-world-simulator. It evaluate embodied feasibility by simulating whether predicted plans can achieve critical object-state changes. Experiments on 14 MLLMs reveal fundamental limitations: difficulties with implicit instruction comprehension, spatiotemporal reasoning, cross-scenario planning, fine-grained affordance understanding, and execution failure diagnosis. RoboBench provides a comprehensive scaffold to quantify high-level cognition, and guide the development of next-generation embodied MLLMs. The project page is in https://robo-bench.github.io.
SoftMimic: Learning Compliant Whole-body Control from Examples
We introduce SoftMimic, a framework for learning compliant whole-body control policies for humanoid robots from example motions. Imitating human motions with reinforcement learning allows humanoids to quickly learn new skills, but existing methods incentivize stiff control that aggressively corrects deviations from a reference motion, leading to brittle and unsafe behavior when the robot encounters unexpected contacts. In contrast, SoftMimic enables robots to respond compliantly to external forces while maintaining balance and posture. Our approach leverages an inverse kinematics solver to generate an augmented dataset of feasible compliant motions, which we use to train a reinforcement learning policy. By rewarding the policy for matching compliant responses rather than rigidly tracking the reference motion, SoftMimic learns to absorb disturbances and generalize to varied tasks from a single motion clip. We validate our method through simulations and real-world experiments, demonstrating safe and effective interaction with the environment.
comment: Website: https://gmargo11.github.io/softmimic/
Botany-Bot: Digital Twin Monitoring of Occluded and Underleaf Plant Structures with Gaussian Splats IROS 2025
Commercial plant phenotyping systems using fixed cameras cannot perceive many plant details due to leaf occlusion. In this paper, we present Botany-Bot, a system for building detailed "annotated digital twins" of living plants using two stereo cameras, a digital turntable inside a lightbox, an industrial robot arm, and 3D segmentated Gaussian Splat models. We also present robot algorithms for manipulating leaves to take high-resolution indexable images of occluded details such as stem buds and the underside/topside of leaves. Results from experiments suggest that Botany-Bot can segment leaves with 90.8% accuracy, detect leaves with 86.2% accuracy, lift/push leaves with 77.9% accuracy, and take detailed overside/underside images with 77.3% accuracy. Code, videos, and datasets are available at https://berkeleyautomation.github.io/Botany-Bot/.
comment: 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2025)
RESample: A Robust Data Augmentation Framework via Exploratory Sampling for Robotic Manipulation ICRA2026
Vision-Language-Action models (VLAs) have demonstrated remarkable performance on complex robotic manipulation tasks through imitation learning. However, existing imitation learning datasets contain only successful trajectories and lack failure or recovery data, especially for out-of-distribution (OOD) states where the robot deviates from the main policy due to minor perturbations or errors, leading VLA models to struggle with states deviating from the training distribution. To this end, we propose an automated OOD data augmentation framework named RESample through exploratory sampling. Specifically, we first leverage offline reinforcement learning to obtain an action-value network that accurately identifies sub-optimal actions under the current manipulation policy. We further sample potential OOD states from trajectories via rollout, and design an exploratory sampling mechanism that adaptively incorporates these action proxies into the training dataset to ensure efficiency. Subsequently, our framework explicitly encourages the VLAs to recover from OOD states and enhances their robustness against distributional shifts. We conduct extensive experiments on the LIBERO benchmark as well as real-world robotic manipulation tasks, demonstrating that RESample consistently improves the stability and generalization ability of VLA models.
comment: 9 pages,7 figures, submitted to ICRA2026
Learned Inertial Odometry for Cycling Based on Mixture of Experts Algorithm
With the rapid growth of bike sharing and the increasing diversity of cycling applications, accurate bicycle localization has become essential. traditional GNSS-based methods suffer from multipath effects, while existing inertial navigation approaches rely on precise modeling and show limited robustness. Tight Learned Inertial Odometry (TLIO) achieves low position drift by combining raw IMU data with predicted displacements by neural networks, but its high computational cost restricts deployment on mobile devices. To overcome this, we extend TLIO to bicycle localization and introduce an improved Mixture-of Experts (MoE) model that reduces both training and inference costs. Experiments show that, compared to the state-of-the-art LLIO framework, our method achieves comparable accuracy while reducing parameters by 64.7% and computational cost by 81.8%.
Intent-Driven LLM Ensemble Planning for Flexible Multi-Robot Disassembly: Demonstration on EV Batteries
This paper addresses the problem of planning complex manipulation tasks, in which multiple robots with different end-effectors and capabilities, informed by computer vision, must plan and execute concatenated sequences of actions on a variety of objects that can appear in arbitrary positions and configurations in unstructured scenes. We propose an intent-driven planning pipeline which can robustly construct such action sequences with varying degrees of supervisory input from a human using simple language instructions. The pipeline integrates: (i) perception-to-text scene encoding, (ii) an ensemble of large language models (LLMs) that generate candidate removal sequences based on the operator's intent, (iii) an LLM-based verifier that enforces formatting and precedence constraints, and (iv) a deterministic consistency filter that rejects hallucinated objects. The pipeline is evaluated on an example task in which two robot arms work collaboratively to dismantle an Electric Vehicle battery for recycling applications. A variety of components must be grasped and removed in specific sequences, determined by human instructions and/or by task-order feasibility decisions made by the autonomous system. On 200 real scenes with 600 operator prompts across five component classes, we used metrics of full-sequence correctness and next-task correctness to evaluate and compare five LLM-based planners (including ablation analyses of pipeline components). We also evaluated the LLM-based human interface in terms of time to execution and NASA TLX with human participant experiments. Results indicate that our ensemble-with-verification approach reliably maps operator intent to safe, executable multi-robot plans while maintaining low user effort.
comment: This work is funded by the project called "Research and Development of a Highly Automated and Safe Streamlined Process for Increasing Lithium-ion Battery Repurposing and Recycling" (REBELION) under Grant 101104241, and partially supported by the Ministry of National Education, Republic of Turkey. Submitted to Frontiers for Review
An Empirical Study of Lagrangian Methods in Safe Reinforcement Learning
In safety-critical domains such as robotics, navigation and power systems, constrained optimization problems arise where maximizing performance must be carefully balanced with associated constraints. Safe reinforcement learning provides a framework to address these challenges, with Lagrangian methods being a popular choice. However, the effectiveness of Lagrangian methods crucially depends on the choice of the Lagrange multiplier $\lambda$, which governs the trade-off between return and constraint cost. A common approach is to update the multiplier automatically during training. Although this is standard in practice, there remains limited empirical evidence on the robustness of an automated update and its influence on overall performance. Therefore, we analyze (i) optimality and (ii) stability of Lagrange multipliers in safe reinforcement learning across a range of tasks. We provide $\lambda$-profiles that give a complete visualization of the trade-off between return and constraint cost of the optimization problem. These profiles show the highly sensitive nature of $\lambda$ and moreover confirm the lack of general intuition for choosing the optimal value $\lambda^*$. Our findings additionally show that automated multiplier updates are able to recover and sometimes even exceed the optimal performance found at $\lambda^*$ due to the vast difference in their learning trajectories. Furthermore, we show that automated multiplier updates exhibit oscillatory behavior during training, which can be mitigated through PID-controlled updates. However, this method requires careful tuning to achieve consistently better performance across tasks. This highlights the need for further research on stabilizing Lagrangian methods in safe reinforcement learning. The code used to reproduce our results can be found at https://github.com/lindsayspoor/Lagrangian_SafeRL.
Distributed Spatial-Temporal Trajectory Optimization for Unmanned-Aerial-Vehicle Swarm
Swarm trajectory optimization problems are a well-recognized class of multi-agent optimal control problems with strong nonlinearity. However, the heuristic nature of needing to set the final time for agents beforehand and the time-consuming limitation of the significant number of iterations prohibit the application of existing methods to large-scale swarm of Unmanned Aerial Vehicles (UAVs) in practice. In this paper, we propose a spatial-temporal trajectory optimization framework that accomplishes multi-UAV consensus based on the Alternating Direction Multiplier Method (ADMM) and uses Differential Dynamic Programming (DDP) for fast local planning of individual UAVs. The introduced framework is a two-level architecture that employs Parameterized DDP (PDDP) as the trajectory optimizer for each UAV, and ADMM to satisfy the local constraints and accomplish the spatial-temporal parameter consensus among all UAVs. This results in a fully distributed algorithm called Distributed Parameterized DDP (D-PDDP). In addition, an adaptive tuning criterion based on the spectral gradient method for the penalty parameter is proposed to reduce the number of algorithmic iterations. Several simulation examples are presented to verify the effectiveness of the proposed algorithm.
HumanMPC - Safe and Efficient MAV Navigation among Humans
Safe and efficient robotic navigation among humans is essential for integrating robots into everyday environments. Most existing approaches focus on simplified 2D crowd navigation and fail to account for the full complexity of human body dynamics beyond root motion. We present HumanMPC, a Model Predictive Control (MPC) framework for 3D Micro Air Vehicle (MAV) navigation among humans that combines theoretical safety guarantees with data-driven models for realistic human motion forecasting. Our approach introduces a novel twist to reachability-based safety formulation that constrains only the initial control input for safety while modeling its effects over the entire planning horizon, enabling safe yet efficient navigation. We validate HumanMPC in both simulated experiments using real human trajectories and in the real-world, demonstrating its effectiveness across tasks ranging from goal-directed navigation to visual servoing for human tracking. While we apply our method to MAVs in this work, it is generic and can be adapted by other platforms. Our results show that the method ensures safety without excessive conservatism and outperforms baseline approaches in both efficiency and reliability.
Inverse Optimal Control of Muscle Force Sharing During Pathological Gait
Muscle force sharing is typically resolved by minimizing a specific objective function to approximate neural control strategies. An inverse optimal control approach was applied to identify the "best" objective function, among a positive linear combination of basis objective functions, associated with the gait of two post-stroke males, one high-functioning (subject S1) and one low-functioning (subject S2). It was found that the "best" objective function is subject- and leg-specific. No single function works universally well, yet the best options are usually differently weighted combinations of muscle activation- and power-minimization. Subject-specific inverse optimal control models performed best on their respective limbs (\textbf{RMSE 178/213 N, CC 0.71/0.61} for non-paretic and paretic legs of S1; \textbf{RMSE 205/165 N, CC 0.88/0.85} for respective legs of S2), but cross-subject generalization was poor, particularly for paretic legs. Moreover, minimizing the root mean square of muscle power emerged as important for paretic limbs, while minimizing activation-based functions dominated for non-paretic limbs. This may suggest different neural control strategies between affected and unaffected sides, possibly altered by the presence of spasticity. Among the 15 considered objective functions commonly used in inverse dynamics-based computations, the root mean square of muscle power was the only one explicitly incorporating muscle velocity, leading to a possible model for spasticity in the paretic limbs. Although this objective function has been rarely used, it may be relevant for modeling pathological gait, such as post-stroke gait.
A Generalization of Input-Output Linearization via Dynamic Switching Between Melds of Output Functions
This letter presents a systematic framework for switching between different sets of outputs for the control of nonlinear systems via feedback linearization. We introduce the concept of a meld to formally define a valid, feedback-linearizable subset of outputs that can be selected from a larger deck of possible outputs. The main contribution is a formal proof establishing that under suitable dwell-time and compatibility conditions, it is possible to switch between different melds while guaranteeing the uniform boundedness of the system state. We further show that the error dynamics of the active outputs remain exponentially stable within each switching interval and that outputs common to consecutive melds are tracked seamlessly through transitions. The proposed theory is valid for any feedback linearizable nonlinear system, such as, e.g., robots, aerial and terrestrial vehicles, etc.. We demonstrate it on a simple numerical simulation of a robotic manipulator.
From Spatial to Actions: Grounding Vision-Language-Action Model in Spatial Foundation Priors
Existing vision-language-action (VLA) models act in 3D real-world but are typically built on 2D encoders, leaving a spatial reasoning gap that limits generalization and adaptability. Recent 3D integration techniques for VLAs either require specialized sensors and transfer poorly across modalities, or inject weak cues that lack geometry and degrade vision-language alignment. In this work, we introduce FALCON (From Spatial to Action), a novel paradigm that injects rich 3D spatial tokens into the action head. FALCON leverages spatial foundation models to deliver strong geometric priors from RGB alone, and includes an Embodied Spatial Model that can optionally fuse depth, or pose for higher fidelity when available, without retraining or architectural changes. To preserve language reasoning, spatial tokens are consumed by a Spatial-Enhanced Action Head rather than being concatenated into the vision-language backbone. These designs enable FALCON to address limitations in spatial representation, modality transferability, and alignment. In comprehensive evaluations across three simulation benchmarks and eleven real-world tasks, our proposed FALCON achieves state-of-the-art performance, consistently surpasses competitive baselines, and remains robust under clutter, spatial-prompt conditioning, and variations in object scale and height.
comment: Project page: https://falcon-vla.github.io/
Integrating Trustworthy Artificial Intelligence with Energy-Efficient Robotic Arms for Waste Sorting
This paper presents a novel methodology that integrates trustworthy artificial intelligence (AI) with an energy-efficient robotic arm for intelligent waste classification and sorting. By utilizing a convolutional neural network (CNN) enhanced through transfer learning with MobileNetV2, the system accurately classifies waste into six categories: plastic, glass, metal, paper, cardboard, and trash. The model achieved a high training accuracy of 99.8% and a validation accuracy of 80.5%, demonstrating strong learning and generalization. A robotic arm simulator is implemented to perform virtual sorting, calculating the energy cost for each action using Euclidean distance to ensure optimal and efficient movement. The framework incorporates key elements of trustworthy AI, such as transparency, robustness, fairness, and safety, making it a reliable and scalable solution for smart waste management systems in urban settings.
comment: 5 pages, 2 figures
Graph Attention-Guided Search for Dense Multi-Agent Pathfinding
Finding near-optimal solutions for dense multi-agent pathfinding (MAPF) problems in real-time remains challenging even for state-of-the-art planners. To this end, we develop a hybrid framework that integrates a learned heuristic derived from MAGAT, a neural MAPF policy with a graph attention scheme, into a leading search-based algorithm, LaCAM. While prior work has explored learning-guided search in MAPF, such methods have historically underperformed. In contrast, our approach, termed LaGAT, outperforms both purely search-based and purely learning-based methods in dense scenarios. This is achieved through an enhanced MAGAT architecture, a pre-train-then-fine-tune strategy on maps of interest, and a deadlock detection scheme to account for imperfect neural guidance. Our results demonstrate that, when carefully designed, hybrid search offers a powerful solution for tightly coupled, challenging multi-agent coordination problems.
Bridging Embodiment Gaps: Deploying Vision-Language-Action Models on Soft Robots NeurIPS 2025
Robotic systems are increasingly expected to operate in human-centered, unstructured environments where safety, adaptability, and generalization are essential. Vision-Language-Action (VLA) models have been proposed as a language guided generalized control framework for real robots. However, their deployment has been limited to conventional serial link manipulators. Coupled by their rigidity and unpredictability of learning based control, the ability to safely interact with the environment is missing yet critical. In this work, we present the deployment of a VLA model on a soft continuum manipulator to demonstrate autonomous safe human-robot interaction. We present a structured finetuning and deployment pipeline evaluating two state-of-the-art VLA models (OpenVLA-OFT and $\pi_0$) across representative manipulation tasks, and show while out-of-the-box policies fail due to embodiment mismatch, through targeted finetuning the soft robot performs equally to the rigid counterpart. Our findings highlight the necessity of finetuning for bridging embodiment gaps, and demonstrate that coupling VLA models with soft robots enables safe and flexible embodied AI in human-shared environments.
comment: Accepted by NeurIPS 2025 SpaVLE workshop. 4 pages, 2 figures(in main paper, excluding references and supplements)
M2H: Multi-Task Learning with Efficient Window-Based Cross-Task Attention for Monocular Spatial Perception IROS 2025
Deploying real-time spatial perception on edge devices requires efficient multi-task models that leverage complementary task information while minimizing computational overhead. This paper introduces Multi-Mono-Hydra (M2H), a novel multi-task learning framework designed for semantic segmentation and depth, edge, and surface normal estimation from a single monocular image. Unlike conventional approaches that rely on independent single-task models or shared encoder-decoder architectures, M2H introduces a Window-Based Cross-Task Attention Module that enables structured feature exchange while preserving task-specific details, improving prediction consistency across tasks. Built on a lightweight ViT-based DINOv2 backbone, M2H is optimized for real-time deployment and serves as the foundation for monocular spatial perception systems supporting 3D scene graph construction in dynamic environments. Comprehensive evaluations show that M2H outperforms state-of-the-art multi-task models on NYUDv2, surpasses single-task depth and semantic baselines on Hypersim, and achieves superior performance on the Cityscapes dataset, all while maintaining computational efficiency on laptop hardware. Beyond benchmarks, M2H is validated on real-world data, demonstrating its practicality in spatial perception tasks.
comment: Accepted to the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2025). 8 pages, 7 figures
Interactive Force-Impedance Control
Human collaboration with robots requires flexible role adaptation, enabling robot to switch between active leader and passive follower. Effective role switching depends on accurately estimating human intention, which is typically achieved through external force analysis, nominal robot dynamics, or data-driven approaches. However, these methods are primarily effective in contact-sparse environments. When robots under hybrid or unified force-impedance control physically interact with active humans or non-passive environments, the robotic system may lose passivity and thus compromise safety. To address this challenge, this paper proposes the unified Interactive Force-Impedance Control (IFIC) framework that adapts to the interaction power flow, ensuring effortless and safe interaction in contact-rich environments. The proposed control architecture is formulated within a port-Hamiltonian framework, incorporating both interaction and task control ports, through which system passivity is guaranteed.
DDBot: Differentiable Physics-based Digging Robot for Unknown Granular Materials
Automating the manipulation of granular materials poses significant challenges due to complex contact dynamics, unpredictable material properties, and intricate system states. Existing approaches often fail to achieve efficiency and accuracy in such tasks. To fill the research gap, this paper studies the small-scale and high-precision granular material digging task with unknown physical properties. A new framework, named differentiable digging robot (DDBot), is proposed to manipulate granular materials, including sand and soil. Specifically, we equip DDBot with a differentiable physics-based simulator, tailored for granular material manipulation, powered by GPU-accelerated parallel computing and automatic differentiation. DDBot can perform efficient differentiable system identification and high-precision digging skill optimisation for unknown granular materials, which is enabled by a differentiable skill-to-action mapping, a task-oriented demonstration method, gradient clipping and line search-based gradient descent. Experimental results show that DDBot can efficiently (converge within 5 to 20 minutes) identify unknown granular material dynamics and optimise digging skills, with high-precision results in zero-shot real-world deployments, highlighting its practicality. Benchmark results against state-of-the-art baselines also confirm the robustness and efficiency of DDBot in such digging tasks.
comment: Accepted as a regular paper by the IEEE Transactions on Robotics
Implicit State Estimation via Video Replanning
Video-based representations have gained prominence in planning and decision-making due to their ability to encode rich spatiotemporal dynamics and geometric relationships. These representations enable flexible and generalizable solutions for complex tasks such as object manipulation and navigation. However, existing video planning frameworks often struggle to adapt to failures at interaction time due to their inability to reason about uncertainties in partially observed environments. To overcome these limitations, we introduce a novel framework that integrates interaction-time data into the planning process. Our approach updates model parameters online and filters out previously failed plans during generation. This enables implicit state estimation, allowing the system to adapt dynamically without explicitly modeling unknown state variables. We evaluate our framework through extensive experiments on a new simulated manipulation benchmark, demonstrating its ability to improve replanning performance and advance the field of video-based decision-making.
Floating-Base Deep Lagrangian Networks
Grey-box methods for system identification combine deep learning with physics-informed constraints, capturing complex dependencies while improving out-of-distribution generalization. Yet, despite the growing importance of floating-base systems such as humanoids and quadrupeds, current grey-box models ignore their specific physical constraints. For instance, the inertia matrix is not only positive definite but also exhibits branch-induced sparsity and input independence. Moreover, the 6x6 composite spatial inertia of the floating base inherits properties of single-rigid-body inertia matrices. As we show, this includes the triangle inequality on the eigenvalues of the composite rotational inertia. To address the lack of physical consistency in deep learning models of floating-base systems, we introduce a parameterization of inertia matrices that satisfies all these constraints. Inspired by Deep Lagrangian Networks (DeLaN), we train neural networks to predict physically plausible inertia matrices that minimize inverse dynamics error under Lagrangian mechanics. For evaluation, we collected and released a dataset on multiple quadrupeds and humanoids. In these experiments, our Floating-Base Deep Lagrangian Networks (FeLaN) achieve highly competitive performance on both simulated and real robots, while providing greater physical interpretability.
High-Level Multi-Robot Trajectory Planning And Spurious Behavior Detection
The reliable execution of high-level missions in multi-robot systems with heterogeneous agents, requires robust methods for detecting spurious behaviors. In this paper, we address the challenge of identifying spurious executions of plans specified as a Linear Temporal Logic (LTL) formula, as incorrect task sequences, violations of spatial constraints, timing inconsis- tencies, or deviations from intended mission semantics. To tackle this, we introduce a structured data generation framework based on the Nets-within-Nets (NWN) paradigm, which coordinates robot actions with LTL-derived global mission specifications. We further propose a Transformer-based anomaly detection pipeline that classifies robot trajectories as normal or anomalous. Experi- mental evaluations show that our method achieves high accuracy (91.3%) in identifying execution inefficiencies, and demonstrates robust detection capabilities for core mission violations (88.3%) and constraint-based adaptive anomalies (66.8%). An ablation experiment of the embedding and architecture was carried out, obtaining successful results where our novel proposition performs better than simpler representations.
comment: 6 pages,3 figures, Iberian Robotics Conference 2025
An adaptive hierarchical control framework for quadrupedal robots in planetary exploration
Planetary exploration missions require robots capable of navigating extreme and unknown environments. While wheeled rovers have dominated past missions, their mobility is limited to traversable surfaces. Legged robots, especially quadrupeds, can overcome these limitations by handling uneven, obstacle-rich, and deformable terrains. However, deploying such robots in unknown conditions is challenging due to the need for environment-specific control, which is infeasible when terrain and robot parameters are uncertain. This work presents a modular control framework that combines model-based dynamic control with online model adaptation and adaptive footstep planning to address uncertainties in both robot and terrain properties. The framework includes state estimation for quadrupeds with and without contact sensing, supports runtime reconfiguration, and is integrated into ROS 2 with open-source availability. Its performance was validated on two quadruped platforms, multiple hardware architectures, and in a volcano field test, where the robot walked over 700 m.
comment: Presented at 18th Symposium on Advanced Space Technologies in Robotics and Automation (ASTRA)
Pole-Image: A Self-Supervised Pole-Anchored Descriptor for Long-Term LiDAR Localization and Map Maintenance
Long-term autonomy for mobile robots requires both robust self-localization and reliable map maintenance. Conventional landmark-based methods face a fundamental trade-off between landmarks with high detectability but low distinctiveness (e.g., poles) and those with high distinctiveness but difficult stable detection (e.g., local point cloud structures). This work addresses the challenge of descriptively identifying a unique "signature" (local point cloud) by leveraging a detectable, high-precision "anchor" (like a pole). To solve this, we propose a novel canonical representation, "Pole-Image," as a hybrid method that uses poles as anchors to generate signatures from the surrounding 3D structure. Pole-Image represents a pole-like landmark and its surrounding environment, detected from a LiDAR point cloud, as a 2D polar coordinate image with the pole itself as the origin. This representation leverages the pole's nature as a high-precision reference point, explicitly encoding the "relative geometry" between the stable pole and the variable surrounding point cloud. The key advantage of pole landmarks is that "detection" is extremely easy. This ease of detection allows the robot to easily track the same pole, enabling the automatic and large-scale collection of diverse observational data (positive pairs). This data acquisition feasibility makes "Contrastive Learning (CL)" applicable. By applying CL, the model learns a viewpoint-invariant and highly discriminative descriptor. The contributions are twofold: 1) The descriptor overcomes perceptual aliasing, enabling robust self-localization. 2) The high-precision encoding enables high-sensitivity change detection, contributing to map maintenance.
comment: 4 pages, technical report
Performance Evaluation of an Integrated System for Visible Light Communication and Positioning Using an Event Camera
Event cameras, featuring high temporal resolution and high dynamic range, offer visual sensing capabilities comparable to conventional image sensors while capturing fast-moving objects and handling scenes with extreme lighting contrasts such as tunnel exits. Leveraging these properties, this study proposes a novel self-localization system that integrates visible light communication (VLC) and visible light positioning (VLP) within a single event camera. The system enables a vehicle to estimate its position even in GPS-denied environments, such as tunnels, by using VLC to obtain coordinate information from LED transmitters and VLP to estimate the distance to each transmitter. Multiple LEDs are installed on the transmitter side, each assigned a unique pilot sequence based on Walsh-Hadamard codes. The event camera identifies individual LEDs within its field of view by correlating the received signal with these codes, allowing clear separation and recognition of each light source. This mechanism enables simultaneous high-capacity MISO (multi-input single-output) communication through VLC and precise distance estimation via phase-only correlation (POC) between multiple LED pairs. To the best of our knowledge, this is the first vehicle-mounted system to achieve simultaneous VLC and VLP functionalities using a single event camera. Field experiments were conducted by mounting the system on a vehicle traveling at 30 km/h (8.3 m/s). The results demonstrated robust real-world performance, with a root mean square error (RMSE) of distance estimation within 0.75 m for ranges up to 100 m and a bit error rate (BER) below 0.01 across the same range.
comment: 7pages, APCC2025
SimpleVSF: VLM-Scoring Fusion for Trajectory Prediction of End-to-End Autonomous Driving
End-to-end autonomous driving has emerged as a promising paradigm for achieving robust and intelligent driving policies. However, existing end-to-end methods still face significant challenges, such as suboptimal decision-making in complex scenarios. In this paper,we propose SimpleVSF (Simple VLM-Scoring Fusion), a novel framework that enhances end-to-end planning by leveraging the cognitive capabilities of Vision-Language Models (VLMs) and advanced trajectory fusion techniques. We utilize the conventional scorers and the novel VLM-enhanced scorers. And we leverage a robust weight fusioner for quantitative aggregation and a powerful VLM-based fusioner for qualitative, context-aware decision-making. As the leading approach in the ICCV 2025 NAVSIM v2 End-to-End Driving Challenge, our SimpleVSF framework demonstrates state-of-the-art performance, achieving a superior balance between safety, comfort, and efficiency.
comment: 6 pages, 2 figures, 2 tables
OmniVIC: A Self-Improving Variable Impedance Controller with Vision-Language In-Context Learning for Safe Robotic Manipulation
We present OmniVIC, a universal variable impedance controller (VIC) enhanced by a vision language model (VLM), which improves safety and adaptation in any contact-rich robotic manipulation task to enhance safe physical interaction. Traditional VIC have shown advantages when the robot physically interacts with the environment, but lack generalization in unseen, complex, and unstructured safe interactions in universal task scenarios involving contact or uncertainty. To this end, the proposed OmniVIC interprets task context derived reasoning from images and natural language and generates adaptive impedance parameters for a VIC controller. Specifically, the core of OmniVIC is a self-improving Retrieval-Augmented Generation(RAG) and in-context learning (ICL), where RAG retrieves relevant prior experiences from a structured memory bank to inform the controller about similar past tasks, and ICL leverages these retrieved examples and the prompt of current task to query the VLM for generating context-aware and adaptive impedance parameters for the current manipulation scenario. Therefore, a self-improved RAG and ICL guarantee OmniVIC works in universal task scenarios. The impedance parameter regulation is further informed by real-time force/torque feedback to ensure interaction forces remain within safe thresholds. We demonstrate that our method outperforms baselines on a suite of complex contact-rich tasks, both in simulation and on real-world robotic tasks, with improved success rates and reduced force violations. OmniVIC takes a step towards bridging high-level semantic reasoning and low-level compliant control, enabling safer and more generalizable manipulation. Overall, the average success rate increases from 27% (baseline) to 61.4% (OmniVIC).
comment: Code, video and RAG dataset are available at \url{https://sites.google.com/view/omni-vic}
DiffVLA++: Bridging Cognitive Reasoning and End-to-End Driving through Metric-Guided Alignment
Conventional end-to-end (E2E) driving models are effective at generating physically plausible trajectories, but often fail to generalize to long-tail scenarios due to the lack of essential world knowledge to understand and reason about surrounding environments. In contrast, Vision-Language-Action (VLA) models leverage world knowledge to handle challenging cases, but their limited 3D reasoning capability can lead to physically infeasible actions. In this work we introduce DiffVLA++, an enhanced autonomous driving framework that explicitly bridges cognitive reasoning and E2E planning through metric-guided alignment. First, we build a VLA module directly generating semantically grounded driving trajectories. Second, we design an E2E module with a dense trajectory vocabulary that ensures physical feasibility. Third, and most critically, we introduce a metric-guided trajectory scorer that guides and aligns the outputs of the VLA and E2E modules, thereby integrating their complementary strengths. The experiment on the ICCV 2025 Autonomous Grand Challenge leaderboard shows that DiffVLA++ achieves EPDMS of 49.12.
Decentralized Real-Time Planning for Multi-UAV Cooperative Manipulation via Imitation Learning
Existing approaches for transporting and manipulating cable-suspended loads using multiple UAVs along reference trajectories typically rely on either centralized control architectures or reliable inter-agent communication. In this work, we propose a novel machine learning based method for decentralized kinodynamic planning that operates effectively under partial observability and without inter-agent communication. Our method leverages imitation learning to train a decentralized student policy for each UAV by imitating a centralized kinodynamic motion planner with access to privileged global observations. The student policy generates smooth trajectories using physics-informed neural networks that respect the derivative relationships in motion. During training, the student policies utilize the full trajectory generated by the teacher policy, leading to improved sample efficiency. Moreover, each student policy can be trained in under two hours on a standard laptop. We validate our method in both simulation and real-world environments to follow an agile reference trajectory, demonstrating performance comparable to that of centralized approaches.
comment: Accepted by IEEE MRS 2025
Efficient Vision-Language-Action Models for Embodied Manipulation: A Systematic Survey
Vision-Language-Action (VLA) models extend vision-language models to embodied control by mapping natural-language instructions and visual observations to robot actions. Despite their capabilities, VLA systems face significant challenges due to their massive computational and memory demands, which conflict with the constraints of edge platforms such as on-board mobile manipulators that require real-time performance. Addressing this tension has become a central focus of recent research. In light of the growing efforts toward more efficient and scalable VLA systems, this survey provides a systematic review of approaches for improving VLA efficiency, with an emphasis on reducing latency, memory footprint, and training and inference costs. We categorize existing solutions into four dimensions: model architecture, perception feature, action generation, and training/inference strategies, summarizing representative techniques within each category. Finally, we discuss future trends and open challenges, highlighting directions for advancing efficient embodied intelligence.
Learning to Design Soft Hands using Reward Models
Soft robotic hands promise to provide compliant and safe interaction with objects and environments. However, designing soft hands to be both compliant and functional across diverse use cases remains challenging. Although co-design of hardware and control better couples morphology to behavior, the resulting search space is high-dimensional, and even simulation-based evaluation is computationally expensive. In this paper, we propose a Cross-Entropy Method with Reward Model (CEM-RM) framework that efficiently optimizes tendon-driven soft robotic hands based on teleoperation control policy, reducing design evaluations by more than half compared to pure optimization while learning a distribution of optimized hand designs from pre-collected teleoperation data. We derive a design space for a soft robotic hand composed of flexural soft fingers and implement parallelized training in simulation. The optimized hands are then 3D-printed and deployed in the real world using both teleoperation data and real-time teleoperation. Experiments in both simulation and hardware demonstrate that our optimized design significantly outperforms baseline hands in grasping success rates across a diverse set of challenging objects.
Quality Over Quantity: Curating Contact-Based Robot Datasets Improves Learning
In this paper, we investigate the utility of datasets and whether more data or the 'right' data is advantageous for robot learning. In particular, we are interested on quantifying the utility of contact-based data as contact holds significant information for robot learning. Our approach derives a contact-aware objective function for learning object dynamics and shape from pose and contact data. We show that the contact-aware Fisher-information metric can be used to rank and curate contact-data based on how informative data is for learning. In addition, we find that selecting a reduced dataset based on this ranking improves the learning task while also making learning a deterministic process. Interestingly, our results show that more data is not necessarily advantageous, and rather, less but informative data can accelerate learning, especially depending on the contact interactions. Last, we show how our metric can be used to provide initial guidance on data curation for contact-based robot learning.
ANGEL: A Novel Gripper for Versatile and Light-touch Fruit Harvesting
Fruit harvesting remains predominantly a labor-intensive process, motivating the development of research for robotic grippers. Conventional rigid or vacuum-driven grippers require complex mechanical design or high energy consumption. Current enveloping-based fruit harvesting grippers lack adaptability to fruits of different sizes. This paper introduces a drawstring-inspired, cable-driven soft gripper for versatile and gentle fruit harvesting. The design employs 3D-printed Thermoplastic Polyurethane (TPU) pockets with integrated steel wires that constrict around the fruit when actuated, distributing pressure uniformly to minimize bruising and allow versatility to fruits of varying sizes. The lightweight structure, which requires few components, reduces mechanical complexity and cost compared to other grippers. Actuation is achieved through servo-driven cable control, while motor feedback provides autonomous grip adjustment with tunable grip strength. Experimental validation shows that, for tomatoes within the gripper's effective size range, harvesting was achieved with a 0% immediate damage rate and a bruising rate of less than 9% after five days, reinforcing the gripper's suitability for fruit harvesting.
SafeCoop: Unravelling Full Stack Safety in Agentic Collaborative Driving
Collaborative driving systems leverage vehicle-to-everything (V2X) communication across multiple agents to enhance driving safety and efficiency. Traditional V2X systems take raw sensor data, neural features, or perception results as communication media, which face persistent challenges, including high bandwidth demands, semantic loss, and interoperability issues. Recent advances investigate natural language as a promising medium, which can provide semantic richness, decision-level reasoning, and human-machine interoperability at significantly lower bandwidth. Despite great promise, this paradigm shift also introduces new vulnerabilities within language communication, including message loss, hallucinations, semantic manipulation, and adversarial attacks. In this work, we present the first systematic study of full-stack safety and security issues in natural-language-based collaborative driving. Specifically, we develop a comprehensive taxonomy of attack strategies, including connection disruption, relay/replay interference, content spoofing, and multi-connection forgery. To mitigate these risks, we introduce an agentic defense pipeline, which we call SafeCoop, that integrates a semantic firewall, language-perception consistency checks, and multi-source consensus, enabled by an agentic transformation function for cross-frame spatial alignment. We systematically evaluate SafeCoop in closed-loop CARLA simulation across 32 critical scenarios, achieving 69.15% driving score improvement under malicious attacks and up to 67.32% F1 score for malicious detection. This study provides guidance for advancing research on safe, secure, and trustworthy language-driven collaboration in transportation systems. Our project page is https://xiangbogaobarry.github.io/SafeCoop.
R2BC: Multi-Agent Imitation Learning from Single-Agent Demonstrations
Imitation Learning (IL) is a natural way for humans to teach robots, particularly when high-quality demonstrations are easy to obtain. While IL has been widely applied to single-robot settings, relatively few studies have addressed the extension of these methods to multi-agent systems, especially in settings where a single human must provide demonstrations to a team of collaborating robots. In this paper, we introduce and study Round-Robin Behavior Cloning (R2BC), a method that enables a single human operator to effectively train multi-robot systems through sequential, single-agent demonstrations. Our approach allows the human to teleoperate one agent at a time and incrementally teach multi-agent behavior to the entire system, without requiring demonstrations in the joint multi-agent action space. We show that R2BC methods match, and in some cases surpass, the performance of an oracle behavior cloning approach trained on privileged synchronized demonstrations across four multi-agent simulated tasks. Finally, we deploy R2BC on two physical robot tasks trained using real human demonstrations.
comment: 9 pages, 6 figures
Provably Optimal Reinforcement Learning under Safety Filtering
Recent advances in reinforcement learning (RL) enable its use on increasingly complex tasks, but the lack of formal safety guarantees still limits its application in safety-critical settings. A common practical approach is to augment the RL policy with a safety filter that overrides unsafe actions to prevent failures during both training and deployment. However, safety filtering is often perceived as sacrificing performance and hindering the learning process. We show that this perceived safety-performance tradeoff is not inherent and prove, for the first time, that enforcing safety with a sufficiently permissive safety filter does not degrade asymptotic performance. We formalize RL safety with a safety-critical Markov decision process (SC-MDP), which requires categorical, rather than high-probability, avoidance of catastrophic failure states. Additionally, we define an associated filtered MDP in which all actions result in safe effects, thanks to a safety filter that is considered to be a part of the environment. Our main theorem establishes that (i) learning in the filtered MDP is safe categorically, (ii) standard RL convergence carries over to the filtered MDP, and (iii) any policy that is optimal in the filtered MDP-when executed through the same filter-achieves the same asymptotic return as the best safe policy in the SC-MDP, yielding a complete separation between safety enforcement and performance optimization. We validate the theory on Safety Gymnasium with representative tasks and constraints, observing zero violations during training and final performance matching or exceeding unfiltered baselines. Together, these results shed light on a long-standing question in safety-filtered learning and provide a simple, principled recipe for safe RL: train and deploy RL policies with the most permissive safety filter that is available.
comment: 17 pages, 3 figures
MOFM-Nav: On-Manifold Ordering-Flexible Multi-Robot Navigation
This paper addresses the problem of multi-robot navigation where robots maneuver on a desired \(m\)-dimensional (i.e., \(m\)-D) manifold in the $n$-dimensional Euclidean space, and maintain a {\it flexible spatial ordering}. We consider $ m\geq 2$, and the multi-robot coordination is achieved via non-Euclidean metrics. However, since the $m$-D manifold can be characterized by the zero-level sets of $n$ implicit functions, the last $m$ entries of the GVF propagation term become {\it strongly coupled} with the partial derivatives of these functions if the auxiliary vectors are not appropriately chosen. These couplings not only influence the on-manifold maneuvering of robots, but also pose significant challenges to the further design of the ordering-flexible coordination via non-Euclidean metrics. To tackle this issue, we first identify a feasible solution of auxiliary vectors such that the last $m$ entries of the propagation term are effectively decoupled to be the same constant. Then, we redesign the coordinated GVF (CGVF) algorithm to {\it boost} the advantages of singularities elimination and global convergence by treating $m$ manifold parameters as additional $m$ virtual coordinates. Furthermore, we enable the on-manifold ordering-flexible motion coordination by allowing each robot to share $m$ virtual coordinates with its time-varying neighbors and a virtual target robot, which {\it circumvents} the possible complex calculation if Euclidean metrics were used instead. Finally, we showcase the proposed algorithm's flexibility, adaptability, and robustness through extensive simulations with different initial positions, higher-dimensional manifolds, and robot breakdown, respectively.
SPACeR: Self-Play Anchoring with Centralized Reference Models
Developing autonomous vehicles (AVs) requires not only safety and efficiency, but also realistic, human-like behaviors that are socially aware and predictable. Achieving this requires sim agent policies that are human-like, fast, and scalable in multi-agent settings. Recent progress in imitation learning with large diffusion-based or tokenized models has shown that behaviors can be captured directly from human driving data, producing realistic policies. However, these models are computationally expensive, slow during inference, and struggle to adapt in reactive, closed-loop scenarios. In contrast, self-play reinforcement learning (RL) scales efficiently and naturally captures multi-agent interactions, but it often relies on heuristics and reward shaping, and the resulting policies can diverge from human norms. We propose SPACeR, a framework that leverages a pretrained tokenized autoregressive motion model as a centralized reference policy to guide decentralized self-play. The reference model provides likelihood rewards and KL divergence, anchoring policies to the human driving distribution while preserving RL scalability. Evaluated on the Waymo Sim Agents Challenge, our method achieves competitive performance with imitation-learned policies while being up to 10x faster at inference and 50x smaller in parameter size than large generative models. In addition, we demonstrate in closed-loop ego planning evaluation tasks that our sim agents can effectively measure planner quality with fast and scalable traffic simulation, establishing a new paradigm for testing autonomous driving policies.
comment: Project page: https://spacer-ai.github.io/
SAVANT: Semantic Analysis with Vision-Augmented Anomaly deTection
Autonomous driving systems remain critically vulnerable to the long-tail of rare, out-of-distribution scenarios with semantic anomalies. While Vision Language Models (VLMs) offer promising reasoning capabilities, naive prompting approaches yield unreliable performance and depend on expensive proprietary models, limiting practical deployment. We introduce SAVANT (Semantic Analysis with Vision-Augmented Anomaly deTection), a structured reasoning framework that achieves high accuracy and recall in detecting anomalous driving scenarios from input images through layered scene analysis and a two-phase pipeline: structured scene description extraction followed by multi-modal evaluation. Our approach transforms VLM reasoning from ad-hoc prompting to systematic analysis across four semantic layers: Street, Infrastructure, Movable Objects, and Environment. SAVANT achieves 89.6% recall and 88.0% accuracy on real-world driving scenarios, significantly outperforming unstructured baselines. More importantly, we demonstrate that our structured framework enables a fine-tuned 7B parameter open-source model (Qwen2.5VL) to achieve 90.8% recall and 93.8% accuracy - surpassing all models evaluated while enabling local deployment at near-zero cost. By automatically labeling over 9,640 real-world images with high accuracy, SAVANT addresses the critical data scarcity problem in anomaly detection and provides a practical path toward reliable, accessible semantic monitoring for autonomous systems.
comment: 8 pages, 5 figures
Humanoid Goalkeeper: Learning from Position Conditioned Task-Motion Constraints
We present a reinforcement learning framework for autonomous goalkeeping with humanoid robots in real-world scenarios. While prior work has demonstrated similar capabilities on quadrupedal platforms, humanoid goalkeeping introduces two critical challenges: (1) generating natural, human-like whole-body motions, and (2) covering a wider guarding range with an equivalent response time. Unlike existing approaches that rely on separate teleoperation or fixed motion tracking for whole-body control, our method learns a single end-to-end RL policy, enabling fully autonomous, highly dynamic, and human-like robot-object interactions. To achieve this, we integrate multiple human motion priors conditioned on perceptual inputs into the RL training via an adversarial scheme. We demonstrate the effectiveness of our method through real-world experiments, where the humanoid robot successfully performs agile, autonomous, and naturalistic interceptions of fast-moving balls. In addition to goalkeeping, we demonstrate the generalization of our approach through tasks such as ball escaping and grabbing. Our work presents a practical and scalable solution for enabling highly dynamic interactions between robots and moving objects, advancing the field toward more adaptive and lifelike robotic behaviors.
RoboChallenge: Large-scale Real-robot Evaluation of Embodied Policies
Testing on real machines is indispensable for robotic control algorithms. In the context of learning-based algorithms, especially VLA models, demand for large-scale evaluation, i.e. testing a large number of models on a large number of tasks, is becoming increasingly urgent. However, doing this right is highly non-trivial, especially when scalability and reproducibility is taken into account. In this report, we describe our methodology for constructing RoboChallenge, an online evaluation system to test robotic control algorithms, and our survey of recent state-of-the-art VLA models using our initial benchmark Table30.
comment: Authors are listed in alphabetical order. The official website is located at https://robochallenge.ai
Studying the Effects of Robot Intervention on School Shooters in Virtual Reality
We advance the understanding of robotic intervention in high-risk scenarios by examining their potential to distract and impede a school shooter. To evaluate this concept, we conducted a virtual reality study with 150 university participants role-playing as a school shooter. Within the simulation, an autonomous robot predicted the shooter's movements and positioned itself strategically to interfere and distract. The strategy the robot used to approach the shooter was manipulated -- either moving directly in front of the shooter (aggressive) or maintaining distance (passive) -- and the distraction method, ranging from no additional cues (low), to siren and lights (medium), to siren, lights, and smoke to impair visibility (high). An aggressive, high-distraction robot reduced the number of victims by 46.6% relative to a no-robot control. This outcome underscores both the potential of robotic intervention to enhance safety and the pressing ethical questions surrounding their use in school environments.
comment: Preprint under review for conference publication. 10 pages, 9 figures, 3 tables (including 1-page appendix)
DISCOVER: Automated Curricula for Sparse-Reward Reinforcement Learning NeurIPS 2025
Sparse-reward reinforcement learning (RL) can model a wide range of highly complex tasks. Solving sparse-reward tasks is RL's core premise, requiring efficient exploration coupled with long-horizon credit assignment, and overcoming these challenges is key for building self-improving agents with superhuman ability. Prior work commonly explores with the objective of solving many sparse-reward tasks, making exploration of individual high-dimensional, long-horizon tasks intractable. We argue that solving such challenging tasks requires solving simpler tasks that are relevant to the target task, i.e., whose achieval will teach the agent skills required for solving the target task. We demonstrate that this sense of direction, necessary for effective exploration, can be extracted from existing RL algorithms, without leveraging any prior information. To this end, we propose a method for directed sparse-reward goal-conditioned very long-horizon RL (DISCOVER), which selects exploratory goals in the direction of the target task. We connect DISCOVER to principled exploration in bandits, formally bounding the time until the target task becomes achievable in terms of the agent's initial distance to the target, but independent of the volume of the space of all tasks. We then perform a thorough evaluation in high-dimensional environments. We find that the directed goal selection of DISCOVER solves exploration problems that are beyond the reach of prior state-of-the-art exploration methods in RL.
comment: NeurIPS 2025
General agents contain world models ICML 2025
Are world models a necessary ingredient for flexible, goal-directed behaviour, or is model-free learning sufficient? We provide a formal answer to this question, showing that any agent capable of generalizing to multi-step goal-directed tasks must have learned a predictive model of its environment. We show that this model can be extracted from the agent's policy, and that increasing the agents performance or the complexity of the goals it can achieve requires learning increasingly accurate world models. This has a number of consequences: from developing safe and general agents, to bounding agent capabilities in complex environments, and providing new algorithms for eliciting world models from agents.
comment: Accepted ICML 2025. Typos corrected
Leveraging Vision-Language Models for Open-Vocabulary Instance Segmentation and Tracking
Vision-language models (VLMs) excel in visual understanding but often lack reliable grounding capabilities and actionable inference rates. Integrating them with open-vocabulary object detection (OVD), instance segmentation, and tracking leverages their strengths while mitigating these drawbacks. We utilize VLM-generated structured descriptions to identify visible object instances, collect application-relevant attributes, and inform an open-vocabulary detector to extract corresponding bounding boxes that are passed to a video segmentation model providing segmentation masks and tracking. Once initialized, this model directly extracts segmentation masks, processing image streams in real time with minimal computational overhead. Tracks can be updated online as needed by generating new structured descriptions and detections. This combines the descriptive power of VLMs with the grounding capability of OVD and the pixel-level understanding and speed of video segmentation. Our evaluation across datasets and robotics platforms demonstrates the broad applicability of this approach, showcasing its ability to extract task-specific attributes from non-standard objects in dynamic environments. Code, data, videos, and benchmarks are available at https://vlm-gist.github.io
comment: IEEE Robotics and Automation Letters (RA-L), November 2025
4D Radar-Inertial Odometry based on Gaussian Modeling and Multi-Hypothesis Scan Matching
4D millimeter-wave (mmWave) radars are sensors that provide robustness against adverse weather conditions (rain, snow, fog, etc.), and as such they are increasingly used for odometry and SLAM (Simultaneous Location and Mapping). However, the noisy and sparse nature of the returned scan data proves to be a challenging obstacle for existing registration algorithms, especially those originally intended for more accurate sensors such as LiDAR. Following the success of 3D Gaussian Splatting for vision, in this paper we propose a summarized representation for radar scenes based on global simultaneous optimization of 3D Gaussians as opposed to voxel-based approaches, and leveraging its inherent probability distribution function for registration. Moreover, we propose tackling the problem of radar noise by optimizing multiple scan matching hypotheses in order to further increase the robustness of the system against local optima of the function. Finally, following existing practice we implement an Extended Kalman Filter-based Radar-Inertial Odometry pipeline in order to evaluate the effectiveness of our system. Experiments using publicly available 4D radar datasets show that our Gaussian approach is comparable to existing registration algorithms, outperforming them in several sequences.
comment: Our code and results can be publicly accessed at: https://github.com/robotics-upo/gaussian-rio-cpp
Unseen from Seen: Rewriting Observation-Instruction Using Foundation Models for Augmenting Vision-Language Navigation
Data scarcity is a long-standing challenge in the Vision-Language Navigation (VLN) field, which extremely hinders the generalization of agents to unseen environments. Previous works primarily rely on additional simulator data or web-collected images/videos to improve the generalization. However, the simulator environments still face limited diversity, and the web-collected data often requires extensive labor to remove the noise. In this paper, we propose a Rewriting-driven AugMentation (RAM) paradigm for VLN, which directly creates the unseen observation-instruction pairs via rewriting human-annotated training data. Benefiting from our rewriting mechanism, new observation-instruction pairs can be obtained in both simulator-free and labor-saving manners to promote generalization. Specifically, we first introduce Object-Enriched Observation Rewriting, where we combine Vision-Language Models (VLMs) and Large Language Models (LLMs) to derive rewritten object-enriched scene descriptions, enabling observation synthesis with diverse objects and spatial layouts via Text-to-Image Generation Models (T2IMs). Then, we propose Observation-Contrast Instruction Rewriting, which generates observation-aligned rewritten instructions by requiring LLMs to reason the difference between original and new observations. We further develop a mixing-then-focusing training strategy with a random observation cropping scheme, effectively enhancing data distribution diversity while suppressing augmentation data noise during training. Experiments on both the discrete environments (R2R, REVERIE, and R4R datasets) and continuous environments (R2R-CE dataset) show the superior performance and impressive generalization ability of our method. Code is available at https://github.com/SaDil13/VLN-RAM.
comment: Accepted by IEEE Transactions on Neural Networks and Learning Systems
ADA-DPM: A Neural Descriptors-based Adaptive Noise Filtering Strategy for SLAM
Lidar SLAM plays a significant role in mobile robot navigation and high-definition map construction. However, existing methods often face a trade-off between localization accuracy and system robustness in scenarios with a high proportion of dynamic objects, point cloud distortion, and unstructured environments. To address this issue, we propose a neural descriptors-based adaptive noise filtering strategy for SLAM, named ADA-DPM, which improves the performance of localization and mapping tasks through three key technical innovations. Firstly, to tackle dynamic object interference, we design the Dynamic Segmentation Head to predict and filter out dynamic feature points, eliminating the ego-motion interference caused by dynamic objects. Secondly, to mitigate the impact of noise and unstructured feature points, we propose the Global Importance Scoring Head that adaptively selects high-contribution feature points while suppressing the influence of noise and unstructured feature points. Moreover, we introduce the Cross-Layer Graph Convolution Module (GLI-GCN) to construct multi-scale neighborhood graphs, fusing local structural information across different scales and improving the discriminative power of overlapping features. Finally, experimental validations on multiple public datasets confirm the effectiveness of ADA-DPM.
LANGTRAJ: Diffusion Model and Dataset for Language-Conditioned Trajectory Simulation ICCV 2025
Evaluating autonomous vehicles with controllability enables scalable testing in counterfactual or structured settings, enhancing both efficiency and safety. We introduce LangTraj, a language-conditioned scene-diffusion model that simulates the joint behavior of all agents in traffic scenarios. By conditioning on natural language inputs, LangTraj provides flexible and intuitive control over interactive behaviors, generating nuanced and realistic scenarios. Unlike prior approaches that depend on domain-specific guidance functions, LangTraj incorporates language conditioning during training, facilitating more intuitive traffic simulation control. We propose a novel closed-loop training strategy for diffusion models, explicitly tailored to enhance stability and realism during closed-loop simulation. To support language-conditioned simulation, we develop Inter-Drive, a large-scale dataset with diverse and interactive labels for training language-conditioned diffusion models. Our dataset is built upon a scalable pipeline for annotating agent-agent interactions and single-agent behaviors, ensuring rich and varied supervision. Validated on the Waymo Open Motion Dataset, LangTraj demonstrates strong performance in realism, language controllability, and language-conditioned safety-critical simulation, establishing a new paradigm for flexible and scalable autonomous vehicle testing. Project Website: https://langtraj.github.io/
comment: ICCV 2025
Robust Deterministic Policy Gradient for Disturbance Attenuation and Its Application to Quadrotor Control
Practical control systems pose significant challenges in identifying optimal control policies due to uncertainties in the system model and external disturbances. While $H_\infty$ control techniques are commonly used to design robust controllers that mitigate the effects of disturbances, these methods often require complex and computationally intensive calculations. To address this issue, this paper proposes a reinforcement learning algorithm called robust deterministic policy gradient (RDPG), which formulates the $H_\infty$ control problem as a two-player zero-sum dynamic game. In this formulation, one player (the user) aims to minimize the cost, while the other player (the adversary) seeks to maximize it. We then employ deterministic policy gradient (DPG) and its deep reinforcement learning counterpart to train a robust control policy with effective disturbance attenuation. In particular, for practical implementation, we introduce an algorithm called robust deep deterministic policy gradient (RDDPG), which employs a deep neural network architecture and integrates techniques from the twin-delayed deep deterministic policy gradient (TD3) to enhance stability and learning efficiency. To evaluate the proposed algorithm, we implement it on an unmanned aerial vehicle (UAV) tasked with following a predefined path in a disturbance-prone environment. The experimental results demonstrate that the proposed method outperforms other control approaches in terms of robustness against disturbances, enabling precise real-time tracking of moving targets even under severe disturbance conditions.
comment: 24 pages
From Perception Logs to Failure Modes: Language-Driven Semantic Clustering of Failures for Robot Safety
As robotic systems become increasingly integrated into real-world environments -- ranging from autonomous vehicles to household assistants -- they inevitably encounter diverse and unstructured scenarios that lead to failures. While such failures pose safety and reliability challenges, they also provide rich perceptual data for improving future performance. However, manually analyzing large-scale failure datasets is impractical. In this work, we present a method for automatically organizing large-scale robotic failure data into semantically meaningful failure clusters, enabling scalable learning from failure without human supervision. Our approach leverages the reasoning capabilities of Multimodal Large Language Models (MLLMs), trained on internet-scale data, to infer high-level failure causes from raw perceptual trajectories and discover interpretable structure within uncurated failure logs. These semantic clusters reveal patterns and hypothesized causes of failure, enabling scalable learning from experience. We demonstrate that the discovered failure modes can guide targeted data collection for policy refinement, accelerating iterative improvement in agent policies and overall safety. Additionally, we show that these semantic clusters can benefit online failure monitoring systems, offering a lightweight yet powerful safeguard for real-time operation. We demonstrate that this framework enhances robot learning and robustness by transforming real-world failures into actionable and interpretable signals for adaptation.
COMPASS: Cooperative Multi-Agent Persistent Monitoring using Spatio-Temporal Attention Network
Persistent monitoring of dynamic targets is essential in real-world applications such as disaster response, environmental sensing, and wildlife conservation, where mobile agents must continuously gather information under uncertainty. We propose COMPASS, a multi-agent reinforcement learning (MARL) framework that enables decentralized agents to persistently monitor multiple moving targets efficiently. We model the environment as a graph, where nodes represent spatial locations and edges capture topological proximity, allowing agents to reason over structured layouts and revisit informative regions as needed. Each agent independently selects actions based on a shared spatio-temporal attention network that we design to integrate historical observations and spatial context. We model target dynamics using Gaussian Processes (GPs), which support principled belief updates and enable uncertainty-aware planning. We train COMPASS using centralized value estimation and decentralized policy execution under an adaptive reward setting. Our extensive experiments demonstrate that COMPASS consistently outperforms strong baselines in uncertainty reduction, target coverage, and coordination efficiency across dynamic multi-target scenarios.
comment: Accepted at IEEE MRS 2025
Optimal swimming with body compliance in an overdamped medium
Elongate animals and robots use undulatory body waves to locomote through diverse environments. Geometric mechanics provides a framework to model and optimize such systems in highly damped environments, connecting a prescribed shape change pattern (gait) with locomotion displacement. However, the practical applicability of controlling compliant physical robots remains to be demonstrated. In this work, we develop a framework based on geometric mechanics to predict locomotor performance and search for optimal swimming strategies of compliant swimmers. We introduce a compliant extension of Purcell's three-link swimmer by incorporating series-connected springs at the joints. Body dynamics are derived using resistive force theory. Geometric mechanics is incorporated into movement prediction and into an optimization framework that identifies strategies for controlling compliant swimmers to achieve maximal displacement. We validate our framework on a physical cable-driven three-link limbless robot and demonstrate accurate prediction and optimization of locomotor performance under varied programmed, state-dependent compliance in a granular medium. Our results establish a systematic, physics-based approach for modeling and controlling compliant swimming locomotion, highlighting compliance as a design feature that can be exploited for robust movement in both homogeneous and heterogeneous environments.
Validation of collision-free spheres of Stewart-Gough platforms for constant orientations using the Application Programming Interface of a CAD software
This paper presents a method of validation of the size of the largest collision-free sphere (CFS) of a 6-6 Stewart-Gough platform manipulator (SGPM) for a given orientation of its moving platform (MP) using the Application Programming Interface (API) of a CAD software. The position of the MP is updated via the API in an automated manner over a set of samples within a shell enclosing the surface of the CFS. For each pose of the manipulator, each pair of legs is investigated for mutual collisions. The CFS is considered safe or validated iff none of the points falling inside the CFS lead to a collision between any pair of legs. This approach can not only validate the safety of a precomputed CFS, but also estimate the same for any spatial parallel manipulator.
FlySearch: Exploring how vision-language models explore NeurIPS 2025
The real world is messy and unstructured. Uncovering critical information often requires active, goal-driven exploration. It remains to be seen whether Vision-Language Models (VLMs), which recently emerged as a popular zero-shot tool in many difficult tasks, can operate effectively in such conditions. In this paper, we answer this question by introducing FlySearch, a 3D, outdoor, photorealistic environment for searching and navigating to objects in complex scenes. We define three sets of scenarios with varying difficulty and observe that state-of-the-art VLMs cannot reliably solve even the simplest exploration tasks, with the gap to human performance increasing as the tasks get harder. We identify a set of central causes, ranging from vision hallucination, through context misunderstanding, to task planning failures, and we show that some of them can be addressed by finetuning. We publicly release the benchmark, scenarios, and the underlying codebase.
comment: NeurIPS 2025 Datasets and Benchmarks track
Sim2Dust: Mastering Dynamic Waypoint Tracking on Granular Media
Reliable autonomous navigation across the unstructured terrains of distant planetary surfaces is a critical enabler for future space exploration. However, the deployment of learning-based controllers is hindered by the inherent sim-to-real gap, particularly for the complex dynamics of wheel interactions with granular media. This work presents a complete sim-to-real framework for developing and validating robust control policies for dynamic waypoint tracking on such challenging surfaces. We leverage massively parallel simulation to train reinforcement learning agents across a vast distribution of procedurally generated environments with randomized physics. These policies are then transferred zero-shot to a physical wheeled rover operating in a lunar-analogue facility. Our experiments systematically compare multiple reinforcement learning algorithms and action smoothing filters to identify the most effective combinations for real-world deployment. Crucially, we provide strong empirical evidence that agents trained with procedural diversity achieve superior zero-shot performance compared to those trained on static scenarios. We also analyze the trade-offs of fine-tuning with high-fidelity particle physics, which offers minor gains in low-speed precision at a significant computational cost. Together, these contributions establish a validated workflow for creating reliable learning-based navigation systems, marking a substantial step towards deploying autonomous robots in the final frontier.
comment: Accepted for publication at the 2025 International Conference on Space Robotics (iSpaRo) | The source code is available at https://github.com/AndrejOrsula/space_robotics_bench
STITCHER: Constrained Trajectory Planning in Complex Environments with Real-Time Motion Primitive Search
Autonomous high-speed navigation through large, complex environments requires real-time generation of agile trajectories that are dynamically feasible, collision-free, and satisfy state or actuator constraints. Modern trajectory planning techniques primarily use numerical optimization, as they enable the systematic computation of high-quality, expressive trajectories that satisfy various constraints. However, stringent requirements on computation time and the risk of numerical instability can limit the use of optimization-based planners in safety-critical scenarios. This work presents an optimization-free planning framework called STITCHER that stitches short trajectory segments together with graph search to compute long-range, expressive, and near-optimal trajectories in real-time. STITCHER outperforms modern optimization-based planners through our innovative planning architecture and several algorithmic developments that make real-time planning possible. Extensive simulation testing is performed to analyze the algorithmic components that make up STITCHER, along with a thorough comparison with two state-of-the-art optimization planners. Simulation tests show that safe trajectories can be created within a few milliseconds for paths that span the entirety of two 50 m x 50 m environments. Hardware tests with a custom quadrotor verify that STITCHER can produce trackable paths in real-time while respecting nonconvex constraints, such as limits on tilt angle and motor forces, which are otherwise hard to include in optimization-based planners.
SDTagNet: Leveraging Text-Annotated Navigation Maps for Online HD Map Construction NeurIPS 2025
Autonomous vehicles rely on detailed and accurate environmental information to operate safely. High definition (HD) maps offer a promising solution, but their high maintenance cost poses a significant barrier to scalable deployment. This challenge is addressed by online HD map construction methods, which generate local HD maps from live sensor data. However, these methods are inherently limited by the short perception range of onboard sensors. To overcome this limitation and improve general performance, recent approaches have explored the use of standard definition (SD) maps as prior, which are significantly easier to maintain. We propose SDTagNet, the first online HD map construction method that fully utilizes the information of widely available SD maps, like OpenStreetMap, to enhance far range detection accuracy. Our approach introduces two key innovations. First, in contrast to previous work, we incorporate not only polyline SD map data with manually selected classes, but additional semantic information in the form of textual annotations. In this way, we enrich SD vector map tokens with NLP-derived features, eliminating the dependency on predefined specifications or exhaustive class taxonomies. Second, we introduce a point-level SD map encoder together with orthogonal element identifiers to uniformly integrate all types of map elements. Experiments on Argoverse 2 and nuScenes show that this boosts map perception performance by up to +5.9 mAP (+45%) w.r.t. map construction without priors and up to +3.2 mAP (+20%) w.r.t. previous approaches that already use SD map priors. Code is available at https://github.com/immel-f/SDTagNet
comment: 39th Conference on Neural Information Processing Systems (NeurIPS 2025)
Hierarchical Planning for Long-Horizon Multi-Target Tracking Under Target Motion Uncertainty
Achieving persistent tracking of multiple dynamic targets over a large spatial area poses significant challenges for a single-robot system with constrained sensing capabilities. As the robot moves to track different targets, the ones outside the field of view accumulate uncertainty, making them progressively harder to track. An effective path planning algorithm must manage uncertainty over a long horizon and account for the risk of permanently losing track of targets that remain unseen for too long. However, most existing approaches rely on short planning horizons and assume small, bounded environments, resulting in poor tracking performance and target loss in large-scale scenarios. In this paper, we present a hierarchical planner for tracking multiple moving targets with an aerial vehicle. To address the challenge of tracking non-static targets, our method incorporates motion models and uncertainty propagation during path execution, allowing for more informed decision-making. We decompose the multi-target tracking task into sub-tasks of single target search and detection, and our proposed pipeline consists a novel low-level coverage planner that enables searching for a target in an evolving belief area, and an estimation method to assess the likelihood of success for each sub-task, making it possible to convert the active target tracking task to a Markov decision process (MDP) that we solve with a tree-based algorithm to determine the sequence of sub-tasks. We validate our approach in simulation, demonstrating its effectiveness compared to existing planners for active target tracking tasks, and our proposed planner outperforms existing approaches, achieving a reduction of 11-70% in final uncertainty across different environments.
comment: Accepted to IEEE Robotics and Automation Letters (RA-L), 2025
Learning by Watching: A Review of Video-based Learning Approaches for Robot Manipulation
Robot learning of manipulation skills is hindered by the scarcity of diverse, unbiased datasets. While curated datasets can help, challenges remain in generalizability and real-world transfer. Meanwhile, large-scale "in-the-wild" video datasets have driven progress in computer vision through self-supervised techniques. Translating this to robotics, recent works have explored learning manipulation skills by passively watching abundant videos sourced online. Showing promising results, such video-based learning paradigms provide scalable supervision while reducing dataset bias. This survey reviews foundations such as video feature representation learning techniques, object affordance understanding, 3D hand/body modeling, and large-scale robot resources, as well as emerging techniques for acquiring robot manipulation skills from uncontrolled video demonstrations. We discuss how learning only from observing large-scale human videos can enhance generalization and sample efficiency for robotic manipulation. The survey summarizes video-based learning approaches, analyses their benefits over standard datasets, survey metrics, and benchmarks, and discusses open challenges and future directions in this nascent domain at the intersection of computer vision, natural language processing, and robot learning.
comment: Published at IEEE Access
Multiagent Systems
Executable Knowledge Graphs for Replicating AI Research
Replicating AI research is a crucial yet challenging task for large language model (LLM) agents. Existing approaches often struggle to generate executable code, primarily due to insufficient background knowledge and the limitations of retrieval-augmented generation (RAG) methods, which fail to capture latent technical details hidden in referenced papers. Furthermore, previous approaches tend to overlook valuable implementation-level code signals and lack structured knowledge representations that support multi-granular retrieval and reuse. To overcome these challenges, we propose Executable Knowledge Graphs (xKG), a modular and pluggable knowledge base that automatically integrates technical insights, code snippets, and domain-specific knowledge extracted from scientific literature. When integrated into three agent frameworks with two different LLMs, xKG shows substantial performance gains (10.9% with o3-mini) on PaperBench, demonstrating its effectiveness as a general and extensible solution for automated AI research replication. Code will released at https://github.com/zjunlp/xKG.
comment: Work in progress
A Principle of Targeted Intervention for Multi-Agent Reinforcement Learning NeurIPS 2025
Steering cooperative multi-agent reinforcement learning (MARL) towards desired outcomes is challenging, particularly when the global guidance from a human on the whole multi-agent system is impractical in a large-scale MARL. On the other hand, designing mechanisms to coordinate agents most relies on empirical studies, lacking a easy-to-use research tool. In this work, we employ multi-agent influence diagrams (MAIDs) as a graphical framework to address the above issues. First, we introduce interaction paradigms that leverage MAIDs to analyze and visualize existing approaches in MARL. Then, we design a new interaction paradigm based on MAIDs, referred to as targeted intervention that is applied to only a single targeted agent, so the problem of global guidance can be mitigated. In our implementation, we introduce a causal inference technique-referred to as Pre-Strategy Intervention (PSI)-to realize the targeted intervention paradigm. Since MAIDs can be regarded as a special class of causal diagrams, a composite desired outcome that integrates the primary task goal and an additional desired outcome can be achieved by maximizing the corresponding causal effect through the PSI. Moreover, the bundled relevance graph analysis of MAIDs provides a tool to identify whether an MARL learning paradigm is workable under the design of an interaction paradigm. In experiments, we demonstrate the effectiveness of our proposed targeted intervention, and verify the result of relevance graph analysis.
comment: Accepted to NeurIPS 2025
Intent-Driven LLM Ensemble Planning for Flexible Multi-Robot Disassembly: Demonstration on EV Batteries
This paper addresses the problem of planning complex manipulation tasks, in which multiple robots with different end-effectors and capabilities, informed by computer vision, must plan and execute concatenated sequences of actions on a variety of objects that can appear in arbitrary positions and configurations in unstructured scenes. We propose an intent-driven planning pipeline which can robustly construct such action sequences with varying degrees of supervisory input from a human using simple language instructions. The pipeline integrates: (i) perception-to-text scene encoding, (ii) an ensemble of large language models (LLMs) that generate candidate removal sequences based on the operator's intent, (iii) an LLM-based verifier that enforces formatting and precedence constraints, and (iv) a deterministic consistency filter that rejects hallucinated objects. The pipeline is evaluated on an example task in which two robot arms work collaboratively to dismantle an Electric Vehicle battery for recycling applications. A variety of components must be grasped and removed in specific sequences, determined by human instructions and/or by task-order feasibility decisions made by the autonomous system. On 200 real scenes with 600 operator prompts across five component classes, we used metrics of full-sequence correctness and next-task correctness to evaluate and compare five LLM-based planners (including ablation analyses of pipeline components). We also evaluated the LLM-based human interface in terms of time to execution and NASA TLX with human participant experiments. Results indicate that our ensemble-with-verification approach reliably maps operator intent to safe, executable multi-robot plans while maintaining low user effort.
comment: This work is funded by the project called "Research and Development of a Highly Automated and Safe Streamlined Process for Increasing Lithium-ion Battery Repurposing and Recycling" (REBELION) under Grant 101104241, and partially supported by the Ministry of National Education, Republic of Turkey. Submitted to Frontiers for Review
Strategyproof Facility Location for Five Agents on a Circle using PCD
We consider the strategyproof facility location problem on a circle. We focus on the case of 5 agents, and find a tight bound for the PCD strategyproof mechanism, which selects the reported location of an agent in proportion to the length of the arc in front of it. We methodically "reduce" the size of the instance space and then use standard optimization techniques to find and prove the bound is tight. Moreover we hypothesize the approximation ratio of PCD for general odd $n$.
Diverse Planning with Simulators via Linear Temporal Logic
Autonomous agents rely on automated planning algorithms to achieve their objectives. Simulation-based planning offers a significant advantage over declarative models in modelling complex environments. However, relying solely on a planner that produces a single plan may not be practical, as the generated plans may not always satisfy the agent's preferences. To address this limitation, we introduce $\texttt{FBI}_\texttt{LTL}$, a diverse planner explicitly designed for simulation-based planning problems. $\texttt{FBI}_\texttt{LTL}$ utilises Linear Temporal Logic (LTL) to define semantic diversity criteria, enabling agents to specify what constitutes meaningfully different plans. By integrating these LTL-based diversity models directly into the search process, $\texttt{FBI}_\texttt{LTL}$ ensures the generation of semantically diverse plans, addressing a critical limitation of existing diverse planning approaches that may produce syntactically different but semantically identical solutions. Extensive evaluations on various benchmarks consistently demonstrate that $\texttt{FBI}_\texttt{LTL}$ generates more diverse plans compared to a baseline approach. This work establishes the feasibility of semantically-guided diverse planning in simulation-based environments, paving the way for innovative approaches in realistic, non-symbolic domains where traditional model-based approaches fail.
BenCao: An Instruction-Tuned Large Language Model for Traditional Chinese Medicine
Traditional Chinese Medicine (TCM), with a history spanning over two millennia, plays a role in global healthcare. However, applying large language models (LLMs) to TCM remains challenging due to its reliance on holistic reasoning, implicit logic, and multimodal diagnostic cues. Existing TCM-domain LLMs have made progress in text-based understanding but lack multimodal integration, interpretability, and clinical applicability. To address these limitations, we developed BenCao, a ChatGPT-based multimodal assistant for TCM, integrating structured knowledge bases, diagnostic data, and expert feedback refinement. BenCao was trained through natural language instruction tuning rather than parameter retraining, aligning with expert-level reasoning and ethical norms specific to TCM. The system incorporates a comprehensive knowledge base of over 1,000 classical and modern texts, a scenario-based instruction framework for diverse interactions, a chain-of-thought simulation mechanism for interpretable reasoning, and a feedback refinement process involving licensed TCM practitioners. BenCao connects to external APIs for tongue-image classification and multimodal database retrieval, enabling dynamic access to diagnostic resources. In evaluations across single-choice question benchmarks and multimodal classification tasks, BenCao achieved superior accuracy to general-domain and TCM-domain models, particularly in diagnostics, herb recognition, and constitution classification. The model was deployed as an interactive application on the OpenAI GPTs Store, accessed by nearly 1,000 users globally as of October 2025. This study demonstrates the feasibility of developing a TCM-domain LLM through natural language-based instruction tuning and multimodal integration, offering a practical framework for aligning generative AI with traditional medical reasoning and a scalable pathway for real-world deployment.
MiCRO for Multilateral Negotiations
Recently, a very simple new bilateral negotiation strategy called MiCRO was introduced that does not make use of any kind of opponent modeling or machine learning techniques and that does not require fine-tuning of any parameters. Despite its simplicity, it was shown that MiCRO performs similar to -- or even better than -- most state-of-the-art negotiation strategies. This lead its authors to argue that the benchmark domains on which negotiation algorithms are typically tested may be too simplistic. However, one question that was left open, was how MiCRO could be generalized to multilateral negotiations. In this paper we fill this gap by introducing a multilateral variant of MiCRO. We compare it with the winners of the Automated Negotiating Agents Competitions (ANAC) of 2015, 2017 and 2018 and show that it outperforms them. Furthermore, we perform an empirical game-theoretical analysis to show that our new version of MiCRO forms an empirical Nash equilibrium.
comment: Extended version of short-paper presented at PRIMA2025
Graph Attention-Guided Search for Dense Multi-Agent Pathfinding
Finding near-optimal solutions for dense multi-agent pathfinding (MAPF) problems in real-time remains challenging even for state-of-the-art planners. To this end, we develop a hybrid framework that integrates a learned heuristic derived from MAGAT, a neural MAPF policy with a graph attention scheme, into a leading search-based algorithm, LaCAM. While prior work has explored learning-guided search in MAPF, such methods have historically underperformed. In contrast, our approach, termed LaGAT, outperforms both purely search-based and purely learning-based methods in dense scenarios. This is achieved through an enhanced MAGAT architecture, a pre-train-then-fine-tune strategy on maps of interest, and a deadlock detection scheme to account for imperfect neural guidance. Our results demonstrate that, when carefully designed, hybrid search offers a powerful solution for tightly coupled, challenging multi-agent coordination problems.
ATL*AS: An Automata-Theoretic Approach and Tool for the Verification of Strategic Abilities in Multi-Agent Systems
We present two novel symbolic algorithms for model checking the Alternating-time Temporal Logic ATL*, over both the infinite-trace and the finite-trace semantics. In particular, for infinite traces we design a novel symbolic reduction to parity games. We implement both methods in the ATL*AS model checker and evaluate it using synthetic benchmarks as well as a cybersecurity scenario. Our results demonstrate that the symbolic approach significantly outperforms the explicit-state representation and we find that our parity-game-based algorithm offers a more scalable and efficient solution for infinite-trace verification, outperforming previously available tools. Our results also confirm that finite-trace model checking yields substantial performance benefits over infinite-trace verification. As such, we provide a comprehensive toolset for verifying multiagent systems against specifications in ATL*.
Verification-Aware Planning for Multi-Agent Systems
Large language model (LLM) agents are increasingly deployed to tackle complex tasks, often necessitating collaboration among multiple specialized agents. However, multi-agent collaboration introduces new challenges in planning, coordination, and verification. Execution failures frequently arise not from flawed reasoning alone, but from subtle misalignments in task interpretation, output format, or inter-agent handoffs. To address these challenges, we present VeriMAP, a framework for multi-agent collaboration with verification-aware planning. The VeriMAP planner decomposes tasks, models subtask dependencies, and encodes planner-defined passing criteria as subtask verification functions (VFs) in Python and natural language. We evaluate VeriMAP on diverse datasets, demonstrating that it outperforms both single- and multi-agent baselines while enhancing system robustness and interpretability. Our analysis highlights how verification-aware planning enables reliable coordination and iterative refinement in multi-agent systems, without relying on external labels or annotations.
comment: Submission for ARR Oct
R2BC: Multi-Agent Imitation Learning from Single-Agent Demonstrations
Imitation Learning (IL) is a natural way for humans to teach robots, particularly when high-quality demonstrations are easy to obtain. While IL has been widely applied to single-robot settings, relatively few studies have addressed the extension of these methods to multi-agent systems, especially in settings where a single human must provide demonstrations to a team of collaborating robots. In this paper, we introduce and study Round-Robin Behavior Cloning (R2BC), a method that enables a single human operator to effectively train multi-robot systems through sequential, single-agent demonstrations. Our approach allows the human to teleoperate one agent at a time and incrementally teach multi-agent behavior to the entire system, without requiring demonstrations in the joint multi-agent action space. We show that R2BC methods match, and in some cases surpass, the performance of an oracle behavior cloning approach trained on privileged synchronized demonstrations across four multi-agent simulated tasks. Finally, we deploy R2BC on two physical robot tasks trained using real human demonstrations.
comment: 9 pages, 6 figures
On Condorcet's Jury Theorem with Abstention
The well-known Condorcet Jury Theorem states that, under majority rule, the better of two alternatives is chosen with probability approaching one as the population grows. We study an asymmetric setting where voters face varying participation costs and share a possibly heuristic belief about their pivotality (ability to influence the outcome). In a costly voting setup where voters abstain if their participation cost is greater than their pivotality estimate, we identify a single property of the heuristic belief -- weakly vanishing pivotality -- that gives rise to multiple stable equilibria in which elections are nearly tied. In contrast, strongly vanishing pivotality (as in the standard Calculus of Voting model) yields a unique, trivial equilibrium where only zero-cost voters participate as the population grows. We then characterize when nontrivial equilibria satisfy a version of the Jury Theorem: below a sharp threshold, the majority-preferred candidate wins with probability approaching one; above it, both candidates either win with equal probability.
OPTAGENT: Optimizing Multi-Agent LLM Interactions Through Verbal Reinforcement Learning for Enhanced Reasoning
Large Language Models (LLMs) have shown remarkable reasoning capabilities in mathematical and scientific tasks. To enhance complex reasoning, multi-agent systems have been proposed to harness the collective intelligence of LLM agents. However, existing collaboration structures are either predefined or rely on majority voting or round-table debates, which can suppress correct but less dominant agent contributions. Recent approaches model multi-agent systems as graph networks but optimize purely for agent performance, neglecting the quality of interactions. We hypothesize that effective agent communication is crucial for multi-agent reasoning and that debating quality plays a significant role. To address this, we propose $\ours$, a multi-agent verbal reinforcement learning algorithm that dynamically constructs and refines multi-agent collaboration structures. Our method defines action spaces and a feedback mechanism that evaluates communication robustness and coherence throughout the debate. The final decision is achieved through a majority vote over all the agents. We assess $\ours$ on various reasoning tasks, including mathematical reasoning, creative writing, scientific reasoning, and numerical sorting. Results demonstrate that our approach significantly outperforms single-agent prompting methods and state-of-the-art multi-agent frameworks on diverse tasks.
comment: 8 pages for main content
PLAGUE: Plug-and-play framework for Lifelong Adaptive Generation of Multi-turn Exploits
Large Language Models (LLMs) are improving at an exceptional rate. With the advent of agentic workflows, multi-turn dialogue has become the de facto mode of interaction with LLMs for completing long and complex tasks. While LLM capabilities continue to improve, they remain increasingly susceptible to jailbreaking, especially in multi-turn scenarios where harmful intent can be subtly injected across the conversation to produce nefarious outcomes. While single-turn attacks have been extensively explored, adaptability, efficiency and effectiveness continue to remain key challenges for their multi-turn counterparts. To address these gaps, we present PLAGUE, a novel plug-and-play framework for designing multi-turn attacks inspired by lifelong-learning agents. PLAGUE dissects the lifetime of a multi-turn attack into three carefully designed phases (Primer, Planner and Finisher) that enable a systematic and information-rich exploration of the multi-turn attack family. Evaluations show that red-teaming agents designed using PLAGUE achieve state-of-the-art jailbreaking results, improving attack success rates (ASR) by more than 30% across leading models in a lesser or comparable query budget. Particularly, PLAGUE enables an ASR (based on StrongReject) of 81.4% on OpenAI's o3 and 67.3% on Claude's Opus 4.1, two models that are considered highly resistant to jailbreaks in safety literature. Our work offers tools and insights to understand the importance of plan initialization, context optimization and lifelong learning in crafting multi-turn attacks for a comprehensive model vulnerability evaluation.
Evolution of AI Agent Registry Solutions: Centralized, Enterprise, and Distributed Approaches
Autonomous AI agents now operate across cloud, enterprise, and decentralized domains, creating demand for registry infrastructures that enable trustworthy discovery, capability negotiation, and identity assurance. We analyze five prominent approaches: (1) MCP Registry (centralized publication of mcp.json descriptors), (2) A2A Agent Cards (decentralized self-describing JSON capability manifests), (3) AGNTCY Agent Directory Service (IPFS Kademlia DHT content routing extended for semantic taxonomy-based content discovery, OCI artifact storage, and Sigstore-backed integrity), (4) Microsoft Entra Agent ID (enterprise SaaS directory with policy and zero-trust integration), and (5) NANDA Index AgentFacts (cryptographically verifiable, privacy-preserving fact model with credentialed assertions). Using four evaluation dimensions: security, authentication, scalability, and maintainability, we surface architectural trade-offs between centralized control, enterprise governance, and distributed resilience. We conclude with design recommendations for an emerging Internet of AI Agents requiring verifiable identity, adaptive discovery flows, and interoperable capability semantics.
Asynchronous Agents with Perfect Recall: Model Reductions, Knowledge-Based Construction, and Model Checking for Coalitional Strategies
Model checking of strategic abilities for agents with memory is a notoriously hard problem, and very few attempts have been made to tackle it. In this paper, we present two important steps towards this goal. First, we take the partial-order reduction scheme that was recently proved to preserve individual and coalitional abilities of memoryless agents, and show that it also works for agents with memory. Secondly, we take the Knowledge-Based Subset Construction, that was recently studied for synchronous concurrent games, and adapt it to preserve abilities of memoryful agents in asynchronous MAS. On the way, we also propose a new execution semantics for strategies in asynchronous MAS, that combines elements of Concurrent Game Structures and Interleaved Interpreted Systems in a natural and intuitive way.
First Field-Trial Demonstration of L4 Autonomous Optical Network for Distributed AI Training Communication: An LLM-Powered Multi-AI-Agent Solution
We demonstrate the first cross-domain cross-layer level-4 autonomous optical network via a multi-AI-agent system. Field trials show ~98% task completion rate across the distributed AI training lifecycle-3.2x higher than single agents using state-of-the-art LLMs.
comment: Accepted by 51st European Conference on Optical Communication (ECOC 2025), paper W.02.01.177
COMPASS: Cooperative Multi-Agent Persistent Monitoring using Spatio-Temporal Attention Network
Persistent monitoring of dynamic targets is essential in real-world applications such as disaster response, environmental sensing, and wildlife conservation, where mobile agents must continuously gather information under uncertainty. We propose COMPASS, a multi-agent reinforcement learning (MARL) framework that enables decentralized agents to persistently monitor multiple moving targets efficiently. We model the environment as a graph, where nodes represent spatial locations and edges capture topological proximity, allowing agents to reason over structured layouts and revisit informative regions as needed. Each agent independently selects actions based on a shared spatio-temporal attention network that we design to integrate historical observations and spatial context. We model target dynamics using Gaussian Processes (GPs), which support principled belief updates and enable uncertainty-aware planning. We train COMPASS using centralized value estimation and decentralized policy execution under an adaptive reward setting. Our extensive experiments demonstrate that COMPASS consistently outperforms strong baselines in uncertainty reduction, target coverage, and coordination efficiency across dynamic multi-target scenarios.
comment: Accepted at IEEE MRS 2025
Value-Based Large Language Model Agent Simulation for Mutual Evaluation of Trust and Interpersonal Closeness
Large language models (LLMs) have emerged as powerful tools for simulating complex social phenomena using human-like agents with specific traits. In human societies, value similarity is important for building trust and close relationships; however, it remains unexplored whether this principle holds true in artificial societies comprising LLM agents. Therefore, this study investigates the influence of value similarity on relationship-building among LLM agents through two experiments. First, in a preliminary experiment, we evaluated the controllability of values in LLMs to identify the most effective model and prompt design for controlling the values. Subsequently, in the main experiment, we generated pairs of LLM agents imbued with specific values and analyzed their mutual evaluations of trust and interpersonal closeness following a dialogue. The experiments were conducted in English and Japanese to investigate language dependence. The results confirmed that pairs of agents with higher value similarity exhibited greater mutual trust and interpersonal closeness. Our findings demonstrate that the LLM agent simulation serves as a valid testbed for social science theories, contributes to elucidating the mechanisms by which values influence relationship building, and provides a foundation for inspiring new theories and insights into the social sciences.
Safe Voting: Resilience to Abstention and Sybils
Voting rules may implement the will of the society when all eligible voters vote, and only them. However, they may fail to do so when sybil (fake or duplicate) votes are present and when only some honest (non sybil) voters actively participate. As, unfortunately, sometimes this is the case, our aim here is to address social choice in the presence of sybils and voter abstention. % To do so, we build upon the framework of Reality-aware Social Choice: we assume the status quo as an ever-present distinguished alternative, and study \emph{status quo Enforcing (QUE) voting rules}, which add virtual votes in support of the status quo. We characterize the tradeoff between \emph{safety} and \emph{liveness} (the ability of active honest voters to maintain/change the status quo, respectively) in several domains, and show that the voting rules are often optimal. \revision{Our characterization identifies the exact conditions under which mechanisms remain both resilient to sybils and responsive to verified participation, offering a quantitative tool for designers to measure the benefit of increased participation and verified identities.
Systems and Control (CS)
Admittance Matrix Concentration Inequalities for Understanding Uncertain Power Networks
This paper presents probabilistic bounds for the spectrum of the admittance matrix and classical linear power flow models under uncertain network parameters; for example, probabilistic line contingencies. Our proposed approach imports tools from probability theory, such as concentration inequalities for random matrices with independent entries. It yields error bounds for common approximations of the AC power flow equations under parameter uncertainty, including the DC and LinDistFlow approximations.
comment: 9 pages, 1 figure
Data-driven Communication and Control Design for Distributed Frequency Regulation with Black-box Inverters SC
The increasing penetration of inverter-based resources into the power grid, with often only black-box models available, challenges long-standing frequency control methods. Most recent works take a decentralized approach without online device coordination via communication. This paper considers both dynamic behavior and communication within secondary frequency control on an intermediate timescale. We develop a distributed data-driven approach that utilizes peer-to-peer communication between inverters to avoid the need for a central control center. To enable a trade off between communication network requirements and control performance, we present a framework to guide communication topology design for secondary frequency regulation. Following design of the inter-agent information exchange scheme, we design a controller that is structured according to the communication topology with a closed-loop stability guarantee. Case studies on the IEEE 39-bus system validate the framework and illustrate the trade-off between communication requirements and control performance that is enabled by our approach.
comment: Preprint submitted to PSCC 2026
Trajectory Optimization for Minimum Threat Exposure using Physics-Informed Neural Networks
We apply a physics-informed neural network (PINN) to solve the two-point boundary value problem (BVP) arising from the necessary conditions postulated by Pontryagin's Minimum Principle for optimal control. Such BVPs are known to be numerically difficult to solve by traditional shooting methods due to extremely high sensitivity to initial guesses. In the light of recent successes in applying PINNs for solving high-dimensional differential equations, we develop a PINN to solve the problem of finding trajectories with minimum exposure to a spatiotemporal threat for a vehicle kinematic model. First, we implement PINNs that are trained to solve the BVP for a given pair of initial and final states for a given threat field. Next, we implement a PINN conditioned on the initial state for a given threat field, which eliminates the need for retraining for each initial state. We demonstrate that the PINN outputs satisfy the necessary conditions with low numerical error.
comment: 2025 Indian Control Conference
Artificial magnetic conductor backed dual-mode sectoral cylindrical DRA for off-body biomedical telemetry
This research investigates the potential of a sectoral Cylindrical Dielectric Resonator Antenna (CDRA) for biomedical telemetry. CDRAs are known for their low loss, ruggedness, and stability, but their limited bandwidth and size make them unsuitable for wearable devices. The research addresses these limitations by proposing a dual mode antenna that operates in EH110 and TE210 modes. The sectoral CDRA is a quarter segment with Perfect Electric Conductor boundaries, reducing its size by a factor of four. Mathematical derivations of the field components for both modes are derived to support the design. To minimize specific absorption rate (SAR), an Artificial Magnetic Conductor (AMC) surface is applied to the antennas backside, enhancing compatibility with the transverse electric modes. The antenna achieves a bandwidth of 0.7 GHz (5.2-5.9 GHz), suitable for biomedical applications, with a measured peak gain of 7.9 dBi and a SAR of 1.24 W/kg when applied to a human arm.
comment: 13 pages
An Empirical Study of Lagrangian Methods in Safe Reinforcement Learning
In safety-critical domains such as robotics, navigation and power systems, constrained optimization problems arise where maximizing performance must be carefully balanced with associated constraints. Safe reinforcement learning provides a framework to address these challenges, with Lagrangian methods being a popular choice. However, the effectiveness of Lagrangian methods crucially depends on the choice of the Lagrange multiplier $\lambda$, which governs the trade-off between return and constraint cost. A common approach is to update the multiplier automatically during training. Although this is standard in practice, there remains limited empirical evidence on the robustness of an automated update and its influence on overall performance. Therefore, we analyze (i) optimality and (ii) stability of Lagrange multipliers in safe reinforcement learning across a range of tasks. We provide $\lambda$-profiles that give a complete visualization of the trade-off between return and constraint cost of the optimization problem. These profiles show the highly sensitive nature of $\lambda$ and moreover confirm the lack of general intuition for choosing the optimal value $\lambda^*$. Our findings additionally show that automated multiplier updates are able to recover and sometimes even exceed the optimal performance found at $\lambda^*$ due to the vast difference in their learning trajectories. Furthermore, we show that automated multiplier updates exhibit oscillatory behavior during training, which can be mitigated through PID-controlled updates. However, this method requires careful tuning to achieve consistently better performance across tasks. This highlights the need for further research on stabilizing Lagrangian methods in safe reinforcement learning. The code used to reproduce our results can be found at https://github.com/lindsayspoor/Lagrangian_SafeRL.
A condensing approach for linear-quadratic optimization with geometric constraints
Optimization problems with convex quadratic cost and polyhedral constraints are ubiquitous in signal processing, automatic control and decision-making. We consider here an enlarged problem class that allows to encode logical conditions and cardinality constraints, among others. In particular, we cover also situations where parts of the constraints are nonconvex and possibly complicated, but it is practical to compute projections onto this nonconvex set. Our approach combines the augmented Lagrangian framework with a solver-agnostic structure-exploiting subproblem reformulation. While convergence guarantees follow from the former, the proposed condensing technique leads to significant improvements in computational performance.
comment: 14 pages, 5 figures
ORIX: Orchestration of RIS with xApps for Smart Wireless Factory Environments
The vision of a smart wireless factory (SWF) demands highly flexible, low-latency, and reliable connectivity that goes beyond conventional wireless solutions. Reconfigurable intelligent surface (RIS)-empowered communications, when integrated with the open radio access network (O-RAN) architectures, have emerged as a promising enabler to meet these challenging requirements. This article introduces the methodology for the orchestration of RIS with xApps (ORIX), bringing the RIS technology into the O-RAN ecosystem through xApp-based control for SWF environments. ORIX features three key components: an O-RAN-compliant RIS service model for dynamic configuration, an RIS channel simulator that supports 3GPP indoor factory models with multiple industrial scenarios, and practical RIS optimization strategies with finite-resolution control. Together, these elements provide a realistic end-to-end emulation platform for evaluating RIS placement, control, and performance in SWF environments prior to deployment. The presented case study demonstrates how ORIX enables the evaluation of achievable performance gains, exploration of trade-offs among key RIS design parameters, and identification of deployment strategies that balance system performance with practical implementation constraints. By bridging theoretical advances with industrial feasibility, ORIX lays the groundwork for RIS-assisted O-RAN networks to power next-generation wireless communication in industrial scenarios.
comment: Submitted in IEEE
Inverse Optimal Control of Muscle Force Sharing During Pathological Gait
Muscle force sharing is typically resolved by minimizing a specific objective function to approximate neural control strategies. An inverse optimal control approach was applied to identify the "best" objective function, among a positive linear combination of basis objective functions, associated with the gait of two post-stroke males, one high-functioning (subject S1) and one low-functioning (subject S2). It was found that the "best" objective function is subject- and leg-specific. No single function works universally well, yet the best options are usually differently weighted combinations of muscle activation- and power-minimization. Subject-specific inverse optimal control models performed best on their respective limbs (\textbf{RMSE 178/213 N, CC 0.71/0.61} for non-paretic and paretic legs of S1; \textbf{RMSE 205/165 N, CC 0.88/0.85} for respective legs of S2), but cross-subject generalization was poor, particularly for paretic legs. Moreover, minimizing the root mean square of muscle power emerged as important for paretic limbs, while minimizing activation-based functions dominated for non-paretic limbs. This may suggest different neural control strategies between affected and unaffected sides, possibly altered by the presence of spasticity. Among the 15 considered objective functions commonly used in inverse dynamics-based computations, the root mean square of muscle power was the only one explicitly incorporating muscle velocity, leading to a possible model for spasticity in the paretic limbs. Although this objective function has been rarely used, it may be relevant for modeling pathological gait, such as post-stroke gait.
Integrating Trustworthy Artificial Intelligence with Energy-Efficient Robotic Arms for Waste Sorting
This paper presents a novel methodology that integrates trustworthy artificial intelligence (AI) with an energy-efficient robotic arm for intelligent waste classification and sorting. By utilizing a convolutional neural network (CNN) enhanced through transfer learning with MobileNetV2, the system accurately classifies waste into six categories: plastic, glass, metal, paper, cardboard, and trash. The model achieved a high training accuracy of 99.8% and a validation accuracy of 80.5%, demonstrating strong learning and generalization. A robotic arm simulator is implemented to perform virtual sorting, calculating the energy cost for each action using Euclidean distance to ensure optimal and efficient movement. The framework incorporates key elements of trustworthy AI, such as transparency, robustness, fairness, and safety, making it a reliable and scalable solution for smart waste management systems in urban settings.
comment: 5 pages, 2 figures
Process Automation Architecture Using RFID for Transparent Voting Systems
This paper presents the development of a process automation architecture leveraging Radio Frequency Identification (RFID) technology for secure, transparent and efficient voting systems. The proposed architecture automates the voting workflow through RFID-enabled voter identification, encrypted vote casting, and secure data transmission. Each eligible voter receives a smart RFID card containing a uniquely encrypted identifier, which is verified using an RC522 reader interfaced with a microcontroller. Upon successful verification, the voter interacts with a touchscreen interface to cast a vote, which is then encrypted using AES-128 and securely stored on a local SD card or transmitted via GSM to a central server. A tamper-proof monitoring mechanism records each session with time-stamped digital signatures, ensuring auditability and data integrity. The architecture is designed to function in both online and offline modes, with an automated batch synchronization mechanism that updates vote records once network connectivity is restored. System testing in simulated environments confirmed 100% voter authentication accuracy, minimized latency (average voting time of 11.5 seconds), and robustness against cloning, double voting, and data interception. The integration of real-time monitoring and secure process control modules enables electoral authorities to automate data logging, detect anomalies, and validate system integrity dynamically. This work demonstrates a scalable, automation-driven solution for voting infrastructure, offering enhanced transparency, resilience, and deployment flexibility, especially in environments where digital transformation of electoral processes is critically needed.
comment: 7 pages, 5 figures, 1 table
Accelerating Adaptive Systems via Normalized Parameter Estimation Laws
In this paper, we propose a new class of parameter estimation laws for adaptive systems, called \emph{normalized parameter estimation laws}. A key feature of these estimation laws is that they accelerate the convergence of the system state, $\mathit{x(t)}$, to the origin. We quantify this improvement by showing that our estimation laws guarantee finite integrability of the $\mathit{r}$-th root of the squared norm of the system state, i.e., \( \mathit{\|x(t)\|}_2^{2/\mathit{r}} \in \mathcal{L}_1, \) where $\mathit{r} \geq 1$ is a pre-specified parameter that, for a broad class of systems, can be chosen arbitrarily large. In contrast, standard Lyapunov-based estimation laws only guarantee integrability of $\mathit{\|x(t)\|}_2^2$ (i.e., $\mathit{r} = 1$). We motivate our method by showing that, for large values of $r$, this guarantee serves as a sparsity-promoting mechanism in the time domain, meaning that it penalizes prolonged signal duration and slow decay, thereby promoting faster convergence of $\mathit{x(t)}$. The proposed estimation laws do not rely on time-varying or high adaptation gains and do not require persistent excitation. Moreover, they can be applied to systems with matched and unmatched uncertainties, regardless of their dynamic structure, as long as a control Lyapunov function (CLF) exists. Finally, they are compatible with any CLF-based certainty equivalence controllers. We further develop higher-order extensions of our estimation laws by incorporating momentum into the estimation dynamics. We illustrate the performance improvements achieved with the proposed scheme through various numerical experiments.
Assessing the Quality of a Set of Basis Functions for Inverse Optimal Control via Projection onto Global Minimizers
Inverse optimization (Inverse optimal control) is the task of imputing a cost function such that given test points (trajectories) are (nearly) optimal with respect to the discovered cost. Prior methods in inverse optimization assume that the true cost is a convex combination of a set of convex basis functions and that this basis is consistent with the test points. However, the consistency assumption is not always justified, as in many applications the principles by which the data is generated are not well understood. This work proposes using the distance between a test point and the set of global optima generated by the convex combinations of the convex basis functions as a measurement for the expressive quality of the basis with respect to the test point. A large minimal distance invalidates the set of basis functions. The concept of a set of global optima is introduced and its properties are explored in unconstrained and constrained settings. Upper and lower bounds for the minimum distance in the convex quadratic setting are implemented by bi-level gradient descent and an enriched linear matrix inequality respectively. Extensions to this framework include max-representable basis functions, nonconvex basis functions (local minima), and applying polynomial optimization techniques.
comment: 8 pages, 4 figures
Comparison and performance analysis of dynamic encrypted control approaches
Encrypted controllers using homomorphic encryption have proven to guarantee the privacy of measurement and control signals, as well as system and controller parameters, while regulating the system as intended. However, encrypting dynamic controllers has remained a challenge due to growing noise and overflow issues in the encoding. In this paper, we review recent approaches to dynamic encrypted control, such as bootstrapping, periodic resets of the controller state, integer reformulations, and FIR controllers, and equip them with a stability and performance analysis to evaluate their suitability. We complement the analysis with a numerical performance comparison on a benchmark system.
A polynomial-based QCQP solver for encrypted optimization
In this paper, we present a novel method for solving a class of quadratically constrained quadratic optimization problems using only additions and multiplications. This approach enables solving constrained optimization problems on private data since the operations involved are compatible with the capabilities of homomorphic encryption schemes. To solve the constrained optimization problem, a sequence of polynomial penalty functions of increasing degree is introduced, which are sufficiently steep at the boundary of the feasible set. Adding the penalty function to the original cost function creates a sequence of unconstrained optimization problems whose minimizer always lies in the admissible set and converges to the minimizer of the constrained problem. A gradient descent method is used to generate a sequence of iterates associated with these problems. For the algorithm, it is shown that the iterate converges to a minimizer of the original problem, and the feasible set is positively invariant under the iteration. Finally, the method is demonstrated on an illustrative cryptographic problem, finding the smaller value of two numbers, and the encrypted implementability is discussed.
comment: Accepted for presentation at the 64th IEEE Conference on Decision and Control (CDC2025)
Enhanced Ground-Satellite Direct Access via Onboard Rydberg Atomic Quantum Receivers
Ground-satellite links for 6G networks face critical challenges, including severe path loss, tight size-weight-power limits, and congested spectrum, all of which significantly hinder the performance of traditional radio frequency (RF) front ends. This article introduces the Rydberg Atomic Quantum Receiver (RAQR) for onboard satellite systems, a millimeter-scale front end that converts radio fields to optical signals through atomic electromagnetically induced transparency. RAQR's high sensitivity and high frequency selectivity address link budget, payload, and interference challenges while fitting within space constraints. A hybrid atomic-electronic design and supporting signal model demonstrate enhanced data rate, coverage, and sensing accuracy relative to conventional RF receivers. The article concludes with integration strategies, distributed-satellite concepts, and open research problems for bringing RAQR-enabled satellite payloads into service.
comment: Submitted to IEEE Journal
Breaking and Fixing Defenses Against Control-Flow Hijacking in Multi-Agent Systems
Control-flow hijacking attacks manipulate orchestration mechanisms in multi-agent systems into performing unsafe actions that compromise the system and exfiltrate sensitive information. Recently proposed defenses, such as LlamaFirewall, rely on alignment checks of inter-agent communications to ensure that all agent invocations are "related to" and "likely to further" the original objective. We start by demonstrating control-flow hijacking attacks that evade these defenses even if alignment checks are performed by advanced LLMs. We argue that the safety and functionality objectives of multi-agent systems fundamentally conflict with each other. This conflict is exacerbated by the brittle definitions of "alignment" and the checkers' incomplete visibility into the execution context. We then propose, implement, and evaluate ControlValve, a new defense inspired by the principles of control-flow integrity and least privilege. ControlValve (1) generates permitted control-flow graphs for multi-agent systems, and (2) enforces that all executions comply with these graphs, along with contextual rules (generated in a zero-shot manner) for each agent invocation.
Floating-Base Deep Lagrangian Networks
Grey-box methods for system identification combine deep learning with physics-informed constraints, capturing complex dependencies while improving out-of-distribution generalization. Yet, despite the growing importance of floating-base systems such as humanoids and quadrupeds, current grey-box models ignore their specific physical constraints. For instance, the inertia matrix is not only positive definite but also exhibits branch-induced sparsity and input independence. Moreover, the 6x6 composite spatial inertia of the floating base inherits properties of single-rigid-body inertia matrices. As we show, this includes the triangle inequality on the eigenvalues of the composite rotational inertia. To address the lack of physical consistency in deep learning models of floating-base systems, we introduce a parameterization of inertia matrices that satisfies all these constraints. Inspired by Deep Lagrangian Networks (DeLaN), we train neural networks to predict physically plausible inertia matrices that minimize inverse dynamics error under Lagrangian mechanics. For evaluation, we collected and released a dataset on multiple quadrupeds and humanoids. In these experiments, our Floating-Base Deep Lagrangian Networks (FeLaN) achieve highly competitive performance on both simulated and real robots, while providing greater physical interpretability.
Generalized Group Selection Strategies for Self-sustainable RIS-aided Communication
Reconfigurable intelligent surface (RIS) is a cutting-edge communication technology that has been proposed as aviable option for beyond fifth-generation wireless communication networks. This paper investigates various group selection strategies in the context of grouping-based self-sustainable RIS-aided device-to-device (D2D) communication with spatially correlated wireless channels. Specifically, we consider both power splitting (PS) and time switching (TS) configurations, of the self-sustainable RIS to analyze the system performance and propose appropriate bounds on the choice of system parameters. The analysis takes into account a simplified linear energy harvesting (EH) model as well as a practical non-linear EH model. Based on the application requirements, we propose various group selection strategies at the RIS. Notably, each strategy schedules the k-th best available group at the RIS based on the end-to-end signal-to-noise ratio (SNR) and also the energy harvested at a particular group of the RIS. Accordingly, by using tools from high order statistics, we derive analytical expressions for the outage probability of each selection strategy. Moreover, by applying the tools from extreme value theory, we also investigate an asymptotic scenario, where the number of groups available for selection at an RIS approaches infinity. The nontrivial insights obtained from this approach is especially beneficial in applications like large intelligent surface-aided wireless communication. Finally, the numerical results demonstrate the importance and benefits of the proposed approaches in terms of metrics such as the data throughput and the outage (both data and energy) performance.
comment: This work has been submitted to an IEEE journal for possible publication
A Data-Driven Framework for Online Mitigation of False Data Injection Signals in Networked Control Systems
This paper introduces a novel two-stage framework for online mitigation of False Data Injection (FDI) signals to improve the resiliency of Networked Control Systems (NCSs) and ensure their safe operation in the presence of malicious activities. The first stage involves meta learning to select a base time series forecasting model within a stacked ensemble learning architecture. This is achieved by converting time series data into scalograms using continuous wavelet transform, which are then split into image frames to generate a scalo-temporal representation of the data and to distinguish between different complexity levels of time series data based on an entropy metric using a convolutional neural network. In the second stage, the selected model mitigates false data injection signals in real-time. The proposed framework's effectiveness is demonstrated through rigorous simulations involving the formation control of differential drive mobile robots. By addressing the security challenges in NCSs, this framework offers a promising approach to maintaining system integrity and ensuring operational safety.
comment: 17 pages, 9 figures
Semantic Intelligence: A Bio-Inspired Cognitive Framework for Embodied Agents
Recent advancements in Large Language Models (LLMs) have greatly enhanced natural language understanding and content generation. However, these models primarily operate in disembodied digital environments and lack interaction with the physical world. To address this limitation, Embodied Artificial Intelligence (EAI) has emerged, focusing on agents that can perceive and interact with their surroundings. Despite progress, current embodied agents face challenges in unstructured real-world environments due to insufficient semantic intelligence, which is critical for understanding and reasoning about complex tasks. This paper introduces the Semantic Intelligence-Driven Embodied (SIDE) agent framework, which integrates a hierarchical semantic cognition architecture with a semantic-driven decision-making process. This enables agents to reason about and interact with the physical world in a contextually adaptive manner. The framework is inspired by biological cognitive mechanisms and utilizes bio-inspired principles to design a semantic cognitive architecture that mimics how humans and animals integrate and process sensory information. We present this framework as a step toward developing more intelligent and versatile embodied agents.
Quantum Key Distribution for Virtual Power Plant Communication: A Lightweight Key-Aware Scheduler with Provable Stability
Virtual power plants (VPPs) are becoming a cornerstone of future grids, aggregating distributed PV, wind, storage, and flexible loads for market participation and real-time balancing. As operations move to minute-- and second--level feedback, communication security shifts from a compliance item to an operational constraint: latency, reliability, and confidentiality jointly determine whether dispatch, protection, and settlement signals arrive on time. Conventional PKI and key-rotation schemes struggle with cross-domain, high-frequency messaging and face long-term quantum threats. Quantum key distribution (QKD) offers information-theoretic key freshness, but its key yield is scarce and stochastic, often misaligned with bursty VPP traffic. This paper proposes a key-aware priority and quota framework that treats quantum keys as first-class scheduling resources. The design combines (i) forecast-driven long-term quotas and short-term tokens, (ii) key-aware deficit-round-robin arbitration, (iii) a preemptive emergency key reserve, and (iv) graceful degradation via encryption-mode switching and controlled down-sampling for non-critical traffic. A drift-plus-penalty analysis establishes strong stability under average supply--demand balance with quantifiable bounds on backlog and tail latency, providing interpretable operating guarantees. We build a reproducible testbed on IEEE 33- and 123-bus VPP systems and evaluate normal, degraded, and outage regimes with industry-consistent message classes and TTLs. Against FIFO, fixed-priority, and static-quota baselines, the proposed scheme consistently reduces tail delay and passive timeouts for critical messages, improves per-bit key utility, and enhances power-tracking reliability during key scarcity and regime switches.
Differentiating Through Power Flow Solutions for Admittance and Topology Control
The power flow equations relate bus voltage phasors to power injections via the network admittance matrix. These equations are central to the key operational and protection functions of power systems (e.g., optimal power flow scheduling and control, state estimation, protection, and fault location, among others). As control, optimization, and estimation of network admittance parameters are central to multiple avenues of research in electric power systems, we propose a linearization of power flow solutions obtained by implicitly differentiating them with respect to the network admittance parameters. This is achieved by utilizing the implicit function theorem, in which we show that such a differentiation is guaranteed to exist under mild conditions and is applicable to generic power systems (radial or meshed). The proposed theory is applied to derive sensitivities of complex voltages, line currents, and power flows. The developed theory of linearizing the power flow equations around changes in the complex network admittance parameters has numerous applications. We demonstrate several of these applications, such as predicting the nodal voltages when the network topology changes without solving the power flow equations. We showcase the application for continuous admittance control, which is used to increase the hosting capacity of a given distribution network.
comment: 10 pages, 6 figures
ANGEL: A Novel Gripper for Versatile and Light-touch Fruit Harvesting
Fruit harvesting remains predominantly a labor-intensive process, motivating the development of research for robotic grippers. Conventional rigid or vacuum-driven grippers require complex mechanical design or high energy consumption. Current enveloping-based fruit harvesting grippers lack adaptability to fruits of different sizes. This paper introduces a drawstring-inspired, cable-driven soft gripper for versatile and gentle fruit harvesting. The design employs 3D-printed Thermoplastic Polyurethane (TPU) pockets with integrated steel wires that constrict around the fruit when actuated, distributing pressure uniformly to minimize bruising and allow versatility to fruits of varying sizes. The lightweight structure, which requires few components, reduces mechanical complexity and cost compared to other grippers. Actuation is achieved through servo-driven cable control, while motor feedback provides autonomous grip adjustment with tunable grip strength. Experimental validation shows that, for tomatoes within the gripper's effective size range, harvesting was achieved with a 0% immediate damage rate and a bruising rate of less than 9% after five days, reinforcing the gripper's suitability for fruit harvesting.
Provably Optimal Reinforcement Learning under Safety Filtering
Recent advances in reinforcement learning (RL) enable its use on increasingly complex tasks, but the lack of formal safety guarantees still limits its application in safety-critical settings. A common practical approach is to augment the RL policy with a safety filter that overrides unsafe actions to prevent failures during both training and deployment. However, safety filtering is often perceived as sacrificing performance and hindering the learning process. We show that this perceived safety-performance tradeoff is not inherent and prove, for the first time, that enforcing safety with a sufficiently permissive safety filter does not degrade asymptotic performance. We formalize RL safety with a safety-critical Markov decision process (SC-MDP), which requires categorical, rather than high-probability, avoidance of catastrophic failure states. Additionally, we define an associated filtered MDP in which all actions result in safe effects, thanks to a safety filter that is considered to be a part of the environment. Our main theorem establishes that (i) learning in the filtered MDP is safe categorically, (ii) standard RL convergence carries over to the filtered MDP, and (iii) any policy that is optimal in the filtered MDP-when executed through the same filter-achieves the same asymptotic return as the best safe policy in the SC-MDP, yielding a complete separation between safety enforcement and performance optimization. We validate the theory on Safety Gymnasium with representative tasks and constraints, observing zero violations during training and final performance matching or exceeding unfiltered baselines. Together, these results shed light on a long-standing question in safety-filtered learning and provide a simple, principled recipe for safe RL: train and deploy RL policies with the most permissive safety filter that is available.
comment: 17 pages, 3 figures
Prompt-to-Primal Teaching
This paper introduces Prompt-to-Primal (P2P) Teaching, an AI-integrated instructional approach that links prompt-driven exploration with first-principles reasoning, guided and moderated by the instructor within the classroom setting. In P2P teaching, student-generated AI prompts serve as entry points for inquiry and initial discussions in class, while the instructor guides learners to validate, challenge, and reconstruct AI responses through fundamental physical and mathematical laws. The approach encourages self-reflective development, critical evaluation of AI outputs, and conceptual foundational knowledge of the core engineering principles. A large language model (LLM) can be a highly effective tool for those who already possess foundational knowledge of a subject; however, it may also mislead students who lack sufficient background in the subject matter. Results from two student cohorts across different semesters suggest the pedagogical effectiveness of the P2P teaching framework in enhancing both AI literacy and engineering reasoning.
comment: 9 pages, 5 figures
An Exact Quantile-Energy Equality for Terminal Halfspaces in Linear-Gaussian Control with a Discrete-Time Companion, KL/Schrodinger Links, and High-Precision Validation
We prove an exact equality between the minimal quadratic control energy and the squared normal-quantile gap for terminal halfspaces in linear-Gaussian systems with additive control and quadratic effort $E(u) = \tfrac12\!\int u^\top M u\,dt$ where $M = B^\top\Sigma^{-1}B$. For terminal halfspace events, the minimal energy equals the squared normal-quantile gap divided by twice a controllability-to-noise ratio $R_T^2(w)=(w^\top W_c^M w)/(w^\top V_T w)$ and is attained by a matched-filter control. We provide an exact zero-order-hold discrete-time companion via block exponentials, relate the result to minimum-energy control, Gaussian isoperimetry, risk-sensitive/KL control, and Schrodinger bridges, and validate to high precision with Monte Carlo. We state assumptions, singular-$M$ handling, and edge cases. The statement is a compact synthesis and design-ready translator, not a universal principle. Novelty: while the ingredients (Gramians, Cauchy-Schwarz, Gaussian isoperimetry) are classical, to our knowledge the explicit quantile-energy equality with a constructive matched-filter achiever for terminal halfspaces, and its discrete-time companion, are not recorded together in the cited literature.
Model-Free Dynamic Consensus in Multi-Agent Systems: A Q-Function Perspective
This paper presents a new method for achieving dynamic consensus in linear discrete-time homogeneous multi-agent systems (MAS) with marginally stable or unstable dynamics. The guarantee of consensus in this setting involves a set of constraints based on the graph's spectral properties, complicating the design of the coupling gains. This challenge intensifies for large-scale systems with diverse graph Laplacian spectra. The proposed approach reformulates the dynamic consensus problem with a prescribed convergence rate using a state-action value function framework inspired by optimal control theory. Specifically, a synthetic linear quadratic regulation (LQR) formulation is introduced to encode the consensus objective, enabling its translation into a convex semidefinite programming (SDP) problem. The resulting SDP is applicable in both model-based and model-free settings for jointly designing the local feedback and coupling gains. To handle the inherent non-convex feasibility conditions, a convex-concave decomposition strategy is employed. Adaptation of the method in a completely model-free set-up eliminates the need for system identification or knowledge of the agents' dynamics. Instead, it relies on input-state data collection and offers an entirely data-driven equivalent SDP formulation. Finally, a new algorithm balancing feasibility, convergence rate, robustness, and energy efficiency, is established to provide design flexibility. Numerical simulations validate the method's effectiveness in various scenarios.
Direct data-driven interpolation and approximation of linear parameter-varying system trajectories
We consider the problem of estimating missing values in trajectories of linear parameter-varying (LPV) systems. We solve this interpolation problem for the class of shifted-affine LPV systems. Conditions for the existence and uniqueness of solutions are given and a direct data-driven algorithm for its computation is presented, i.e., the data-generating system is not given by a parametric model but is implicitly specified by data. We illustrate the applicability of the proposed solution on illustrative examples of a mass-spring-damper system with exogenous and endogenous parameter variation.
comment: 10 pages, 5 figures, submitted for review
Sparse Identification of Nonlinear Dynamics Enhanced by Ensemble Learning, Multi-Step Prediction Evaluation, Elite Strategy, and Classification Techniques for Applications to Industrial Systems
This paper proposes a sparse identification of nonlinear dynamics (SINDy) with control and exogenous inputs for highly accurate and reliable prediction. Although SINDy is recognized as a remarkable approach for identifying nonlinear systems, several challenges remain. Its application to industrial systems remains limited, and multi-step predictions are not guaranteed due to overfitting and noisy data. This phenomenon is often caused by the increase in basis functions resulting from the extension of coordinates, such as time-delay embedding. To address these problems, this study proposes an emphasized SINDy framework by integrating ensemble-learning, multi-step prediction evaluations, elite strategy, and classification techniques (EMEC-SINDy), while preserving convex optimization. The proposed method employs library bagging and extracts elites with an R-squared greater than 90%. Then, clustering is performed on the surviving elites because physically motivated basis functions are not always available, and the elites obtained do not always have similar basis functions. After the classification, discrete model candidates are obtained by taking the mean of each classified elite. Finally, the best model is selected. Simulation results demonstrate that EMEC-SINDy significantly outperforms original SINDy approaches in multi-step prediction accuracy under noisy conditions, validating its applicability to the diesel engine airpath system, which is known as a complex and highly coupled nonlinear multi-input multi-output system.
Observer Design over Hypercomplex Quaternions
We develop observer design over hypercomplex quaternions in a characteristic-polynomial-free framework. Using the standard right-module convention, we derive a right observable companion form and its companion polynomial that encodes error dynamics via right-eigenvalue similarity classes. The design mirrors the real/complex case - coefficient updates in companion coordinates, followed by a similarity back - yet avoids determinants, characteristic/minimal polynomials, and Cayley-Hamilton identities that do not transfer to quaternions. We also give an Ackermann-type construction for the important case of closed-loop companion polynomials with real coefficients, ensuring similarity-equivariant evaluation. The results yield simple recipes for full-order observers directly over quaternions, clarify the role of right spectra and their similarity classes, and pinpoint when classical one-shot formulas remain valid. Numerical examples illustrate the method and advantages over vectorized or complex-adjoint surrogates.
comment: Submitted for presentation at the 24th European Control Conference (ECC 2026), Reykjavik, Iceland. This work was co-funded by the European Union under the project ROBOPROX (reg. no. CZ.02.01.01/00/22 008/0004590)
Optimal control of stochastic reaction networks with entropic control cost and emergence of mode-switching strategies
Controlling the stochastic dynamics of biological populations is a challenge that arises across various biological contexts. However, these dynamics are inherently nonlinear and involve a discrete state space, i.e., the number of molecules, cells, or organisms. Additionally, the possibility of extinction has a significant impact on both dynamics and control strategies, particularly when the population size is small. These factors hamper the direct application of conventional control theories to biological systems. To address these challenges, we formulate the optimal control problem for stochastic population dynamics by utilizing control cost functions based on the f-divergence, which naturally accounts for population-specific factors. If Kullback-Leibler (KL) divergence is adopted for the cost function, the complex nonlinear Hamilton-Jacobi-Bellman equation is simplified into a linear form, facilitating efficient computation of optimal solutions. We demonstrate the effectiveness of our approach by applying it to the control of interacting random walkers, Moran processes, and SIR models, and observe the mode-switching phenomena in the control strategies. Our approach provides new opportunities for applying control theory to a wide range of biological problems.
comment: 12 pages, 4 figures
Informativity Conditions for Multiple Signals: Properties, Experimental Design, and Applications
Recent studies highlight the importance of persistently exciting condition in single signal sequence for model identification and data-driven control methodologies. However, maintaining prolonged excitation in control signals introduces significant challenges, as continuous excitation can reduce the lifetime of mechanical devices. In this paper, we introduce three informativity conditions for various types of multi-signal data, each augmented by weight factors. We explore the interrelations between these conditions and their rank properties in linear time-invariant systems. Furthermore, we introduce open-loop experimental design methods tailored to each of the three conditions, which can synthesize the required excitation conditions either offline or online, even in the presence of limited information within each signal segment. We demonstrate the effectiveness of these informativity conditions in least-squares identification. Additionally, all three conditions can extend Willems' fundamental lemma and are utilized to assess the properties of the system. Illustrative examples confirm that these conditions yield satisfactory outcomes in both least-squares identification and the construction of data-driven controllers.
Accurate Small-Signal Modeling of Digitally Controlled Buck Converters with ADC-PWM Synchronization
Digital control has become increasingly widespread in modern power electronic converters. When acquiring feedback signals such as the inductor current, synchronizing the analog-to-digital converter (ADC) with the digital pulse-width modulator (DPWM) is commonly employed to accurately track their steady-state average. However, the small-signal implications of such synchronization have not been investigated. This paper presents an exact small-signal model for digitally controlled buck converters operating in forced continuous-conduction mode (FCCM) under constant-frequency current-mode control, explicitly accounting for DPWM-ADC synchronization. Using a sampled-data framework, the proposed model captures all sideband effects introduced by the sampling process, yielding precise predictions of both analog and digital loop gains, even at frequencies beyond the switching and sampling frequencies. Both asymmetrical and symmetrical carrier modulations are considered. Furthermore, the digital loop gain is derived in closed form using the modified z-transform, enabling low-complexity compensator design and stability assessment. Within this framework, the analog loop gain can be directly obtained from the digital loop gain, thereby eliminating the need for computationally intensive infinite series evaluations. The validity of the proposed model is confirmed through both simulation and experimental results.
Modeling the Impact of Communication and Human Uncertainties on Runway Capacity in Terminal Airspace
We investigate the potential impact of communication and human performance uncertainties on runway operations. Specifically, we consider these impacts within the context of an arrival scenario with two converging flows: a straight-in approach stream and a downwind stream merging into it. Both arrival stream are modeled using a modified Possion distribution that incorporate the separation minima as well as the runway occupancy time. Various system level uncertainties are addressed in this process, including communication link- and human-related uncertainties. In this research, we first build a Monte Carlo-based discrete-time simulation, where aircraft arrivals are generated by modified Poisson processes subject to minimum separation constraints, simulating various traffic operations. The merging logic incorporates standard bank angle continuous turn-to-final, pilot response delays, and dynamic gap availability in real time. Then, we investigate an automated final approach vectoring model (i.e., Auto-ATC), in which inverse optimal control is used to learn decision advisories from human expert records. By augmenting trajectories and incorporating the aforementioned uncertainties into the planning scenario, we create a setup analogous to the discrete event simulation. For both studies, runway capacity is measured by runway throughput, the fraction of downwind arrivals that merge immediately without holding, and the average delay (i.e., holding time/distance) experienced on the downwind leg. This research provides a method for runway capacity estimation in merging scenarios, and demonstrates that aeronautical communication link uncertainties significantly affect runway capacity in current voice-based operations, whereas the impact can be mitigated in autonomous operational settings.
Identification and Adaptive Control of Markov Jump Systems: Sample Complexity and Regret Bounds
Learning how to effectively control unknown dynamical systems is crucial for intelligent autonomous systems. This task becomes a significant challenge when the underlying dynamics are changing with time. Motivated by this challenge, this paper considers the problem of controlling an unknown Markov jump linear system (MJS) to optimize a quadratic objective. By taking a model-based perspective, we consider identification-based adaptive control of MJSs. We first provide a system identification algorithm for MJS to learn the dynamics in each mode as well as the Markov transition matrix, underlying the evolution of the mode switches, from a single trajectory of the system states, inputs, and modes. Through martingale-based arguments, sample complexity of this algorithm is shown to be $\mathcal{O}(1/\sqrt{T})$. We then propose an adaptive control scheme that performs system identification together with certainty equivalent control to adapt the controllers in an episodic fashion. Combining our sample complexity results with recent perturbation results for certainty equivalent control, we prove that when the episode lengths are appropriately chosen, the proposed adaptive control scheme achieves $\mathcal{O}(\sqrt{T})$ regret, which can be improved to $\mathcal{O}(polylog(T))$ with partial knowledge of the system. Our proof strategy introduces innovations to handle Markovian jumps and a weaker notion of stability common in MJSs. Our analysis provides insights into system theoretic quantities that affect learning accuracy and control performance. Numerical simulations are presented to further reinforce these insights.
comment: Improved results using Martingale-based arguments
Systems and Control (EESS)
Admittance Matrix Concentration Inequalities for Understanding Uncertain Power Networks
This paper presents probabilistic bounds for the spectrum of the admittance matrix and classical linear power flow models under uncertain network parameters; for example, probabilistic line contingencies. Our proposed approach imports tools from probability theory, such as concentration inequalities for random matrices with independent entries. It yields error bounds for common approximations of the AC power flow equations under parameter uncertainty, including the DC and LinDistFlow approximations.
comment: 9 pages, 1 figure
Data-driven Communication and Control Design for Distributed Frequency Regulation with Black-box Inverters SC
The increasing penetration of inverter-based resources into the power grid, with often only black-box models available, challenges long-standing frequency control methods. Most recent works take a decentralized approach without online device coordination via communication. This paper considers both dynamic behavior and communication within secondary frequency control on an intermediate timescale. We develop a distributed data-driven approach that utilizes peer-to-peer communication between inverters to avoid the need for a central control center. To enable a trade off between communication network requirements and control performance, we present a framework to guide communication topology design for secondary frequency regulation. Following design of the inter-agent information exchange scheme, we design a controller that is structured according to the communication topology with a closed-loop stability guarantee. Case studies on the IEEE 39-bus system validate the framework and illustrate the trade-off between communication requirements and control performance that is enabled by our approach.
comment: Preprint submitted to PSCC 2026
Trajectory Optimization for Minimum Threat Exposure using Physics-Informed Neural Networks
We apply a physics-informed neural network (PINN) to solve the two-point boundary value problem (BVP) arising from the necessary conditions postulated by Pontryagin's Minimum Principle for optimal control. Such BVPs are known to be numerically difficult to solve by traditional shooting methods due to extremely high sensitivity to initial guesses. In the light of recent successes in applying PINNs for solving high-dimensional differential equations, we develop a PINN to solve the problem of finding trajectories with minimum exposure to a spatiotemporal threat for a vehicle kinematic model. First, we implement PINNs that are trained to solve the BVP for a given pair of initial and final states for a given threat field. Next, we implement a PINN conditioned on the initial state for a given threat field, which eliminates the need for retraining for each initial state. We demonstrate that the PINN outputs satisfy the necessary conditions with low numerical error.
comment: 2025 Indian Control Conference
Artificial magnetic conductor backed dual-mode sectoral cylindrical DRA for off-body biomedical telemetry
This research investigates the potential of a sectoral Cylindrical Dielectric Resonator Antenna (CDRA) for biomedical telemetry. CDRAs are known for their low loss, ruggedness, and stability, but their limited bandwidth and size make them unsuitable for wearable devices. The research addresses these limitations by proposing a dual mode antenna that operates in EH110 and TE210 modes. The sectoral CDRA is a quarter segment with Perfect Electric Conductor boundaries, reducing its size by a factor of four. Mathematical derivations of the field components for both modes are derived to support the design. To minimize specific absorption rate (SAR), an Artificial Magnetic Conductor (AMC) surface is applied to the antennas backside, enhancing compatibility with the transverse electric modes. The antenna achieves a bandwidth of 0.7 GHz (5.2-5.9 GHz), suitable for biomedical applications, with a measured peak gain of 7.9 dBi and a SAR of 1.24 W/kg when applied to a human arm.
comment: 13 pages
An Empirical Study of Lagrangian Methods in Safe Reinforcement Learning
In safety-critical domains such as robotics, navigation and power systems, constrained optimization problems arise where maximizing performance must be carefully balanced with associated constraints. Safe reinforcement learning provides a framework to address these challenges, with Lagrangian methods being a popular choice. However, the effectiveness of Lagrangian methods crucially depends on the choice of the Lagrange multiplier $\lambda$, which governs the trade-off between return and constraint cost. A common approach is to update the multiplier automatically during training. Although this is standard in practice, there remains limited empirical evidence on the robustness of an automated update and its influence on overall performance. Therefore, we analyze (i) optimality and (ii) stability of Lagrange multipliers in safe reinforcement learning across a range of tasks. We provide $\lambda$-profiles that give a complete visualization of the trade-off between return and constraint cost of the optimization problem. These profiles show the highly sensitive nature of $\lambda$ and moreover confirm the lack of general intuition for choosing the optimal value $\lambda^*$. Our findings additionally show that automated multiplier updates are able to recover and sometimes even exceed the optimal performance found at $\lambda^*$ due to the vast difference in their learning trajectories. Furthermore, we show that automated multiplier updates exhibit oscillatory behavior during training, which can be mitigated through PID-controlled updates. However, this method requires careful tuning to achieve consistently better performance across tasks. This highlights the need for further research on stabilizing Lagrangian methods in safe reinforcement learning. The code used to reproduce our results can be found at https://github.com/lindsayspoor/Lagrangian_SafeRL.
A condensing approach for linear-quadratic optimization with geometric constraints
Optimization problems with convex quadratic cost and polyhedral constraints are ubiquitous in signal processing, automatic control and decision-making. We consider here an enlarged problem class that allows to encode logical conditions and cardinality constraints, among others. In particular, we cover also situations where parts of the constraints are nonconvex and possibly complicated, but it is practical to compute projections onto this nonconvex set. Our approach combines the augmented Lagrangian framework with a solver-agnostic structure-exploiting subproblem reformulation. While convergence guarantees follow from the former, the proposed condensing technique leads to significant improvements in computational performance.
comment: 14 pages, 5 figures
ORIX: Orchestration of RIS with xApps for Smart Wireless Factory Environments
The vision of a smart wireless factory (SWF) demands highly flexible, low-latency, and reliable connectivity that goes beyond conventional wireless solutions. Reconfigurable intelligent surface (RIS)-empowered communications, when integrated with the open radio access network (O-RAN) architectures, have emerged as a promising enabler to meet these challenging requirements. This article introduces the methodology for the orchestration of RIS with xApps (ORIX), bringing the RIS technology into the O-RAN ecosystem through xApp-based control for SWF environments. ORIX features three key components: an O-RAN-compliant RIS service model for dynamic configuration, an RIS channel simulator that supports 3GPP indoor factory models with multiple industrial scenarios, and practical RIS optimization strategies with finite-resolution control. Together, these elements provide a realistic end-to-end emulation platform for evaluating RIS placement, control, and performance in SWF environments prior to deployment. The presented case study demonstrates how ORIX enables the evaluation of achievable performance gains, exploration of trade-offs among key RIS design parameters, and identification of deployment strategies that balance system performance with practical implementation constraints. By bridging theoretical advances with industrial feasibility, ORIX lays the groundwork for RIS-assisted O-RAN networks to power next-generation wireless communication in industrial scenarios.
comment: Submitted in IEEE
Inverse Optimal Control of Muscle Force Sharing During Pathological Gait
Muscle force sharing is typically resolved by minimizing a specific objective function to approximate neural control strategies. An inverse optimal control approach was applied to identify the "best" objective function, among a positive linear combination of basis objective functions, associated with the gait of two post-stroke males, one high-functioning (subject S1) and one low-functioning (subject S2). It was found that the "best" objective function is subject- and leg-specific. No single function works universally well, yet the best options are usually differently weighted combinations of muscle activation- and power-minimization. Subject-specific inverse optimal control models performed best on their respective limbs (\textbf{RMSE 178/213 N, CC 0.71/0.61} for non-paretic and paretic legs of S1; \textbf{RMSE 205/165 N, CC 0.88/0.85} for respective legs of S2), but cross-subject generalization was poor, particularly for paretic legs. Moreover, minimizing the root mean square of muscle power emerged as important for paretic limbs, while minimizing activation-based functions dominated for non-paretic limbs. This may suggest different neural control strategies between affected and unaffected sides, possibly altered by the presence of spasticity. Among the 15 considered objective functions commonly used in inverse dynamics-based computations, the root mean square of muscle power was the only one explicitly incorporating muscle velocity, leading to a possible model for spasticity in the paretic limbs. Although this objective function has been rarely used, it may be relevant for modeling pathological gait, such as post-stroke gait.
Integrating Trustworthy Artificial Intelligence with Energy-Efficient Robotic Arms for Waste Sorting
This paper presents a novel methodology that integrates trustworthy artificial intelligence (AI) with an energy-efficient robotic arm for intelligent waste classification and sorting. By utilizing a convolutional neural network (CNN) enhanced through transfer learning with MobileNetV2, the system accurately classifies waste into six categories: plastic, glass, metal, paper, cardboard, and trash. The model achieved a high training accuracy of 99.8% and a validation accuracy of 80.5%, demonstrating strong learning and generalization. A robotic arm simulator is implemented to perform virtual sorting, calculating the energy cost for each action using Euclidean distance to ensure optimal and efficient movement. The framework incorporates key elements of trustworthy AI, such as transparency, robustness, fairness, and safety, making it a reliable and scalable solution for smart waste management systems in urban settings.
comment: 5 pages, 2 figures
Process Automation Architecture Using RFID for Transparent Voting Systems
This paper presents the development of a process automation architecture leveraging Radio Frequency Identification (RFID) technology for secure, transparent and efficient voting systems. The proposed architecture automates the voting workflow through RFID-enabled voter identification, encrypted vote casting, and secure data transmission. Each eligible voter receives a smart RFID card containing a uniquely encrypted identifier, which is verified using an RC522 reader interfaced with a microcontroller. Upon successful verification, the voter interacts with a touchscreen interface to cast a vote, which is then encrypted using AES-128 and securely stored on a local SD card or transmitted via GSM to a central server. A tamper-proof monitoring mechanism records each session with time-stamped digital signatures, ensuring auditability and data integrity. The architecture is designed to function in both online and offline modes, with an automated batch synchronization mechanism that updates vote records once network connectivity is restored. System testing in simulated environments confirmed 100% voter authentication accuracy, minimized latency (average voting time of 11.5 seconds), and robustness against cloning, double voting, and data interception. The integration of real-time monitoring and secure process control modules enables electoral authorities to automate data logging, detect anomalies, and validate system integrity dynamically. This work demonstrates a scalable, automation-driven solution for voting infrastructure, offering enhanced transparency, resilience, and deployment flexibility, especially in environments where digital transformation of electoral processes is critically needed.
comment: 7 pages, 5 figures, 1 table
Accelerating Adaptive Systems via Normalized Parameter Estimation Laws
In this paper, we propose a new class of parameter estimation laws for adaptive systems, called \emph{normalized parameter estimation laws}. A key feature of these estimation laws is that they accelerate the convergence of the system state, $\mathit{x(t)}$, to the origin. We quantify this improvement by showing that our estimation laws guarantee finite integrability of the $\mathit{r}$-th root of the squared norm of the system state, i.e., \( \mathit{\|x(t)\|}_2^{2/\mathit{r}} \in \mathcal{L}_1, \) where $\mathit{r} \geq 1$ is a pre-specified parameter that, for a broad class of systems, can be chosen arbitrarily large. In contrast, standard Lyapunov-based estimation laws only guarantee integrability of $\mathit{\|x(t)\|}_2^2$ (i.e., $\mathit{r} = 1$). We motivate our method by showing that, for large values of $r$, this guarantee serves as a sparsity-promoting mechanism in the time domain, meaning that it penalizes prolonged signal duration and slow decay, thereby promoting faster convergence of $\mathit{x(t)}$. The proposed estimation laws do not rely on time-varying or high adaptation gains and do not require persistent excitation. Moreover, they can be applied to systems with matched and unmatched uncertainties, regardless of their dynamic structure, as long as a control Lyapunov function (CLF) exists. Finally, they are compatible with any CLF-based certainty equivalence controllers. We further develop higher-order extensions of our estimation laws by incorporating momentum into the estimation dynamics. We illustrate the performance improvements achieved with the proposed scheme through various numerical experiments.
Assessing the Quality of a Set of Basis Functions for Inverse Optimal Control via Projection onto Global Minimizers
Inverse optimization (Inverse optimal control) is the task of imputing a cost function such that given test points (trajectories) are (nearly) optimal with respect to the discovered cost. Prior methods in inverse optimization assume that the true cost is a convex combination of a set of convex basis functions and that this basis is consistent with the test points. However, the consistency assumption is not always justified, as in many applications the principles by which the data is generated are not well understood. This work proposes using the distance between a test point and the set of global optima generated by the convex combinations of the convex basis functions as a measurement for the expressive quality of the basis with respect to the test point. A large minimal distance invalidates the set of basis functions. The concept of a set of global optima is introduced and its properties are explored in unconstrained and constrained settings. Upper and lower bounds for the minimum distance in the convex quadratic setting are implemented by bi-level gradient descent and an enriched linear matrix inequality respectively. Extensions to this framework include max-representable basis functions, nonconvex basis functions (local minima), and applying polynomial optimization techniques.
comment: 8 pages, 4 figures
Comparison and performance analysis of dynamic encrypted control approaches
Encrypted controllers using homomorphic encryption have proven to guarantee the privacy of measurement and control signals, as well as system and controller parameters, while regulating the system as intended. However, encrypting dynamic controllers has remained a challenge due to growing noise and overflow issues in the encoding. In this paper, we review recent approaches to dynamic encrypted control, such as bootstrapping, periodic resets of the controller state, integer reformulations, and FIR controllers, and equip them with a stability and performance analysis to evaluate their suitability. We complement the analysis with a numerical performance comparison on a benchmark system.
A polynomial-based QCQP solver for encrypted optimization
In this paper, we present a novel method for solving a class of quadratically constrained quadratic optimization problems using only additions and multiplications. This approach enables solving constrained optimization problems on private data since the operations involved are compatible with the capabilities of homomorphic encryption schemes. To solve the constrained optimization problem, a sequence of polynomial penalty functions of increasing degree is introduced, which are sufficiently steep at the boundary of the feasible set. Adding the penalty function to the original cost function creates a sequence of unconstrained optimization problems whose minimizer always lies in the admissible set and converges to the minimizer of the constrained problem. A gradient descent method is used to generate a sequence of iterates associated with these problems. For the algorithm, it is shown that the iterate converges to a minimizer of the original problem, and the feasible set is positively invariant under the iteration. Finally, the method is demonstrated on an illustrative cryptographic problem, finding the smaller value of two numbers, and the encrypted implementability is discussed.
comment: Accepted for presentation at the 64th IEEE Conference on Decision and Control (CDC2025)
Enhanced Ground-Satellite Direct Access via Onboard Rydberg Atomic Quantum Receivers
Ground-satellite links for 6G networks face critical challenges, including severe path loss, tight size-weight-power limits, and congested spectrum, all of which significantly hinder the performance of traditional radio frequency (RF) front ends. This article introduces the Rydberg Atomic Quantum Receiver (RAQR) for onboard satellite systems, a millimeter-scale front end that converts radio fields to optical signals through atomic electromagnetically induced transparency. RAQR's high sensitivity and high frequency selectivity address link budget, payload, and interference challenges while fitting within space constraints. A hybrid atomic-electronic design and supporting signal model demonstrate enhanced data rate, coverage, and sensing accuracy relative to conventional RF receivers. The article concludes with integration strategies, distributed-satellite concepts, and open research problems for bringing RAQR-enabled satellite payloads into service.
comment: Submitted to IEEE Journal
Breaking and Fixing Defenses Against Control-Flow Hijacking in Multi-Agent Systems
Control-flow hijacking attacks manipulate orchestration mechanisms in multi-agent systems into performing unsafe actions that compromise the system and exfiltrate sensitive information. Recently proposed defenses, such as LlamaFirewall, rely on alignment checks of inter-agent communications to ensure that all agent invocations are "related to" and "likely to further" the original objective. We start by demonstrating control-flow hijacking attacks that evade these defenses even if alignment checks are performed by advanced LLMs. We argue that the safety and functionality objectives of multi-agent systems fundamentally conflict with each other. This conflict is exacerbated by the brittle definitions of "alignment" and the checkers' incomplete visibility into the execution context. We then propose, implement, and evaluate ControlValve, a new defense inspired by the principles of control-flow integrity and least privilege. ControlValve (1) generates permitted control-flow graphs for multi-agent systems, and (2) enforces that all executions comply with these graphs, along with contextual rules (generated in a zero-shot manner) for each agent invocation.
Floating-Base Deep Lagrangian Networks
Grey-box methods for system identification combine deep learning with physics-informed constraints, capturing complex dependencies while improving out-of-distribution generalization. Yet, despite the growing importance of floating-base systems such as humanoids and quadrupeds, current grey-box models ignore their specific physical constraints. For instance, the inertia matrix is not only positive definite but also exhibits branch-induced sparsity and input independence. Moreover, the 6x6 composite spatial inertia of the floating base inherits properties of single-rigid-body inertia matrices. As we show, this includes the triangle inequality on the eigenvalues of the composite rotational inertia. To address the lack of physical consistency in deep learning models of floating-base systems, we introduce a parameterization of inertia matrices that satisfies all these constraints. Inspired by Deep Lagrangian Networks (DeLaN), we train neural networks to predict physically plausible inertia matrices that minimize inverse dynamics error under Lagrangian mechanics. For evaluation, we collected and released a dataset on multiple quadrupeds and humanoids. In these experiments, our Floating-Base Deep Lagrangian Networks (FeLaN) achieve highly competitive performance on both simulated and real robots, while providing greater physical interpretability.
Generalized Group Selection Strategies for Self-sustainable RIS-aided Communication
Reconfigurable intelligent surface (RIS) is a cutting-edge communication technology that has been proposed as aviable option for beyond fifth-generation wireless communication networks. This paper investigates various group selection strategies in the context of grouping-based self-sustainable RIS-aided device-to-device (D2D) communication with spatially correlated wireless channels. Specifically, we consider both power splitting (PS) and time switching (TS) configurations, of the self-sustainable RIS to analyze the system performance and propose appropriate bounds on the choice of system parameters. The analysis takes into account a simplified linear energy harvesting (EH) model as well as a practical non-linear EH model. Based on the application requirements, we propose various group selection strategies at the RIS. Notably, each strategy schedules the k-th best available group at the RIS based on the end-to-end signal-to-noise ratio (SNR) and also the energy harvested at a particular group of the RIS. Accordingly, by using tools from high order statistics, we derive analytical expressions for the outage probability of each selection strategy. Moreover, by applying the tools from extreme value theory, we also investigate an asymptotic scenario, where the number of groups available for selection at an RIS approaches infinity. The nontrivial insights obtained from this approach is especially beneficial in applications like large intelligent surface-aided wireless communication. Finally, the numerical results demonstrate the importance and benefits of the proposed approaches in terms of metrics such as the data throughput and the outage (both data and energy) performance.
comment: This work has been submitted to an IEEE journal for possible publication
A Data-Driven Framework for Online Mitigation of False Data Injection Signals in Networked Control Systems
This paper introduces a novel two-stage framework for online mitigation of False Data Injection (FDI) signals to improve the resiliency of Networked Control Systems (NCSs) and ensure their safe operation in the presence of malicious activities. The first stage involves meta learning to select a base time series forecasting model within a stacked ensemble learning architecture. This is achieved by converting time series data into scalograms using continuous wavelet transform, which are then split into image frames to generate a scalo-temporal representation of the data and to distinguish between different complexity levels of time series data based on an entropy metric using a convolutional neural network. In the second stage, the selected model mitigates false data injection signals in real-time. The proposed framework's effectiveness is demonstrated through rigorous simulations involving the formation control of differential drive mobile robots. By addressing the security challenges in NCSs, this framework offers a promising approach to maintaining system integrity and ensuring operational safety.
comment: 17 pages, 9 figures
Semantic Intelligence: A Bio-Inspired Cognitive Framework for Embodied Agents
Recent advancements in Large Language Models (LLMs) have greatly enhanced natural language understanding and content generation. However, these models primarily operate in disembodied digital environments and lack interaction with the physical world. To address this limitation, Embodied Artificial Intelligence (EAI) has emerged, focusing on agents that can perceive and interact with their surroundings. Despite progress, current embodied agents face challenges in unstructured real-world environments due to insufficient semantic intelligence, which is critical for understanding and reasoning about complex tasks. This paper introduces the Semantic Intelligence-Driven Embodied (SIDE) agent framework, which integrates a hierarchical semantic cognition architecture with a semantic-driven decision-making process. This enables agents to reason about and interact with the physical world in a contextually adaptive manner. The framework is inspired by biological cognitive mechanisms and utilizes bio-inspired principles to design a semantic cognitive architecture that mimics how humans and animals integrate and process sensory information. We present this framework as a step toward developing more intelligent and versatile embodied agents.
Quantum Key Distribution for Virtual Power Plant Communication: A Lightweight Key-Aware Scheduler with Provable Stability
Virtual power plants (VPPs) are becoming a cornerstone of future grids, aggregating distributed PV, wind, storage, and flexible loads for market participation and real-time balancing. As operations move to minute-- and second--level feedback, communication security shifts from a compliance item to an operational constraint: latency, reliability, and confidentiality jointly determine whether dispatch, protection, and settlement signals arrive on time. Conventional PKI and key-rotation schemes struggle with cross-domain, high-frequency messaging and face long-term quantum threats. Quantum key distribution (QKD) offers information-theoretic key freshness, but its key yield is scarce and stochastic, often misaligned with bursty VPP traffic. This paper proposes a key-aware priority and quota framework that treats quantum keys as first-class scheduling resources. The design combines (i) forecast-driven long-term quotas and short-term tokens, (ii) key-aware deficit-round-robin arbitration, (iii) a preemptive emergency key reserve, and (iv) graceful degradation via encryption-mode switching and controlled down-sampling for non-critical traffic. A drift-plus-penalty analysis establishes strong stability under average supply--demand balance with quantifiable bounds on backlog and tail latency, providing interpretable operating guarantees. We build a reproducible testbed on IEEE 33- and 123-bus VPP systems and evaluate normal, degraded, and outage regimes with industry-consistent message classes and TTLs. Against FIFO, fixed-priority, and static-quota baselines, the proposed scheme consistently reduces tail delay and passive timeouts for critical messages, improves per-bit key utility, and enhances power-tracking reliability during key scarcity and regime switches.
Differentiating Through Power Flow Solutions for Admittance and Topology Control
The power flow equations relate bus voltage phasors to power injections via the network admittance matrix. These equations are central to the key operational and protection functions of power systems (e.g., optimal power flow scheduling and control, state estimation, protection, and fault location, among others). As control, optimization, and estimation of network admittance parameters are central to multiple avenues of research in electric power systems, we propose a linearization of power flow solutions obtained by implicitly differentiating them with respect to the network admittance parameters. This is achieved by utilizing the implicit function theorem, in which we show that such a differentiation is guaranteed to exist under mild conditions and is applicable to generic power systems (radial or meshed). The proposed theory is applied to derive sensitivities of complex voltages, line currents, and power flows. The developed theory of linearizing the power flow equations around changes in the complex network admittance parameters has numerous applications. We demonstrate several of these applications, such as predicting the nodal voltages when the network topology changes without solving the power flow equations. We showcase the application for continuous admittance control, which is used to increase the hosting capacity of a given distribution network.
comment: 10 pages, 6 figures
ANGEL: A Novel Gripper for Versatile and Light-touch Fruit Harvesting
Fruit harvesting remains predominantly a labor-intensive process, motivating the development of research for robotic grippers. Conventional rigid or vacuum-driven grippers require complex mechanical design or high energy consumption. Current enveloping-based fruit harvesting grippers lack adaptability to fruits of different sizes. This paper introduces a drawstring-inspired, cable-driven soft gripper for versatile and gentle fruit harvesting. The design employs 3D-printed Thermoplastic Polyurethane (TPU) pockets with integrated steel wires that constrict around the fruit when actuated, distributing pressure uniformly to minimize bruising and allow versatility to fruits of varying sizes. The lightweight structure, which requires few components, reduces mechanical complexity and cost compared to other grippers. Actuation is achieved through servo-driven cable control, while motor feedback provides autonomous grip adjustment with tunable grip strength. Experimental validation shows that, for tomatoes within the gripper's effective size range, harvesting was achieved with a 0% immediate damage rate and a bruising rate of less than 9% after five days, reinforcing the gripper's suitability for fruit harvesting.
Provably Optimal Reinforcement Learning under Safety Filtering
Recent advances in reinforcement learning (RL) enable its use on increasingly complex tasks, but the lack of formal safety guarantees still limits its application in safety-critical settings. A common practical approach is to augment the RL policy with a safety filter that overrides unsafe actions to prevent failures during both training and deployment. However, safety filtering is often perceived as sacrificing performance and hindering the learning process. We show that this perceived safety-performance tradeoff is not inherent and prove, for the first time, that enforcing safety with a sufficiently permissive safety filter does not degrade asymptotic performance. We formalize RL safety with a safety-critical Markov decision process (SC-MDP), which requires categorical, rather than high-probability, avoidance of catastrophic failure states. Additionally, we define an associated filtered MDP in which all actions result in safe effects, thanks to a safety filter that is considered to be a part of the environment. Our main theorem establishes that (i) learning in the filtered MDP is safe categorically, (ii) standard RL convergence carries over to the filtered MDP, and (iii) any policy that is optimal in the filtered MDP-when executed through the same filter-achieves the same asymptotic return as the best safe policy in the SC-MDP, yielding a complete separation between safety enforcement and performance optimization. We validate the theory on Safety Gymnasium with representative tasks and constraints, observing zero violations during training and final performance matching or exceeding unfiltered baselines. Together, these results shed light on a long-standing question in safety-filtered learning and provide a simple, principled recipe for safe RL: train and deploy RL policies with the most permissive safety filter that is available.
comment: 17 pages, 3 figures
Prompt-to-Primal Teaching
This paper introduces Prompt-to-Primal (P2P) Teaching, an AI-integrated instructional approach that links prompt-driven exploration with first-principles reasoning, guided and moderated by the instructor within the classroom setting. In P2P teaching, student-generated AI prompts serve as entry points for inquiry and initial discussions in class, while the instructor guides learners to validate, challenge, and reconstruct AI responses through fundamental physical and mathematical laws. The approach encourages self-reflective development, critical evaluation of AI outputs, and conceptual foundational knowledge of the core engineering principles. A large language model (LLM) can be a highly effective tool for those who already possess foundational knowledge of a subject; however, it may also mislead students who lack sufficient background in the subject matter. Results from two student cohorts across different semesters suggest the pedagogical effectiveness of the P2P teaching framework in enhancing both AI literacy and engineering reasoning.
comment: 9 pages, 5 figures
An Exact Quantile-Energy Equality for Terminal Halfspaces in Linear-Gaussian Control with a Discrete-Time Companion, KL/Schrodinger Links, and High-Precision Validation
We prove an exact equality between the minimal quadratic control energy and the squared normal-quantile gap for terminal halfspaces in linear-Gaussian systems with additive control and quadratic effort $E(u) = \tfrac12\!\int u^\top M u\,dt$ where $M = B^\top\Sigma^{-1}B$. For terminal halfspace events, the minimal energy equals the squared normal-quantile gap divided by twice a controllability-to-noise ratio $R_T^2(w)=(w^\top W_c^M w)/(w^\top V_T w)$ and is attained by a matched-filter control. We provide an exact zero-order-hold discrete-time companion via block exponentials, relate the result to minimum-energy control, Gaussian isoperimetry, risk-sensitive/KL control, and Schrodinger bridges, and validate to high precision with Monte Carlo. We state assumptions, singular-$M$ handling, and edge cases. The statement is a compact synthesis and design-ready translator, not a universal principle. Novelty: while the ingredients (Gramians, Cauchy-Schwarz, Gaussian isoperimetry) are classical, to our knowledge the explicit quantile-energy equality with a constructive matched-filter achiever for terminal halfspaces, and its discrete-time companion, are not recorded together in the cited literature.
Model-Free Dynamic Consensus in Multi-Agent Systems: A Q-Function Perspective
This paper presents a new method for achieving dynamic consensus in linear discrete-time homogeneous multi-agent systems (MAS) with marginally stable or unstable dynamics. The guarantee of consensus in this setting involves a set of constraints based on the graph's spectral properties, complicating the design of the coupling gains. This challenge intensifies for large-scale systems with diverse graph Laplacian spectra. The proposed approach reformulates the dynamic consensus problem with a prescribed convergence rate using a state-action value function framework inspired by optimal control theory. Specifically, a synthetic linear quadratic regulation (LQR) formulation is introduced to encode the consensus objective, enabling its translation into a convex semidefinite programming (SDP) problem. The resulting SDP is applicable in both model-based and model-free settings for jointly designing the local feedback and coupling gains. To handle the inherent non-convex feasibility conditions, a convex-concave decomposition strategy is employed. Adaptation of the method in a completely model-free set-up eliminates the need for system identification or knowledge of the agents' dynamics. Instead, it relies on input-state data collection and offers an entirely data-driven equivalent SDP formulation. Finally, a new algorithm balancing feasibility, convergence rate, robustness, and energy efficiency, is established to provide design flexibility. Numerical simulations validate the method's effectiveness in various scenarios.
Direct data-driven interpolation and approximation of linear parameter-varying system trajectories
We consider the problem of estimating missing values in trajectories of linear parameter-varying (LPV) systems. We solve this interpolation problem for the class of shifted-affine LPV systems. Conditions for the existence and uniqueness of solutions are given and a direct data-driven algorithm for its computation is presented, i.e., the data-generating system is not given by a parametric model but is implicitly specified by data. We illustrate the applicability of the proposed solution on illustrative examples of a mass-spring-damper system with exogenous and endogenous parameter variation.
comment: 10 pages, 5 figures, submitted for review
Sparse Identification of Nonlinear Dynamics Enhanced by Ensemble Learning, Multi-Step Prediction Evaluation, Elite Strategy, and Classification Techniques for Applications to Industrial Systems
This paper proposes a sparse identification of nonlinear dynamics (SINDy) with control and exogenous inputs for highly accurate and reliable prediction. Although SINDy is recognized as a remarkable approach for identifying nonlinear systems, several challenges remain. Its application to industrial systems remains limited, and multi-step predictions are not guaranteed due to overfitting and noisy data. This phenomenon is often caused by the increase in basis functions resulting from the extension of coordinates, such as time-delay embedding. To address these problems, this study proposes an emphasized SINDy framework by integrating ensemble-learning, multi-step prediction evaluations, elite strategy, and classification techniques (EMEC-SINDy), while preserving convex optimization. The proposed method employs library bagging and extracts elites with an R-squared greater than 90%. Then, clustering is performed on the surviving elites because physically motivated basis functions are not always available, and the elites obtained do not always have similar basis functions. After the classification, discrete model candidates are obtained by taking the mean of each classified elite. Finally, the best model is selected. Simulation results demonstrate that EMEC-SINDy significantly outperforms original SINDy approaches in multi-step prediction accuracy under noisy conditions, validating its applicability to the diesel engine airpath system, which is known as a complex and highly coupled nonlinear multi-input multi-output system.
Observer Design over Hypercomplex Quaternions
We develop observer design over hypercomplex quaternions in a characteristic-polynomial-free framework. Using the standard right-module convention, we derive a right observable companion form and its companion polynomial that encodes error dynamics via right-eigenvalue similarity classes. The design mirrors the real/complex case - coefficient updates in companion coordinates, followed by a similarity back - yet avoids determinants, characteristic/minimal polynomials, and Cayley-Hamilton identities that do not transfer to quaternions. We also give an Ackermann-type construction for the important case of closed-loop companion polynomials with real coefficients, ensuring similarity-equivariant evaluation. The results yield simple recipes for full-order observers directly over quaternions, clarify the role of right spectra and their similarity classes, and pinpoint when classical one-shot formulas remain valid. Numerical examples illustrate the method and advantages over vectorized or complex-adjoint surrogates.
comment: Submitted for presentation at the 24th European Control Conference (ECC 2026), Reykjavik, Iceland. This work was co-funded by the European Union under the project ROBOPROX (reg. no. CZ.02.01.01/00/22 008/0004590)
Optimal control of stochastic reaction networks with entropic control cost and emergence of mode-switching strategies
Controlling the stochastic dynamics of biological populations is a challenge that arises across various biological contexts. However, these dynamics are inherently nonlinear and involve a discrete state space, i.e., the number of molecules, cells, or organisms. Additionally, the possibility of extinction has a significant impact on both dynamics and control strategies, particularly when the population size is small. These factors hamper the direct application of conventional control theories to biological systems. To address these challenges, we formulate the optimal control problem for stochastic population dynamics by utilizing control cost functions based on the f-divergence, which naturally accounts for population-specific factors. If Kullback-Leibler (KL) divergence is adopted for the cost function, the complex nonlinear Hamilton-Jacobi-Bellman equation is simplified into a linear form, facilitating efficient computation of optimal solutions. We demonstrate the effectiveness of our approach by applying it to the control of interacting random walkers, Moran processes, and SIR models, and observe the mode-switching phenomena in the control strategies. Our approach provides new opportunities for applying control theory to a wide range of biological problems.
comment: 12 pages, 4 figures
Informativity Conditions for Multiple Signals: Properties, Experimental Design, and Applications
Recent studies highlight the importance of persistently exciting condition in single signal sequence for model identification and data-driven control methodologies. However, maintaining prolonged excitation in control signals introduces significant challenges, as continuous excitation can reduce the lifetime of mechanical devices. In this paper, we introduce three informativity conditions for various types of multi-signal data, each augmented by weight factors. We explore the interrelations between these conditions and their rank properties in linear time-invariant systems. Furthermore, we introduce open-loop experimental design methods tailored to each of the three conditions, which can synthesize the required excitation conditions either offline or online, even in the presence of limited information within each signal segment. We demonstrate the effectiveness of these informativity conditions in least-squares identification. Additionally, all three conditions can extend Willems' fundamental lemma and are utilized to assess the properties of the system. Illustrative examples confirm that these conditions yield satisfactory outcomes in both least-squares identification and the construction of data-driven controllers.
Accurate Small-Signal Modeling of Digitally Controlled Buck Converters with ADC-PWM Synchronization
Digital control has become increasingly widespread in modern power electronic converters. When acquiring feedback signals such as the inductor current, synchronizing the analog-to-digital converter (ADC) with the digital pulse-width modulator (DPWM) is commonly employed to accurately track their steady-state average. However, the small-signal implications of such synchronization have not been investigated. This paper presents an exact small-signal model for digitally controlled buck converters operating in forced continuous-conduction mode (FCCM) under constant-frequency current-mode control, explicitly accounting for DPWM-ADC synchronization. Using a sampled-data framework, the proposed model captures all sideband effects introduced by the sampling process, yielding precise predictions of both analog and digital loop gains, even at frequencies beyond the switching and sampling frequencies. Both asymmetrical and symmetrical carrier modulations are considered. Furthermore, the digital loop gain is derived in closed form using the modified z-transform, enabling low-complexity compensator design and stability assessment. Within this framework, the analog loop gain can be directly obtained from the digital loop gain, thereby eliminating the need for computationally intensive infinite series evaluations. The validity of the proposed model is confirmed through both simulation and experimental results.
Modeling the Impact of Communication and Human Uncertainties on Runway Capacity in Terminal Airspace
We investigate the potential impact of communication and human performance uncertainties on runway operations. Specifically, we consider these impacts within the context of an arrival scenario with two converging flows: a straight-in approach stream and a downwind stream merging into it. Both arrival stream are modeled using a modified Possion distribution that incorporate the separation minima as well as the runway occupancy time. Various system level uncertainties are addressed in this process, including communication link- and human-related uncertainties. In this research, we first build a Monte Carlo-based discrete-time simulation, where aircraft arrivals are generated by modified Poisson processes subject to minimum separation constraints, simulating various traffic operations. The merging logic incorporates standard bank angle continuous turn-to-final, pilot response delays, and dynamic gap availability in real time. Then, we investigate an automated final approach vectoring model (i.e., Auto-ATC), in which inverse optimal control is used to learn decision advisories from human expert records. By augmenting trajectories and incorporating the aforementioned uncertainties into the planning scenario, we create a setup analogous to the discrete event simulation. For both studies, runway capacity is measured by runway throughput, the fraction of downwind arrivals that merge immediately without holding, and the average delay (i.e., holding time/distance) experienced on the downwind leg. This research provides a method for runway capacity estimation in merging scenarios, and demonstrates that aeronautical communication link uncertainties significantly affect runway capacity in current voice-based operations, whereas the impact can be mitigated in autonomous operational settings.
Identification and Adaptive Control of Markov Jump Systems: Sample Complexity and Regret Bounds
Learning how to effectively control unknown dynamical systems is crucial for intelligent autonomous systems. This task becomes a significant challenge when the underlying dynamics are changing with time. Motivated by this challenge, this paper considers the problem of controlling an unknown Markov jump linear system (MJS) to optimize a quadratic objective. By taking a model-based perspective, we consider identification-based adaptive control of MJSs. We first provide a system identification algorithm for MJS to learn the dynamics in each mode as well as the Markov transition matrix, underlying the evolution of the mode switches, from a single trajectory of the system states, inputs, and modes. Through martingale-based arguments, sample complexity of this algorithm is shown to be $\mathcal{O}(1/\sqrt{T})$. We then propose an adaptive control scheme that performs system identification together with certainty equivalent control to adapt the controllers in an episodic fashion. Combining our sample complexity results with recent perturbation results for certainty equivalent control, we prove that when the episode lengths are appropriately chosen, the proposed adaptive control scheme achieves $\mathcal{O}(\sqrt{T})$ regret, which can be improved to $\mathcal{O}(polylog(T))$ with partial knowledge of the system. Our proof strategy introduces innovations to handle Markovian jumps and a weaker notion of stability common in MJSs. Our analysis provides insights into system theoretic quantities that affect learning accuracy and control performance. Numerical simulations are presented to further reinforce these insights.
comment: Improved results using Martingale-based arguments
Robotics
DINO-CVA: A Multimodal Goal-Conditioned Vision-to-Action Model for Autonomous Catheter Navigation
Cardiac catheterization remains a cornerstone of minimally invasive interventions, yet it continues to rely heavily on manual operation. Despite advances in robotic platforms, existing systems are predominantly follow-leader in nature, requiring continuous physician input and lacking intelligent autonomy. This dependency contributes to operator fatigue, more radiation exposure, and variability in procedural outcomes. This work moves towards autonomous catheter navigation by introducing DINO-CVA, a multimodal goal-conditioned behavior cloning framework. The proposed model fuses visual observations and joystick kinematics into a joint embedding space, enabling policies that are both vision-aware and kinematic-aware. Actions are predicted autoregressively from expert demonstrations, with goal conditioning guiding navigation toward specified destinations. A robotic experimental setup with a synthetic vascular phantom was designed to collect multimodal datasets and evaluate performance. Results show that DINO-CVA achieves high accuracy in predicting actions, matching the performance of a kinematics-only baseline while additionally grounding predictions in the anatomical environment. These findings establish the feasibility of multimodal, goal-conditioned architectures for catheter navigation, representing an important step toward reducing operator dependency and improving the reliability of catheterbased therapies.
Safe Payload Transfer with Ship-Mounted Cranes: A Robust Model Predictive Control Approach
Ensuring safe real-time control of ship-mounted cranes in unstructured transportation environments requires handling multiple safety constraints while maintaining effective payload transfer performance. Unlike traditional crane systems, ship-mounted cranes are consistently subjected to significant external disturbances affecting underactuated crane dynamics due to the ship's dynamic motion response to harsh sea conditions, which can lead to robustness issues. To tackle these challenges, we propose a robust and safe model predictive control (MPC) framework and demonstrate it on a 5-DOF crane system, where a Stewart platform simulates the external disturbances that ocean surface motions would have on the supporting ship. The crane payload transfer operation must avoid obstacles and accurately place the payload within a designated target area. We use a robust zero-order control barrier function (R-ZOCBF)-based safety constraint in the nonlinear MPC to ensure safe payload positioning, while time-varying bounding boxes are utilized for collision avoidance. We introduce a new optimization-based online robustness parameter adaptation scheme to reduce the conservativeness of R-ZOCBFs. Experimental trials on a crane prototype demonstrate the overall performance of our safe control approach under significant perturbing motions of the crane base. While our focus is on crane-facilitated transfer, the methods more generally apply to safe robotically-assisted parts mating and parts insertion.
Design of an Affordable, Fully-Actuated Biomimetic Hand for Dexterous Teleoperation Systems IROS2025
This paper addresses the scarcity of affordable, fully-actuated five-fingered hands for dexterous teleoperation, which is crucial for collecting large-scale real-robot data within the "Learning from Demonstrations" paradigm. We introduce the prototype version of the RAPID Hand, the first low-cost, 20-degree-of-actuation (DoA) dexterous hand that integrates a novel anthropomorphic actuation and transmission scheme with an optimized motor layout and structural design to enhance dexterity. Specifically, the RAPID Hand features a universal phalangeal transmission scheme for the non-thumb fingers and an omnidirectional thumb actuation mechanism. Prioritizing affordability, the hand employs 3D-printed parts combined with custom gears for easier replacement and repair. We assess the RAPID Hand's performance through quantitative metrics and qualitative testing in a dexterous teleoperation system, which is evaluated on three challenging tasks: multi-finger retrieval, ladle handling, and human-like piano playing. The results indicate that the RAPID Hand's fully actuated 20-DoF design holds significant promise for dexterous teleoperation.
comment: Accepted by IROS2025
C-Free-Uniform: A Map-Conditioned Trajectory Sampler for Model Predictive Path Integral Control ICRA
Trajectory sampling is a key component of sampling-based control mechanisms. Trajectory samplers rely on control input samplers, which generate control inputs u from a distribution p(u | x) where x is the current state. We introduce the notion of Free Configuration Space Uniformity (C-Free-Uniform for short) which has two key features: (i) it generates a control input distribution so as to uniformly sample the free configuration space, and (ii) in contrast to previously introduced trajectory sampling mechanisms where the distribution p(u | x) is independent of the environment, C-Free-Uniform is explicitly conditioned on the current local map. Next, we integrate this sampler into a new Model Predictive Path Integral (MPPI) Controller, CFU-MPPI. Experiments show that CFU-MPPI outperforms existing methods in terms of success rate in challenging navigation tasks in cluttered polygonal environments while requiring a much smaller sampling budget.
comment: Submitted to the 2026 IEEE International Conference on Robotics and Automation (ICRA). 8 pages, 4 figures
DiRAC - Distributed Robot Awareness and Consensus
DiRAC is a scalable, distributed framework designed to enable efficient task assignment and path planning in very large robotic swarms. It introduces a novel zone-partitioned architecture with dynamically elected leaders and a tick-synchronized consensus protocol that yields strong consistency and deterministic outcomes. For path planning, DiRAC uses a novel algorithm, a force-based decentralized planner for real-time collision resolution. Validated within ROS 2 middleware through preliminary simulation, DiRAC demonstrates architectural scalability and modular efficiency in simulated warehouse environments, laying the groundwork for real-world deployment in large-scale industrial and logistics domains.
An RGB-D Image Dataset for Lychee Detection and Maturity Classification for Robotic Harvesting
Lychee is a high-value subtropical fruit. The adoption of vision-based harvesting robots can significantly improve productivity while reduce reliance on labor. High-quality data are essential for developing such harvesting robots. However, there are currently no consistently and comprehensively annotated open-source lychee datasets featuring fruits in natural growing environments. To address this, we constructed a dataset to facilitate lychee detection and maturity classification. Color (RGB) images were acquired under diverse weather conditions, and at different times of the day, across multiple lychee varieties, such as Nuomici, Feizixiao, Heiye, and Huaizhi. The dataset encompasses three different ripeness stages and contains 11,414 images, consisting of 878 raw RGB images, 8,780 augmented RGB images, and 1,756 depth images. The images are annotated with 9,658 pairs of lables for lychee detection and maturity classification. To improve annotation consistency, three individuals independently labeled the data, and their results were then aggregated and verified by a fourth reviewer. Detailed statistical analyses were done to examine the dataset. Finally, we performed experiments using three representative deep learning models to evaluate the dataset. It is publicly available for academic
A Preliminary Exploration of the Differences and Conjunction of Traditional PNT and Brain-inspired PNT
Developing universal Positioning, Navigation, and Timing (PNT) is our enduring goal. Today's complex environments demand PNT that is more resilient, energy-efficient and cognitively capable. This paper asks how we can endow unmanned systems with brain-inspired spatial cognition navigation while exploiting the high precision of machine PNT to advance universal PNT. We provide a new perspective and roadmap for shifting PNT from "tool-oriented" to "cognition-driven". Contributions: (1) multi-level dissection of differences among traditional PNT, biological brain PNT and brain-inspired PNT; (2) a four-layer (observation-capability-decision-hardware) fusion framework that unites numerical precision and brain-inspired intelligence; (3) forward-looking recommendations for future development of brain-inspired PNT.
T3 Planner: A Self-Correcting LLM Framework for Robotic Motion Planning with Temporal Logic
Translating natural language instructions into executable motion plans is a fundamental challenge in robotics. Traditional approaches are typically constrained by their reliance on domain-specific expertise to customize planners, and often struggle with spatio-temporal couplings that usually lead to infeasible motions or discrepancies between task planning and motion execution. Despite the proficiency of Large Language Models (LLMs) in high-level semantic reasoning, hallucination could result in infeasible motion plans. In this paper, we introduce the T3 Planner, an LLM-enabled robotic motion planning framework that self-corrects it output with formal methods. The framework decomposes spatio-temporal task constraints via three cascaded modules, each of which stimulates an LLM to generate candidate trajectory sequences and examines their feasibility via a Signal Temporal Logic (STL) verifier until one that satisfies complex spatial, temporal, and logical constraints is found.Experiments across different scenarios show that T3 Planner significantly outperforms the baselines. The required reasoning can be distilled into a lightweight Qwen3-4B model that enables efficient deployment. All supplementary materials are accessible at https://github.com/leeejia/T3_Planner.
End-to-end Listen, Look, Speak and Act
Human interaction is inherently multimodal and full-duplex: we listen while watching, speak while acting, and fluidly adapt to turn-taking and interruptions. Realizing these capabilities is essential for building models simulating humans. We present ELLSA (End-to-end Listen, Look, Speak and Act), which, to our knowledge, is the first full-duplex, end-to-end model that simultaneously perceives and generates across vision, text, speech, and action within a single architecture, enabling interaction patterns previously out of reach, yielding more natural, human-like behaviors. At its core is a novel SA-MoE architecture (Self-Attention Mixture-of-Experts) that routes each modality to specialized experts and fuses them through a unified attention backbone. This provides a generalizable solution for joint multimodal perception and concurrent generation, leveraging strong pre-trained components while enabling efficient modality integration and mitigating modality interference. On speech-interaction and robot-manipulation benchmarks, ELLSA matches modality-specific baselines, while uniquely supporting advanced multimodal and full-duplex behaviors such as dialogue and action turn-taking, defective instruction rejection, speaking-while-acting, context-grounded visual question answering, and action barge-ins. We contend that ELLSA represents a step toward more natural and general interactive intelligence, contributing to the broader pursuit of artificial general intelligence. All data, code and model checkpoints will be released upon acceptance.
comment: 22 pages, 8 figures
Adaptive Invariant Extended Kalman Filter for Legged Robot State Estimation IROS
State estimation is crucial for legged robots as it directly affects control performance and locomotion stability. In this paper, we propose an Adaptive Invariant Extended Kalman Filter to improve proprioceptive state estimation for legged robots. The proposed method adaptively adjusts the noise level of the contact foot model based on online covariance estimation, leading to improved state estimation under varying contact conditions. It effectively handles small slips that traditional slip rejection fails to address, as overly sensitive slip rejection settings risk causing filter divergence. Our approach employs a contact detection algorithm instead of contact sensors, reducing the reliance on additional hardware. The proposed method is validated through real-world experiments on the quadruped robot LeoQuad, demonstrating enhanced state estimation performance in dynamic locomotion scenarios.
comment: 6 pages, accepted to IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2025
Towards Active Excitation-Based Dynamic Inertia Identification in Satellites
This paper presents a comprehensive analysis of how excitation design influences the identification of the inertia properties of rigid nano- and micro-satellites. We simulate nonlinear attitude dynamics with reaction-wheel coupling, actuator limits, and external disturbances, and excite the system using eight torque profiles of varying spectral richness. Two estimators are compared, a batch Least Squares method and an Extended Kalman Filter, across three satellite configurations and time-varying inertia scenarios. Results show that excitation frequency content and estimator assumptions jointly determine estimation accuracy and robustness, offering practical guidance for in-orbit adaptive inertia identification by outlining the conditions under which each method performs best. The code is provided as open-source .
First Responders' Perceptions of Semantic Information for Situational Awareness in Robot-Assisted Emergency Response
This study investigates First Responders' (FRs) attitudes toward the use of semantic information and Situational Awareness (SA) in robotic systems during emergency operations. A structured questionnaire was administered to 22 FRs across eight countries, capturing their demographic profiles, general attitudes toward robots, and experiences with semantics-enhanced SA. Results show that most FRs expressed positive attitudes toward robots, and rated the usefulness of semantic information for building SA at an average of 3.6 out of 5. Semantic information was also valued for its role in predicting unforeseen emergencies (mean 3.9). Participants reported requiring an average of 74.6\% accuracy to trust semantic outputs and 67.8\% for them to be considered useful, revealing a willingness to use imperfect but informative AI support tools. To the best of our knowledge, this study offers novel insights by being one of the first to directly survey FRs on semantic-based SA in a cross-national context. It reveals the types of semantic information most valued in the field, such as object identity, spatial relationships, and risk context-and connects these preferences to the respondents' roles, experience, and education levels. The findings also expose a critical gap between lab-based robotics capabilities and the realities of field deployment, highlighting the need for more meaningful collaboration between FRs and robotics researchers. These insights contribute to the development of more user-aligned and situationally aware robotic systems for emergency response.
CBF-RL: Safety Filtering Reinforcement Learning in Training with Control Barrier Functions
Reinforcement learning (RL), while powerful and expressive, can often prioritize performance at the expense of safety. Yet safety violations can lead to catastrophic outcomes in real-world deployments. Control Barrier Functions (CBFs) offer a principled method to enforce dynamic safety -- traditionally deployed online via safety filters. While the result is safe behavior, the fact that the RL policy does not have knowledge of the CBF can lead to conservative behaviors. This paper proposes CBF-RL, a framework for generating safe behaviors with RL by enforcing CBFs in training. CBF-RL has two key attributes: (1) minimally modifying a nominal RL policy to encode safety constraints via a CBF term, (2) and safety filtering of the policy rollouts in training. Theoretically, we prove that continuous-time safety filters can be deployed via closed-form expressions on discrete-time roll-outs. Practically, we demonstrate that CBF-RL internalizes the safety constraints in the learned policy -- both enforcing safer actions and biasing towards safer rewards -- enabling safe deployment without the need for an online safety filter. We validate our framework through ablation studies on navigation tasks and on the Unitree G1 humanoid robot, where CBF-RL enables safer exploration, faster convergence, and robust performance under uncertainty, enabling the humanoid robot to avoid obstacles and climb stairs safely in real-world settings without a runtime safety filter.
comment: 8 pages
Architecture Is All You Need: Diversity-Enabled Sweet Spots for Robust Humanoid Locomotion
Robust humanoid locomotion in unstructured environments requires architectures that balance fast low-level stabilization with slower perceptual decision-making. We show that a simple layered control architecture (LCA), a proprioceptive stabilizer running at high rate, coupled with a compact low-rate perceptual policy, enables substantially more robust performance than monolithic end-to-end designs, even when using minimal perception encoders. Through a two-stage training curriculum (blind stabilizer pretraining followed by perceptual fine-tuning), we demonstrate that layered policies consistently outperform one-stage alternatives in both simulation and hardware. On a Unitree G1 humanoid, our approach succeeds across stair and ledge tasks where one-stage perceptual policies fail. These results highlight that architectural separation of timescales, rather than network scale or complexity, is the key enabler for robust perception-conditioned locomotion.
comment: 8 pages
Online automatic code generation for robot swarms: LLMs and self-organizing hierarchy IROS 2025
Our recently introduced self-organizing nervous system (SoNS) provides robot swarms with 1) ease of behavior design and 2) global estimation of the swarm configuration and its collective environment, facilitating the implementation of online automatic code generation for robot swarms. In a demonstration with 6 real robots and simulation trials with >30 robots, we show that when a SoNS-enhanced robot swarm gets stuck, it can automatically solicit and run code generated by an external LLM on the fly, completing its mission with an 85% success rate.
comment: This abstract was accepted to and presented at the "Multi-Agent Cooperative Systems and Swarm Robotics in the Era of Generative AI" (MACRAI) workshop at the 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2025)
VolleyBots: A Testbed for Multi-Drone Volleyball Game Combining Motion Control and Strategic Play NeurIPS 2025
Robot sports, characterized by well-defined objectives, explicit rules, and dynamic interactions, present ideal scenarios for demonstrating embodied intelligence. In this paper, we present VolleyBots, a novel robot sports testbed where multiple drones cooperate and compete in the sport of volleyball under physical dynamics. VolleyBots integrates three features within a unified platform: competitive and cooperative gameplay, turn-based interaction structure, and agile 3D maneuvering. These intertwined features yield a complex problem combining motion control and strategic play, with no available expert demonstrations. We provide a comprehensive suite of tasks ranging from single-drone drills to multi-drone cooperative and competitive tasks, accompanied by baseline evaluations of representative reinforcement learning (RL), multi-agent reinforcement learning (MARL) and game-theoretic algorithms. Simulation results show that on-policy RL methods outperform off-policy methods in single-agent tasks, but both approaches struggle in complex tasks that combine motion control and strategic play. We additionally design a hierarchical policy which achieves 69.5% win rate against the strongest baseline in the 3 vs 3 task, demonstrating its potential for tackling the complex interplay between low-level control and high-level strategy. To highlight VolleyBots' sim-to-real potential, we further demonstrate the zero-shot deployment of a policy trained entirely in simulation on real-world drones.
comment: Accepted by NeurIPS 2025
Learn2Decompose: Learning Problem Decomposition for Efficient Sequential Multi-object Manipulation Planning
We present a Reactive Task and Motion Planning (TAMP) approach for efficient sequential multi-object manipulation in dynamic environments. Conventional TAMP solvers experience an exponential increase in planning time as the planning horizon and number of objects grow, limiting their applicability in real-world scenarios. To address this, we propose learning problem decomposition from demonstrations to accelerate TAMP solvers. Our approach consists of three key components: goal decomposition learning, temporal distance learning, and object reduction. Goal decomposition identifies the necessary sequences of states that the system must pass through before reaching the final goal, treating them as subgoal sequences. Temporal distance learning predicts the temporal distance between two states, enabling the system to identify the closest subgoal from a disturbed state. Object reduction minimizes the set of active objects considered during replanning, further improving efficiency. We evaluate our approach on three benchmarks, demonstrating its effectiveness in improving replanning efficiency for sequential multi-object manipulation tasks in dynamic environments.
LIPM-Guided Reinforcement Learning for Stable and Perceptive Locomotion in Bipedal Robots
Achieving stable and robust perceptive locomotion for bipedal robots in unstructured outdoor environments remains a critical challenge due to complex terrain geometry and susceptibility to external disturbances. In this work, we propose a novel reward design inspired by the Linear Inverted Pendulum Model (LIPM) to enable perceptive and stable locomotion in the wild. The LIPM provides theoretical guidance for dynamic balance by regulating the center of mass (CoM) height and the torso orientation. These are key factors for terrain-aware locomotion, as they help ensure a stable viewpoint for the robot's camera. Building on this insight, we design a reward function that promotes balance and dynamic stability while encouraging accurate CoM trajectory tracking. To adaptively trade off between velocity tracking and stability, we leverage the Reward Fusion Module (RFM) approach that prioritizes stability when needed. A double-critic architecture is adopted to separately evaluate stability and locomotion objectives, improving training efficiency and robustness. We validate our approach through extensive experiments on a bipedal robot in both simulation and real-world outdoor environments. The results demonstrate superior terrain adaptability, disturbance rejection, and consistent performance across a wide range of speeds and perceptual conditions.
Exploring the Limits of Vision-Language-Action Manipulations in Cross-task Generalization
The generalization capabilities of vision-language-action (VLA) models to unseen tasks are crucial to achieving general-purpose robotic manipulation in open-world settings. However, the cross-task generalization capabilities of existing VLA models remain significantly underexplored. To address this gap, we introduce AGNOSTOS, a novel simulation benchmark designed to rigorously evaluate cross-task zero-shot generalization in manipulation. AGNOSTOS comprises 23 unseen manipulation tasks for testing, distinct from common training task distributions, and incorporates two levels of generalization difficulty to assess robustness. Our systematic evaluation reveals that current VLA models, despite being trained on diverse datasets, struggle to generalize effectively to these unseen tasks. To overcome this limitation, we propose Cross-Task In-Context Manipulation (X-ICM), a method that conditions large language models (LLMs) on in-context demonstrations from seen tasks to predict action sequences for unseen tasks. Additionally, we introduce a dynamics-guided sample selection strategy that identifies relevant demonstrations by capturing cross-task dynamics. On AGNOSTOS, X-ICM significantly improves cross-task zero-shot generalization performance over leading VLAs. We believe AGNOSTOS and X-ICM will serve as valuable tools for advancing general-purpose robotic manipulation.
comment: Project Page: https://jiaming-zhou.github.io/AGNOSTOS
AGENTSAFE: Benchmarking the Safety of Embodied Agents on Hazardous Instructions
The integration of vision-language models (VLMs) is driving a new generation of embodied agents capable of operating in human-centered environments. However, as deployment expands, these systems face growing safety risks, particularly when executing hazardous instructions. Current safety evaluation benchmarks remain limited: they cover only narrow scopes of hazards and focus primarily on final outcomes, neglecting the agent's full perception-planning-execution process and thereby obscuring critical failure modes. Therefore, we present SAFE, a benchmark for systematically assessing the safety of embodied VLM agents on hazardous instructions. SAFE comprises three components: SAFE-THOR, an extensible adversarial simulation sandbox with a universal adapter that maps high-level VLM outputs to low-level embodied controls, supporting diverse agent workflow integration; SAFE-VERSE, a risk-aware task suite inspired by Asimov's Three Laws of Robotics, comprising 45 adversarial scenarios, 1,350 hazardous tasks, and 9,900 instructions that span risks to humans, environments, and agents; and SAFE-DIAGNOSE, a multi-level and fine-grained evaluation protocol measuring agent performance across perception, planning, and execution. Applying SAFE to nine state-of-the-art VLMs and two embodied agent workflows, we uncover systematic failures in translating hazard recognition into safe planning and execution. Our findings reveal fundamental limitations in current safety alignment and demonstrate the necessity of a comprehensive, multi-stage evaluation for developing safer embodied intelligence.
ETA-IK: Execution-Time-Aware Inverse Kinematics for Dual-Arm Systems
This paper presents ETA-IK, a novel Execution-Time-Aware Inverse Kinematics method tailored for dual-arm robotic systems. The primary goal is to optimize motion execution time by leveraging the redundancy of both arms, specifically in tasks where only the relative pose of the robots is constrained, such as dual-arm scanning of unknown objects. Unlike traditional inverse kinematics methods that use surrogate metrics such as joint configuration distance, our method incorporates direct motion execution time and implicit collisions into the optimization process, thereby finding target joints that allow subsequent trajectory generation to get more efficient and collision-free motion. A neural network based execution time approximator is employed to predict time-efficient joint configurations while accounting for potential collisions. Through experimental evaluation on a system composed of a UR5 and a KUKA iiwa robot, we demonstrate significant reductions in execution time. The proposed method outperforms conventional approaches, showing improved motion efficiency without sacrificing positioning accuracy. These results highlight the potential of ETA-IK to improve the performance of dual-arm systems in applications, where efficiency and safety are paramount.
MoRe-ERL: Learning Motion Residuals using Episodic Reinforcement Learning
We propose MoRe-ERL, a framework that combines Episodic Reinforcement Learning (ERL) and residual learning, which refines preplanned reference trajectories into safe, feasible, and efficient task-specific trajectories. This framework is general enough to incorporate into arbitrary ERL methods and motion generators seamlessly. MoRe-ERL identifies trajectory segments requiring modification while preserving critical task-related maneuvers. Then it generates smooth residual adjustments using B-Spline-based movement primitives to ensure adaptability to dynamic task contexts and smoothness in trajectory refinement. Experimental results demonstrate that residual learning significantly outperforms training from scratch using ERL methods, achieving superior sample efficiency and task performance. Hardware evaluations further validate the framework, showing that policies trained in simulation can be directly deployed in real-world systems, exhibiting a minimal sim-to-real gap.
Safe Multi-Agent Reinforcement Learning for Behavior-Based Cooperative Navigation
In this paper, we address the problem of behavior-based cooperative navigation of mobile robots using safe multi-agent reinforcement learning~(MARL). Our work is the first to focus on cooperative navigation without individual reference targets for the robots, using a single target for the formation's centroid. This eliminates the complexities involved in having several path planners to control a team of robots. To ensure safety, our MARL framework uses model predictive control (MPC) to prevent actions that could lead to collisions during training and execution. We demonstrate the effectiveness of our method in simulation and on real robots, achieving safe behavior-based cooperative navigation without using individual reference targets, with zero collisions, and faster target reaching compared to baselines. Finally, we study the impact of MPC safety filters on the learning process, revealing that we achieve faster convergence during training and we show that our approach can be safely deployed on real robots, even during early stages of the training.
Multiagent Systems
ReclAIm: A multi-agent framework for degradation-aware performance tuning of medical imaging AI
Ensuring the long-term reliability of AI models in clinical practice requires continuous performance monitoring and corrective actions when degradation occurs. Addressing this need, this manuscript presents ReclAIm, a multi-agent framework capable of autonomously monitoring, evaluating, and fine-tuning medical image classification models. The system, built on a large language model core, operates entirely through natural language interaction, eliminating the need for programming expertise. ReclAIm successfully trains, evaluates, and maintains consistent performance of models across MRI, CT, and X-ray datasets. Once ReclAIm detects significant performance degradation, it autonomously executes state-of-the-art fine-tuning procedures that substantially reduce the performance gap. In cases with performance drops of up to -41.1% (MRI InceptionV3), ReclAIm managed to readjust performance metrics within 1.5% of the initial model results. ReclAIm enables automated, continuous maintenance of medical imaging AI models in a user-friendly and adaptable manner that facilitates broader adoption in both research and clinical environments.
comment: 25 pages, 4 figures
Lark: Biologically Inspired Neuroevolution for Multi-Stakeholder LLM Agents NeurIPS 2025
We present Lark, a biologically inspired decision-making framework that couples LLM-driven reasoning with an evolutionary, stakeholder-aware Multi-Agent System (MAS). To address verbosity and stakeholder trade-offs, we integrate four mechanisms: (i) plasticity, which applies concise adjustments to candidate solutions; (ii) duplication and maturation, which copy high-performing candidates and specialize them into new modules; (iii) ranked-choice stakeholder aggregation using influence-weighted Borda scoring; and (iv) compute awareness via token-based penalties that reward brevity. The system iteratively proposes diverse strategies, applies plasticity tweaks, simulates stakeholder evaluations, aggregates preferences, selects top candidates, and performs duplication/maturation while factoring compute cost into final scores. In a controlled evaluation over 30 rounds comparing 14 systems, Lark Full achieves a mean rank of 2.55 (95% CI [2.17, 2.93]) and a mean composite score of 29.4/50 (95% CI [26.34, 32.46]), finishing Top-3 in 80% of rounds while remaining cost competitive with leading commercial models ($0.016 per task). Paired Wilcoxon tests confirm that all four mechanisms contribute significantly as ablating duplication/maturation yields the largest deficit ({\Delta}Score = 3.5, Cohen's d_z = 2.53, p < 0.001), followed by plasticity ({\Delta}Score = 3.4, d_z = 1.86), ranked-choice voting ({\Delta}Score = 2.4, d_z = 1.20), and token penalties ({\Delta}Score = 2.2, d_z = 1.63). Rather than a formal Markov Decision Process with constrained optimization, Lark is a practical, compute-aware neuroevolutionary loop that scales stakeholder-aligned strategy generation and makes trade-offs transparent through per-step metrics. Our work presents proof-of-concept findings and invites community feedback as we expand toward real-world validation studies.
comment: 39th Conference on Neural Information Processing Systems (NeurIPS 2025) Workshop: NeurIPS 2025 Workshop on Efficient Reasoning
DiRAC - Distributed Robot Awareness and Consensus
DiRAC is a scalable, distributed framework designed to enable efficient task assignment and path planning in very large robotic swarms. It introduces a novel zone-partitioned architecture with dynamically elected leaders and a tick-synchronized consensus protocol that yields strong consistency and deterministic outcomes. For path planning, DiRAC uses a novel algorithm, a force-based decentralized planner for real-time collision resolution. Validated within ROS 2 middleware through preliminary simulation, DiRAC demonstrates architectural scalability and modular efficiency in simulated warehouse environments, laying the groundwork for real-world deployment in large-scale industrial and logistics domains.
Surrogate Modeling and Explainable Artificial Intelligence for Complex Systems: A Workflow for Automated Simulation Exploration
Complex systems are increasingly explored through simulation-driven engineering workflows that combine physics-based and empirical models with optimization and analytics. Despite their power, these workflows face two central obstacles: (1) high computational cost, since accurate exploration requires many expensive simulator runs; and (2) limited transparency and reliability when decisions rely on opaque blackbox components. We propose a workflow that addresses both challenges by training lightweight emulators on compact designs of experiments that (i) provide fast, low-latency approximations of expensive simulators, (ii) enable rigorous uncertainty quantification, and (iii) are adapted for global and local Explainable Artificial Intelligence (XAI) analyses. This workflow unifies every simulation-based complex-system analysis tool, ranging from engineering design to agent-based models for socio-environmental understanding. In this paper, we proposea comparative methodology and practical recommendations for using surrogate-based explainability tools within the proposed workflow. The methodology supports continuous and categorical inputs, combines global-effect and uncertainty analyses with local attribution, and evaluates the consistency of explanations across surrogate models, thereby diagnosing surrogate adequacy and guiding further data collection or model refinement. We demonstrate the approach on two contrasting case studies: a multidisciplinary design analysis of a hybrid-electric aircraft and an agent-based model of urban segregation. Results show that the surrogate model and XAI coupling enables large-scale exploration in seconds, uncovers nonlinear interactions and emergent behaviors, identifies key design and policy levers, and signals regions where surrogates require more data or alternative architectures.
TACLA: An LLM-Based Multi-Agent Tool for Transactional Analysis Training in Education ICTAI 2025
Simulating nuanced human social dynamics with Large Language Models (LLMs) remains a significant challenge, particularly in achieving psychological depth and consistent persona behavior crucial for high-fidelity training tools. This paper introduces TACLA (Transactional Analysis Contextual LLM-based Agents), a novel Multi-Agent architecture designed to overcome these limitations. TACLA integrates core principles of Transactional Analysis (TA) by modeling agents as an orchestrated system of distinct Parent, Adult, and Child ego states, each with its own pattern memory. An Orchestrator Agent prioritizes ego state activation based on contextual triggers and an agent's life script, ensuring psychologically authentic responses. Validated in an educational scenario, TACLA demonstrates realistic ego state shifts in Student Agents, effectively modeling conflict de-escalation and escalation based on different teacher intervention strategies. Evaluation shows high conversational credibility and confirms TACLA's capacity to create dynamic, psychologically-grounded social simulations, advancing the development of effective AI tools for education and beyond.
comment: Accepted for publication in the proceedings of ICTAI 2025
A Vision for Access Control in LLM-based Agent Systems
The autonomy and contextual complexity of LLM-based agents render traditional access control (AC) mechanisms insufficient. Static, rule-based systems designed for predictable environments are fundamentally ill-equipped to manage the dynamic information flows inherent in agentic interactions. This position paper argues for a paradigm shift from binary access control to a more sophisticated model of information governance, positing that the core challenge is not merely about permission, but about governing the flow of information. We introduce Agent Access Control (AAC), a novel framework that reframes AC as a dynamic, context-aware process of information flow governance. AAC operates on two core modules: (1) multi-dimensional contextual evaluation, which assesses not just identity but also relationships, scenarios, and norms; and (2) adaptive response formulation, which moves beyond simple allow/deny decisions to shape information through redaction, summarization, and paraphrasing. This vision, powered by a dedicated AC reasoning engine, aims to bridge the gap between human-like nuanced judgment and scalable Al safety, proposing a new conceptual lens for future research in trustworthy agent design.
comment: 11 pages, 1 figure
Co-Alignment: Rethinking Alignment as Bidirectional Human-AI Cognitive Adaptation
Current AI alignment through RLHF follows a single directional paradigm that AI conforms to human preferences while treating human cognition as fixed. We propose a shift to co-alignment through Bidirectional Cognitive Alignment (BiCA), where humans and AI mutually adapt. BiCA uses learnable protocols, representation mapping, and KL-budget constraints for controlled co-evolution. In collaborative navigation, BiCA achieved 85.5% success versus 70.3% baseline, with 230% better mutual adaptation and 332% better protocol convergence. Emergent protocols outperformed handcrafted ones by 84%, while bidirectional adaptation unexpectedly improved safety (+23% out-of-distribution robustness). The 46% synergy improvement demonstrates optimal collaboration exists at the intersection, not union, of human and AI capabilities, validating the shift from single-directional to co-alignment paradigms.
Sequence Modeling for N-Agent Ad Hoc Teamwork
N-agent ad hoc teamwork (NAHT) is a newly introduced challenge in multi-agent reinforcement learning, where controlled subteams of varying sizes must dynamically collaborate with varying numbers and types of unknown teammates without pre-coordination. The existing learning algorithm (POAM) considers only independent learning for its flexibility in dealing with a changing number of agents. However, independent learning fails to fully capture the inter-agent dynamics essential for effective collaboration. Based on our observation that transformers deal effectively with sequences with varying lengths and have been shown to be highly effective for a variety of machine learning problems, this work introduces a centralized, transformer-based method for N-agent ad hoc teamwork. Our proposed approach incorporates historical observations and actions of all controlled agents, enabling optimal responses to diverse and unseen teammates in partially observable environments. Empirical evaluation on a StarCraft II task demonstrates that MAT-NAHT outperforms POAM, achieving superior sample efficiency and generalization, without auxiliary agent-modeling objectives.
comment: Presented at RLDM 2025
Smart Traffic Signals: Comparing MARL and Fixed-Time Strategies
Urban traffic congestion, particularly at intersections, significantly impacts travel time, fuel consumption, and emissions. Traditional fixed-time signal control systems often lack the adaptability to manage dynamic traffic patterns effectively. This study explores the application of multi-agent reinforcement learning (MARL) to optimize traffic signal coordination across multiple intersections within a simulated environment. Utilizing Pygame, a simulation was developed to model a network of interconnected intersections with randomly generated vehicle flows to reflect realistic traffic variability. A decentralized MARL controller was implemented, in which each traffic signal operates as an autonomous agent, making decisions based on local observations and information from neighboring agents. Performance was evaluated against a baseline fixed-time controller using metrics such as average vehicle wait time and overall throughput. The MARL approach demonstrated statistically significant improvements, including reduced average waiting times and improved throughput. These findings suggest that MARL-based dynamic control strategies hold substantial promise for improving urban traffic management efficiency. More research is recommended to address scalability and real-world implementation challenges.
Online automatic code generation for robot swarms: LLMs and self-organizing hierarchy IROS 2025
Our recently introduced self-organizing nervous system (SoNS) provides robot swarms with 1) ease of behavior design and 2) global estimation of the swarm configuration and its collective environment, facilitating the implementation of online automatic code generation for robot swarms. In a demonstration with 6 real robots and simulation trials with >30 robots, we show that when a SoNS-enhanced robot swarm gets stuck, it can automatically solicit and run code generated by an external LLM on the fly, completing its mission with an 85% success rate.
comment: This abstract was accepted to and presented at the "Multi-Agent Cooperative Systems and Swarm Robotics in the Era of Generative AI" (MACRAI) workshop at the 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2025)
Systems and Control (CS)
Safe Payload Transfer with Ship-Mounted Cranes: A Robust Model Predictive Control Approach
Ensuring safe real-time control of ship-mounted cranes in unstructured transportation environments requires handling multiple safety constraints while maintaining effective payload transfer performance. Unlike traditional crane systems, ship-mounted cranes are consistently subjected to significant external disturbances affecting underactuated crane dynamics due to the ship's dynamic motion response to harsh sea conditions, which can lead to robustness issues. To tackle these challenges, we propose a robust and safe model predictive control (MPC) framework and demonstrate it on a 5-DOF crane system, where a Stewart platform simulates the external disturbances that ocean surface motions would have on the supporting ship. The crane payload transfer operation must avoid obstacles and accurately place the payload within a designated target area. We use a robust zero-order control barrier function (R-ZOCBF)-based safety constraint in the nonlinear MPC to ensure safe payload positioning, while time-varying bounding boxes are utilized for collision avoidance. We introduce a new optimization-based online robustness parameter adaptation scheme to reduce the conservativeness of R-ZOCBFs. Experimental trials on a crane prototype demonstrate the overall performance of our safe control approach under significant perturbing motions of the crane base. While our focus is on crane-facilitated transfer, the methods more generally apply to safe robotically-assisted parts mating and parts insertion.
Ultra High Sensitivity Soil Moisture Detection Using Photonic Crystal Cavity with SIW Technology
Soil nutrients and water content are two crucial factors that significantly affect agricultural production yields. Hence, monitoring and measuring the water content and soil type are critical requirements. This study proposes a two-dimensional structure of photonic crystals centered around a symmetrical cross-shaped slot. The cross-slots act as resonators, and the photonic crystals surrounding the slots tune the resonance frequency of the resonators to enhance mode confinement within the resonator. The various resonant modes are located in the 2.1 GHz, 5.2 GHz, and 8.1 GHz bands, which correspond to the S band, C band, and X band, respectively. These bands are used to compare the absorption, whereas the upper resonant mode is of the order of 20 GHz. Band structure analysis was performed using the Plane Wave Method (PWM). The resonant frequency is computed using a 3D electromagnetic (EM) simulation software that utilizes the Finite Element Method (FEM) and lies in the radiation mode region of the band structure of the photonic crystal. Varying the incident angle had a negligible effect on the absorption characteristics of the sensor, allowing it to produce accurate sensing results regardless of the incident angle. The sensor's sensitivity is maximized using this design, which results in a sensitivity of 85.4 % in the 2.1 GHz resonant frequency, which is much higher than that of a single column of photonic crystal-based SIW, resulting in 50.6 % of sensitivity at 2.1 GHz, at which there is a frequency shift of the order of GHz. In contrast, in the proposed design, the frequency shift is on the order of MHz, resulting in ultra-high sensitivity.
Adaptive Invariant Extended Kalman Filter for Legged Robot State Estimation IROS
State estimation is crucial for legged robots as it directly affects control performance and locomotion stability. In this paper, we propose an Adaptive Invariant Extended Kalman Filter to improve proprioceptive state estimation for legged robots. The proposed method adaptively adjusts the noise level of the contact foot model based on online covariance estimation, leading to improved state estimation under varying contact conditions. It effectively handles small slips that traditional slip rejection fails to address, as overly sensitive slip rejection settings risk causing filter divergence. Our approach employs a contact detection algorithm instead of contact sensors, reducing the reliance on additional hardware. The proposed method is validated through real-world experiments on the quadruped robot LeoQuad, demonstrating enhanced state estimation performance in dynamic locomotion scenarios.
comment: 6 pages, accepted to IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2025
A Control-Theoretic Approach to Dynamic Payment Routing for Success Rate Optimization
This paper introduces a control-theoretic framework for dynamic payment routing, implemented within JUSPAY's Payment Orchestrator to maximize transaction success rate. The routing system is modeled as a closed-loop feedback controller continuously sensing gateway performance, computing corrective actions, and dynamically routes transactions across gateway to ensure operational resilience. The system leverages concepts from control theory, reinforcement learning, and multi-armed bandit optimization to achieve both short-term responsiveness and long-term stability. Rather than relying on explicit PID regulation, the framework applies generalized feedback-based adaptation, ensuring that corrective actions remain proportional to observed performance deviations and the computed gateway score gradually converges toward the success rate. This hybrid approach unifies control theory and adaptive decision systems, enabling self-regulating transaction routing that dampens instability, and improves reliability. Live production results show an improvement of up to 1.15% in success rate over traditional rule-based routing, demonstrating the effectiveness of feedback-based control in payment systems.
comment: 7 Pages, 8 Figures
Local integral input-to-state stability for non-autonomous infinite-dimensional systems
In this paper, we prove comparison principles for nonlinear differential equations with time-varying coefficients and develop Lyapunov analytical tools for the integral input-to-state stability (iISS) analysis of nonlinear non-autonomous infinite-dimensional systems, which involve nonlinearities satisfying a superlinear growth, {bringing} difficulties to the iISS {analysis.} Specifically, our approach starts by establishing several forms of comparison principles for a wide range of ordinary differential equations having time-varying coefficients and superlinear terms, paving the way to conduct iISS assessment for general nonlinear non-autonomous infinite-dimensional systems within the Lyapunov stability framework. Then, by using the comparison principles, we prove a local {iISS} {(LiISS)} Lyapunov theorem for the nonlinear non-autonomous infinite-dimensional systems in the framework of Banach spaces. {Furthermore,} we provide sufficient conditions of the existence of a local iISS Lyapunonv functional (LiISS-LF) and construct LiISS-LFs for the systems in the framework of Hilbert spaces. Finally, we preset two examples to illustrate the proposed {Lyapunov} method for the LiISS analysis: one is to show how to obtain the LiISS of a nonlinear finite-dimensional system with time-varying coefficients and superlinear terms under linear state feedback control law while another one is to show how to employ the interpolation inequalities to handle superliner terms and establish the LiISS-LF for a class of multi-dimensional parabolic equations with space-time-varying coefficients. To demonstrate the validity of the results, numerical experiments are also conducted to verify the LiISS of these two classes of systems.
Linear State Estimation in Presence of Bounded Uncertainties: A Comparative Analysis
A variety of algorithms have been proposed to address the power system state estimation problem in the presence of uncertainties in the data. However, less emphasis has been given to handling perturbations in the model. In the context of linear state estimation (LSE), which is the focus of this paper, perturbations in the model come from variations in the line parameters. Since the actual values of the line parameters can be different from the values stored in a power utility's database, we investigate three approaches in this paper to estimate the states in the presence of bounded uncertainties in the data and the model. The first approach is based on interval arithmetic, the second is based on convex optimization, and the third is based on generalized linear fractional programming. The three algorithms are applied to multiple IEEE test systems and compared in terms of their speed and accuracy. The results indicate that the first two algorithms are extremely fast and give expected results, while the third suffers from scalability issues and is unsuitable for LSE.
Geometric Control Theory Over Networks: Minimal Node Cardinality Disturbance Decoupling Problems
In this paper we show how to formulate and solve disturbance decoupling problems over networks while choosing a minimal number of input and output nodes. Feedback laws that isolate and eliminate the impact of disturbance nodes on specific target nodes to be protected are provided using state, output, and dynamical feedback. For that, we leverage the fact that when reformulated in terms of sets of nodes rather than subspaces, the controlled and conditional invariance properties admit a simple graphical interpretation. For state and dynamical feedback, the minimal input and output cardinality solutions can be computed exactly in polynomial time, via min-cut/max-flow algorithms.
CBF-RL: Safety Filtering Reinforcement Learning in Training with Control Barrier Functions
Reinforcement learning (RL), while powerful and expressive, can often prioritize performance at the expense of safety. Yet safety violations can lead to catastrophic outcomes in real-world deployments. Control Barrier Functions (CBFs) offer a principled method to enforce dynamic safety -- traditionally deployed online via safety filters. While the result is safe behavior, the fact that the RL policy does not have knowledge of the CBF can lead to conservative behaviors. This paper proposes CBF-RL, a framework for generating safe behaviors with RL by enforcing CBFs in training. CBF-RL has two key attributes: (1) minimally modifying a nominal RL policy to encode safety constraints via a CBF term, (2) and safety filtering of the policy rollouts in training. Theoretically, we prove that continuous-time safety filters can be deployed via closed-form expressions on discrete-time roll-outs. Practically, we demonstrate that CBF-RL internalizes the safety constraints in the learned policy -- both enforcing safer actions and biasing towards safer rewards -- enabling safe deployment without the need for an online safety filter. We validate our framework through ablation studies on navigation tasks and on the Unitree G1 humanoid robot, where CBF-RL enables safer exploration, faster convergence, and robust performance under uncertainty, enabling the humanoid robot to avoid obstacles and climb stairs safely in real-world settings without a runtime safety filter.
comment: 8 pages
Architecture Is All You Need: Diversity-Enabled Sweet Spots for Robust Humanoid Locomotion
Robust humanoid locomotion in unstructured environments requires architectures that balance fast low-level stabilization with slower perceptual decision-making. We show that a simple layered control architecture (LCA), a proprioceptive stabilizer running at high rate, coupled with a compact low-rate perceptual policy, enables substantially more robust performance than monolithic end-to-end designs, even when using minimal perception encoders. Through a two-stage training curriculum (blind stabilizer pretraining followed by perceptual fine-tuning), we demonstrate that layered policies consistently outperform one-stage alternatives in both simulation and hardware. On a Unitree G1 humanoid, our approach succeeds across stair and ledge tasks where one-stage perceptual policies fail. These results highlight that architectural separation of timescales, rather than network scale or complexity, is the key enabler for robust perception-conditioned locomotion.
comment: 8 pages
Subgradient Method for System Identification with Non-Smooth Objectives
This paper investigates a subgradient-based algorithm to solve the system identification problem for linear time-invariant systems with non-smooth objectives. This is essential for robust system identification in safety-critical applications. While existing work provides theoretical exact recovery guarantees using optimization solvers, the design of fast learning algorithms with convergence guarantees for practical use remains unexplored. We analyze the subgradient method in this setting, where the optimization problems to be solved evolve over time as new measurements are collected, and we establish linear convergence to the ground-truth system for both the best and Polyak step sizes after a burn-in period. We further characterize sublinear convergence of the iterates under constant and diminishing step sizes, which require only minimal information and thus offer broad applicability. Finally, we compare the time complexity of standard solvers with the subgradient algorithm and support our findings with experimental results. This is the first work to analyze subgradient algorithms for system identification with non-smooth objectives.
comment: 20 pages, 2 figures
Performance Analysis of Underwater Optical Wireless Communication Using O-RIS and Fiber Optic Backhaul (Extended version)
This Letter presents a novel hybrid underwater wireless optical communication (UWOC) system that integrates underwater optical access points (UOAPs) with a passive optical network (PON)-based fiber-optic backhaul to provide a resilient backbone. A hard switching mechanism is employed between direct and optical reconfigurable intelligent surface (O-RIS)-assisted links to ensure reliable connectivity. Unlike previous studies, the proposed system is evaluated under both active and multiple passive O-RIS configurations. To enhance reliability, the Selection Combining (SC) and Maximal Ratio Combining (MRC) schemes are applied. Analytical and simulation results demonstrate that optimal O-RIS placement significantly enhances system performance. However, in the linear regime, placing it too close to the receiver causes degradation due to increased path loss and beam jitter in an identical water type. Moreover, increasing the number of O-RIS elements within practical limits further improves overall system performance and enhances adaptability to variations in the underwater channel.
comment: This is version 3 (v3) of the manuscript with further improvements and refinements
Interacting Particle Systems for Fast Linear Quadratic RL
This paper is concerned with the design of algorithms based on systems of interacting particles to represent, approximate, and learn the optimal control law for reinforcement learning (RL). The primary contribution is that convergence rates are greatly accelerated by the interactions between particles. Theory focuses on the linear quadratic stochastic optimal control problem for which a complete and novel theory is presented. Apart from the new algorithm, sample complexity bounds are obtained, and it is shown that the mean square error scales as $1/N$ where $N$ is the number of particles. The theoretical results and algorithms are illustrated with numerical experiments and comparisons with other recent approaches, where the faster convergence of the proposed algorithm is numerically demonstrated.
A Volumetric Privacy Measure for Dynamical Systems With Bounded Disturbance
This paper presents a volumetric privacy framework for dynamical systems subject to bounded disturbances, developed without requiring prior knowledge of their probability distributions. We consider systems with both public and private states, where a set containing the public state is shared as the observation. An adversary is assumed to execute an inference attack by exploiting the observed public state set to estimate an uncertainty set for the private state. The volume of this inferred set quantifies the adversary's estimation uncertainty and serves as the proposed volumetric privacy metric. Approximate set-membership estimation techniques are developed to compute the private-state uncertainty set, and the properties of the privacy measure are analyzed, demonstrating that it is bounded by the information gain from the observation set. Furthermore, an optimization-based privacy filter design problem is formulated, employing randomization and linear programming to enhance the volumetric privacy level. The effectiveness of the proposed approach is validated through a production-inventory case study. Results show that the optimal privacy filter significantly improves robustness against inference attacks and outperforms two baseline mechanisms based on additive noise and quantization.
Nash equilibrium seeking in coalition games for multiple Euler-Lagrange systems: Analysis and application to USV swarm confrontation
This paper addresses a class of Nash equilibrium (NE) seeking problems in coalition games involving both local and coupling constraints for multiple Euler-Lagrange (EL) systems subject to disturbances of unknown bounds. Within each coalition, agents cooperatively minimize a shared cost function while competing against other coalitions. A distributed strategy is proposed to seek the NE under informational constraints, where each agent has access only to its own action, cost function, and constraint parameters. In the proposed distributed NE seeking strategy, adaptive techniques are combined with sign functions to handle model uncertainties and disturbances with unknown bounds in the EL systems. To deal with the Lagrange multipliers associated with local and coupling constraints, primal-dual techniques are integrated with consensus protocols. Additionally, a dynamic average consensus algorithm is employed to estimate the gradient of the coalition cost function, while a leader-following protocol is utilized to estimate the actions of other agents. Under standard convexity and graph-connectivity assumptions, global convergence of the closed-loop EL system to the NE is established. As an illustrative application, a swarm confrontation of unmanned surface vehicles involving formation, encirclement, and interception tasks is modeled within the coalition game framework, and numerical simulations are conducted under this model to validate the theoretical results.
Transfer Learning-Enabled Efficient Raman Pump Tuning under Dynamic Launch Power for C+L Band Transmission
We propose a transfer learning-enabled Transformer framework to simultaneously realize accurate modeling and Raman pump design in C+L-band systems. The RMSE for modeling and peak-to-peak GSNR variation/deviation is within 0.22 dB and 0.86/0.1 dB, respectively.
comment: There are some rather serious problems in this paper
FIRE: A Failure-Adaptive Reinforcement Learning Framework for Edge Computing Migrations
In edge computing, users' service profiles are migrated due to user mobility. Reinforcement learning (RL) frameworks have been proposed to do so, often trained on simulated data. However, existing RL frameworks overlook occasional server failures, which although rare, impact latency-sensitive applications like autonomous driving and real-time obstacle detection. Nevertheless, these failures (rare events), being not adequately represented in historical training data, pose a challenge for data-driven RL algorithms. As it is impractical to adjust failure frequency in real-world applications for training, we introduce FIRE, a framework that adapts to rare events by training a RL policy in an edge computing digital twin environment. We propose ImRE, an importance sampling-based Q-learning algorithm, which samples rare events proportionally to their impact on the value function. FIRE considers delay, migration, failure, and backup placement costs across individual and shared service profiles. We prove ImRE's boundedness and convergence to optimality. Next, we introduce novel deep Q-learning (ImDQL) and actor critic (ImACRE) versions of our algorithm to enhance scalability. We extend our framework to accommodate users with varying risk tolerances. Through trace driven experiments, we show that FIRE reduces costs compared to vanilla RL and the greedy baseline in the event of failures.
comment: Accepted at IEEE Transactions on Services Computing
Systems and Control (EESS)
Safe Payload Transfer with Ship-Mounted Cranes: A Robust Model Predictive Control Approach
Ensuring safe real-time control of ship-mounted cranes in unstructured transportation environments requires handling multiple safety constraints while maintaining effective payload transfer performance. Unlike traditional crane systems, ship-mounted cranes are consistently subjected to significant external disturbances affecting underactuated crane dynamics due to the ship's dynamic motion response to harsh sea conditions, which can lead to robustness issues. To tackle these challenges, we propose a robust and safe model predictive control (MPC) framework and demonstrate it on a 5-DOF crane system, where a Stewart platform simulates the external disturbances that ocean surface motions would have on the supporting ship. The crane payload transfer operation must avoid obstacles and accurately place the payload within a designated target area. We use a robust zero-order control barrier function (R-ZOCBF)-based safety constraint in the nonlinear MPC to ensure safe payload positioning, while time-varying bounding boxes are utilized for collision avoidance. We introduce a new optimization-based online robustness parameter adaptation scheme to reduce the conservativeness of R-ZOCBFs. Experimental trials on a crane prototype demonstrate the overall performance of our safe control approach under significant perturbing motions of the crane base. While our focus is on crane-facilitated transfer, the methods more generally apply to safe robotically-assisted parts mating and parts insertion.
Ultra High Sensitivity Soil Moisture Detection Using Photonic Crystal Cavity with SIW Technology
Soil nutrients and water content are two crucial factors that significantly affect agricultural production yields. Hence, monitoring and measuring the water content and soil type are critical requirements. This study proposes a two-dimensional structure of photonic crystals centered around a symmetrical cross-shaped slot. The cross-slots act as resonators, and the photonic crystals surrounding the slots tune the resonance frequency of the resonators to enhance mode confinement within the resonator. The various resonant modes are located in the 2.1 GHz, 5.2 GHz, and 8.1 GHz bands, which correspond to the S band, C band, and X band, respectively. These bands are used to compare the absorption, whereas the upper resonant mode is of the order of 20 GHz. Band structure analysis was performed using the Plane Wave Method (PWM). The resonant frequency is computed using a 3D electromagnetic (EM) simulation software that utilizes the Finite Element Method (FEM) and lies in the radiation mode region of the band structure of the photonic crystal. Varying the incident angle had a negligible effect on the absorption characteristics of the sensor, allowing it to produce accurate sensing results regardless of the incident angle. The sensor's sensitivity is maximized using this design, which results in a sensitivity of 85.4 % in the 2.1 GHz resonant frequency, which is much higher than that of a single column of photonic crystal-based SIW, resulting in 50.6 % of sensitivity at 2.1 GHz, at which there is a frequency shift of the order of GHz. In contrast, in the proposed design, the frequency shift is on the order of MHz, resulting in ultra-high sensitivity.
Adaptive Invariant Extended Kalman Filter for Legged Robot State Estimation IROS
State estimation is crucial for legged robots as it directly affects control performance and locomotion stability. In this paper, we propose an Adaptive Invariant Extended Kalman Filter to improve proprioceptive state estimation for legged robots. The proposed method adaptively adjusts the noise level of the contact foot model based on online covariance estimation, leading to improved state estimation under varying contact conditions. It effectively handles small slips that traditional slip rejection fails to address, as overly sensitive slip rejection settings risk causing filter divergence. Our approach employs a contact detection algorithm instead of contact sensors, reducing the reliance on additional hardware. The proposed method is validated through real-world experiments on the quadruped robot LeoQuad, demonstrating enhanced state estimation performance in dynamic locomotion scenarios.
comment: 6 pages, accepted to IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2025
A Control-Theoretic Approach to Dynamic Payment Routing for Success Rate Optimization
This paper introduces a control-theoretic framework for dynamic payment routing, implemented within JUSPAY's Payment Orchestrator to maximize transaction success rate. The routing system is modeled as a closed-loop feedback controller continuously sensing gateway performance, computing corrective actions, and dynamically routes transactions across gateway to ensure operational resilience. The system leverages concepts from control theory, reinforcement learning, and multi-armed bandit optimization to achieve both short-term responsiveness and long-term stability. Rather than relying on explicit PID regulation, the framework applies generalized feedback-based adaptation, ensuring that corrective actions remain proportional to observed performance deviations and the computed gateway score gradually converges toward the success rate. This hybrid approach unifies control theory and adaptive decision systems, enabling self-regulating transaction routing that dampens instability, and improves reliability. Live production results show an improvement of up to 1.15% in success rate over traditional rule-based routing, demonstrating the effectiveness of feedback-based control in payment systems.
comment: 7 Pages, 8 Figures
Local integral input-to-state stability for non-autonomous infinite-dimensional systems
In this paper, we prove comparison principles for nonlinear differential equations with time-varying coefficients and develop Lyapunov analytical tools for the integral input-to-state stability (iISS) analysis of nonlinear non-autonomous infinite-dimensional systems, which involve nonlinearities satisfying a superlinear growth, {bringing} difficulties to the iISS {analysis.} Specifically, our approach starts by establishing several forms of comparison principles for a wide range of ordinary differential equations having time-varying coefficients and superlinear terms, paving the way to conduct iISS assessment for general nonlinear non-autonomous infinite-dimensional systems within the Lyapunov stability framework. Then, by using the comparison principles, we prove a local {iISS} {(LiISS)} Lyapunov theorem for the nonlinear non-autonomous infinite-dimensional systems in the framework of Banach spaces. {Furthermore,} we provide sufficient conditions of the existence of a local iISS Lyapunonv functional (LiISS-LF) and construct LiISS-LFs for the systems in the framework of Hilbert spaces. Finally, we preset two examples to illustrate the proposed {Lyapunov} method for the LiISS analysis: one is to show how to obtain the LiISS of a nonlinear finite-dimensional system with time-varying coefficients and superlinear terms under linear state feedback control law while another one is to show how to employ the interpolation inequalities to handle superliner terms and establish the LiISS-LF for a class of multi-dimensional parabolic equations with space-time-varying coefficients. To demonstrate the validity of the results, numerical experiments are also conducted to verify the LiISS of these two classes of systems.
Linear State Estimation in Presence of Bounded Uncertainties: A Comparative Analysis
A variety of algorithms have been proposed to address the power system state estimation problem in the presence of uncertainties in the data. However, less emphasis has been given to handling perturbations in the model. In the context of linear state estimation (LSE), which is the focus of this paper, perturbations in the model come from variations in the line parameters. Since the actual values of the line parameters can be different from the values stored in a power utility's database, we investigate three approaches in this paper to estimate the states in the presence of bounded uncertainties in the data and the model. The first approach is based on interval arithmetic, the second is based on convex optimization, and the third is based on generalized linear fractional programming. The three algorithms are applied to multiple IEEE test systems and compared in terms of their speed and accuracy. The results indicate that the first two algorithms are extremely fast and give expected results, while the third suffers from scalability issues and is unsuitable for LSE.
Geometric Control Theory Over Networks: Minimal Node Cardinality Disturbance Decoupling Problems
In this paper we show how to formulate and solve disturbance decoupling problems over networks while choosing a minimal number of input and output nodes. Feedback laws that isolate and eliminate the impact of disturbance nodes on specific target nodes to be protected are provided using state, output, and dynamical feedback. For that, we leverage the fact that when reformulated in terms of sets of nodes rather than subspaces, the controlled and conditional invariance properties admit a simple graphical interpretation. For state and dynamical feedback, the minimal input and output cardinality solutions can be computed exactly in polynomial time, via min-cut/max-flow algorithms.
CBF-RL: Safety Filtering Reinforcement Learning in Training with Control Barrier Functions
Reinforcement learning (RL), while powerful and expressive, can often prioritize performance at the expense of safety. Yet safety violations can lead to catastrophic outcomes in real-world deployments. Control Barrier Functions (CBFs) offer a principled method to enforce dynamic safety -- traditionally deployed online via safety filters. While the result is safe behavior, the fact that the RL policy does not have knowledge of the CBF can lead to conservative behaviors. This paper proposes CBF-RL, a framework for generating safe behaviors with RL by enforcing CBFs in training. CBF-RL has two key attributes: (1) minimally modifying a nominal RL policy to encode safety constraints via a CBF term, (2) and safety filtering of the policy rollouts in training. Theoretically, we prove that continuous-time safety filters can be deployed via closed-form expressions on discrete-time roll-outs. Practically, we demonstrate that CBF-RL internalizes the safety constraints in the learned policy -- both enforcing safer actions and biasing towards safer rewards -- enabling safe deployment without the need for an online safety filter. We validate our framework through ablation studies on navigation tasks and on the Unitree G1 humanoid robot, where CBF-RL enables safer exploration, faster convergence, and robust performance under uncertainty, enabling the humanoid robot to avoid obstacles and climb stairs safely in real-world settings without a runtime safety filter.
comment: 8 pages
Architecture Is All You Need: Diversity-Enabled Sweet Spots for Robust Humanoid Locomotion
Robust humanoid locomotion in unstructured environments requires architectures that balance fast low-level stabilization with slower perceptual decision-making. We show that a simple layered control architecture (LCA), a proprioceptive stabilizer running at high rate, coupled with a compact low-rate perceptual policy, enables substantially more robust performance than monolithic end-to-end designs, even when using minimal perception encoders. Through a two-stage training curriculum (blind stabilizer pretraining followed by perceptual fine-tuning), we demonstrate that layered policies consistently outperform one-stage alternatives in both simulation and hardware. On a Unitree G1 humanoid, our approach succeeds across stair and ledge tasks where one-stage perceptual policies fail. These results highlight that architectural separation of timescales, rather than network scale or complexity, is the key enabler for robust perception-conditioned locomotion.
comment: 8 pages
Subgradient Method for System Identification with Non-Smooth Objectives
This paper investigates a subgradient-based algorithm to solve the system identification problem for linear time-invariant systems with non-smooth objectives. This is essential for robust system identification in safety-critical applications. While existing work provides theoretical exact recovery guarantees using optimization solvers, the design of fast learning algorithms with convergence guarantees for practical use remains unexplored. We analyze the subgradient method in this setting, where the optimization problems to be solved evolve over time as new measurements are collected, and we establish linear convergence to the ground-truth system for both the best and Polyak step sizes after a burn-in period. We further characterize sublinear convergence of the iterates under constant and diminishing step sizes, which require only minimal information and thus offer broad applicability. Finally, we compare the time complexity of standard solvers with the subgradient algorithm and support our findings with experimental results. This is the first work to analyze subgradient algorithms for system identification with non-smooth objectives.
comment: 20 pages, 2 figures
Performance Analysis of Underwater Optical Wireless Communication Using O-RIS and Fiber Optic Backhaul (Extended version)
This Letter presents a novel hybrid underwater wireless optical communication (UWOC) system that integrates underwater optical access points (UOAPs) with a passive optical network (PON)-based fiber-optic backhaul to provide a resilient backbone. A hard switching mechanism is employed between direct and optical reconfigurable intelligent surface (O-RIS)-assisted links to ensure reliable connectivity. Unlike previous studies, the proposed system is evaluated under both active and multiple passive O-RIS configurations. To enhance reliability, the Selection Combining (SC) and Maximal Ratio Combining (MRC) schemes are applied. Analytical and simulation results demonstrate that optimal O-RIS placement significantly enhances system performance. However, in the linear regime, placing it too close to the receiver causes degradation due to increased path loss and beam jitter in an identical water type. Moreover, increasing the number of O-RIS elements within practical limits further improves overall system performance and enhances adaptability to variations in the underwater channel.
comment: This is version 3 (v3) of the manuscript with further improvements and refinements
Interacting Particle Systems for Fast Linear Quadratic RL
This paper is concerned with the design of algorithms based on systems of interacting particles to represent, approximate, and learn the optimal control law for reinforcement learning (RL). The primary contribution is that convergence rates are greatly accelerated by the interactions between particles. Theory focuses on the linear quadratic stochastic optimal control problem for which a complete and novel theory is presented. Apart from the new algorithm, sample complexity bounds are obtained, and it is shown that the mean square error scales as $1/N$ where $N$ is the number of particles. The theoretical results and algorithms are illustrated with numerical experiments and comparisons with other recent approaches, where the faster convergence of the proposed algorithm is numerically demonstrated.
A Volumetric Privacy Measure for Dynamical Systems With Bounded Disturbance
This paper presents a volumetric privacy framework for dynamical systems subject to bounded disturbances, developed without requiring prior knowledge of their probability distributions. We consider systems with both public and private states, where a set containing the public state is shared as the observation. An adversary is assumed to execute an inference attack by exploiting the observed public state set to estimate an uncertainty set for the private state. The volume of this inferred set quantifies the adversary's estimation uncertainty and serves as the proposed volumetric privacy metric. Approximate set-membership estimation techniques are developed to compute the private-state uncertainty set, and the properties of the privacy measure are analyzed, demonstrating that it is bounded by the information gain from the observation set. Furthermore, an optimization-based privacy filter design problem is formulated, employing randomization and linear programming to enhance the volumetric privacy level. The effectiveness of the proposed approach is validated through a production-inventory case study. Results show that the optimal privacy filter significantly improves robustness against inference attacks and outperforms two baseline mechanisms based on additive noise and quantization.
Nash equilibrium seeking in coalition games for multiple Euler-Lagrange systems: Analysis and application to USV swarm confrontation
This paper addresses a class of Nash equilibrium (NE) seeking problems in coalition games involving both local and coupling constraints for multiple Euler-Lagrange (EL) systems subject to disturbances of unknown bounds. Within each coalition, agents cooperatively minimize a shared cost function while competing against other coalitions. A distributed strategy is proposed to seek the NE under informational constraints, where each agent has access only to its own action, cost function, and constraint parameters. In the proposed distributed NE seeking strategy, adaptive techniques are combined with sign functions to handle model uncertainties and disturbances with unknown bounds in the EL systems. To deal with the Lagrange multipliers associated with local and coupling constraints, primal-dual techniques are integrated with consensus protocols. Additionally, a dynamic average consensus algorithm is employed to estimate the gradient of the coalition cost function, while a leader-following protocol is utilized to estimate the actions of other agents. Under standard convexity and graph-connectivity assumptions, global convergence of the closed-loop EL system to the NE is established. As an illustrative application, a swarm confrontation of unmanned surface vehicles involving formation, encirclement, and interception tasks is modeled within the coalition game framework, and numerical simulations are conducted under this model to validate the theoretical results.
Transfer Learning-Enabled Efficient Raman Pump Tuning under Dynamic Launch Power for C+L Band Transmission
We propose a transfer learning-enabled Transformer framework to simultaneously realize accurate modeling and Raman pump design in C+L-band systems. The RMSE for modeling and peak-to-peak GSNR variation/deviation is within 0.22 dB and 0.86/0.1 dB, respectively.
comment: There are some rather serious problems in this paper
FIRE: A Failure-Adaptive Reinforcement Learning Framework for Edge Computing Migrations
In edge computing, users' service profiles are migrated due to user mobility. Reinforcement learning (RL) frameworks have been proposed to do so, often trained on simulated data. However, existing RL frameworks overlook occasional server failures, which although rare, impact latency-sensitive applications like autonomous driving and real-time obstacle detection. Nevertheless, these failures (rare events), being not adequately represented in historical training data, pose a challenge for data-driven RL algorithms. As it is impractical to adjust failure frequency in real-world applications for training, we introduce FIRE, a framework that adapts to rare events by training a RL policy in an edge computing digital twin environment. We propose ImRE, an importance sampling-based Q-learning algorithm, which samples rare events proportionally to their impact on the value function. FIRE considers delay, migration, failure, and backup placement costs across individual and shared service profiles. We prove ImRE's boundedness and convergence to optimality. Next, we introduce novel deep Q-learning (ImDQL) and actor critic (ImACRE) versions of our algorithm to enhance scalability. We extend our framework to accommodate users with varying risk tolerances. Through trace driven experiments, we show that FIRE reduces costs compared to vanilla RL and the greedy baseline in the event of failures.
comment: Accepted at IEEE Transactions on Services Computing
Robotics
Structured Interfaces for Automated Reasoning with 3D Scene Graphs
In order to provide a robot with the ability to understand and react to a user's natural language inputs, the natural language must be connected to the robot's underlying representations of the world. Recently, large language models (LLMs) and 3D scene graphs (3DSGs) have become a popular choice for grounding natural language and representing the world. In this work, we address the challenge of using LLMs with 3DSGs to ground natural language. Existing methods encode the scene graph as serialized text within the LLM's context window, but this encoding does not scale to large or rich 3DSGs. Instead, we propose to use a form of Retrieval Augmented Generation to select a subset of the 3DSG relevant to the task. We encode a 3DSG in a graph database and provide a query language interface (Cypher) as a tool to the LLM with which it can retrieve relevant data for language grounding. We evaluate our approach on instruction following and scene question-answering tasks and compare against baseline context window and code generation methods. Our results show that using Cypher as an interface to 3D scene graphs scales significantly better to large, rich graphs on both local and cloud-based models. This leads to large performance improvements in grounded language tasks while also substantially reducing the token count of the scene graph content. A video supplement is available at https://www.youtube.com/watch?v=zY_YI9giZSA.
comment: 25 pages, 3 figures
Self-Supervised Learning to Fly using Efficient Semantic Segmentation and Metric Depth Estimation for Low-Cost Autonomous UAVs
This paper presents a vision-only autonomous flight system for small UAVs operating in controlled indoor environments. The system combines semantic segmentation with monocular depth estimation to enable obstacle avoidance, scene exploration, and autonomous safe landing operations without requiring GPS or expensive sensors such as LiDAR. A key innovation is an adaptive scale factor algorithm that converts non-metric monocular depth predictions into accurate metric distance measurements by leveraging semantic ground plane detection and camera intrinsic parameters, achieving a mean distance error of 14.4 cm. The approach uses a knowledge distillation framework where a color-based Support Vector Machine (SVM) teacher generates training data for a lightweight U-Net student network (1.6M parameters) capable of real-time semantic segmentation. For more complex environments, the SVM teacher can be replaced with a state-of-the-art segmentation model. Testing was conducted in a controlled 5x4 meter laboratory environment with eight cardboard obstacles simulating urban structures. Extensive validation across 30 flight tests in a real-world environment and 100 flight tests in a digital-twin environment demonstrates that the combined segmentation and depth approach increases the distance traveled during surveillance and reduces mission time while maintaining 100% success rates. The system is further optimized through end-to-end learning, where a compact student neural network learns complete flight policies from demonstration data generated by our best-performing method, achieving an 87.5% autonomous mission success rate. This work advances practical vision-based drone navigation in structured environments, demonstrating solutions for metric depth estimation and computational efficiency challenges that enable deployment on resource-constrained platforms.
MoS-VLA: A Vision-Language-Action Model with One-Shot Skill Adaptation
Vision-Language-Action (VLA) models trained on large robot datasets promise general-purpose, robust control across diverse domains and embodiments. However, existing approaches often fail out-of-the-box when deployed in novel environments, embodiments, or tasks. We introduce Mixture of Skills VLA (MoS-VLA), a framework that represents robot manipulation policies as linear combinations of a finite set of learned basis functions. During pretraining, MoS-VLA jointly learns these basis functions across datasets from the Open X-Embodiment project, producing a structured skill space. At test time, adapting to a new task requires only a single expert demonstration. The corresponding skill representation is then inferred via a lightweight convex optimization problem that minimizes the L1 action error, without requiring gradient updates. This gradient-free adaptation incurs minimal overhead while enabling rapid instantiation of new skills. Empirically, MoS-VLA achieves lower action-prediction error on five out of five unseen datasets and succeeds in both simulation and real-robot tasks where a pretrained VLA model fails outright. Project page: mos-vla.github.io/
Semi-Peaucellier Linkage and Differential Mechanism for Linear Pinching and Self-Adaptive Grasping
This paper presents the SP-Diff parallel gripper system, addressing the limited adaptability of conventional end-effectors in intelligent industrial automation. The proposed design employs an innovative differential linkage mechanism with a modular symmetric dual-finger configuration to achieve linear-parallel grasping. By integrating a planetary gear transmission, the system enables synchronized linear motion and independent finger pose adjustment while maintaining structural rigidity, reducing Z-axis recalibration requirements by 30% compared to arc-trajectory grippers. The compact palm architecture incorporates a kinematically optimized parallelogram linkage and Differential mechanism, demonstrating adaptive grasping capabilities for diverse industrial workpieces and deformable objects such as citrus fruits. Future-ready interfaces are embedded for potential force/vision sensor integration to facilitate multimodal data acquisition (e.g., trajectory planning and object deformation) in digital twin frameworks. Designed as a flexible manufacturing solution, SP-Diff advances robotic end-effector intelligence through its adaptive architecture, showing promising applications in collaborative robotics, logistics automation, and specialized operational scenarios.
comment: 6 pages, 9 figures, Accepted author manuscript for IEEE CASE 2025
DIV-Nav: Open-Vocabulary Spatial Relationships for Multi-Object Navigation
Advances in open-vocabulary semantic mapping and object navigation have enabled robots to perform an informed search of their environment for an arbitrary object. However, such zero-shot object navigation is typically designed for simple queries with an object name like "television" or "blue rug". Here, we consider more complex free-text queries with spatial relationships, such as "find the remote on the table" while still leveraging robustness of a semantic map. We present DIV-Nav, a real-time navigation system that efficiently addresses this problem through a series of relaxations: i) Decomposing natural language instructions with complex spatial constraints into simpler object-level queries on a semantic map, ii) computing the Intersection of individual semantic belief maps to identify regions where all objects co-exist, and iii) Validating the discovered objects against the original, complex spatial constrains via a LVLM. We further investigate how to adapt the frontier exploration objectives of online semantic mapping to such spatial search queries to more effectively guide the search process. We validate our system through extensive experiments on the MultiON benchmark and real-world deployment on a Boston Dynamics Spot robot using a Jetson Orin AGX. More details and videos are available at https://anonsub42.github.io/reponame/
A Novel Gripper with Semi-Peaucellier Linkage and Idle-Stroke Mechanism for Linear Pinching and Self-Adaptive Grasping IROS 2025
This paper introduces a novel robotic gripper, named as the SPD gripper. It features a palm and two mechanically identical and symmetrically arranged fingers, which can be driven independently or by a single motor. The fingertips of the fingers follow a linear motion trajectory, facilitating the grasping of objects of various sizes on a tabletop without the need to adjust the overall height of the gripper. Traditional industrial grippers with parallel gripping capabilities often exhibit an arcuate motion at the fingertips, requiring the entire robotic arm to adjust its height to avoid collisions with the tabletop. The SPD gripper, with its linear parallel gripping mechanism, effectively addresses this issue. Furthermore, the SPD gripper possesses adaptive capabilities, accommodating objects of different shapes and sizes. This paper presents the design philosophy, fundamental composition principles, and optimization analysis theory of the SPD gripper. Based on the design theory, a robotic gripper prototype was developed and tested. The experimental results demonstrate that the robotic gripper successfully achieves linear parallel gripping functionality and exhibits good adaptability. In the context of the ongoing development of embodied intelligence technologies, this robotic gripper can assist various robots in achieving effective grasping, laying a solid foundation for collecting data to enhance deep learning training.
comment: Accepted author manuscript (AAM) for IEEE/RSJ IROS 2025. 6 pages, 10 figures
Advancing Off-Road Autonomous Driving: The Large-Scale ORAD-3D Dataset and Comprehensive Benchmarks
A major bottleneck in off-road autonomous driving research lies in the scarcity of large-scale, high-quality datasets and benchmarks. To bridge this gap, we present ORAD-3D, which, to the best of our knowledge, is the largest dataset specifically curated for off-road autonomous driving. ORAD-3D covers a wide spectrum of terrains, including woodlands, farmlands, grasslands, riversides, gravel roads, cement roads, and rural areas, while capturing diverse environmental variations across weather conditions (sunny, rainy, foggy, and snowy) and illumination levels (bright daylight, daytime, twilight, and nighttime). Building upon this dataset, we establish a comprehensive suite of benchmark evaluations spanning five fundamental tasks: 2D free-space detection, 3D occupancy prediction, rough GPS-guided path planning, vision-language model-driven autonomous driving, and world model for off-road environments. Together, the dataset and benchmarks provide a unified and robust resource for advancing perception and planning in challenging off-road scenarios. The dataset and code will be made publicly available at https://github.com/chaytonmin/ORAD-3D.
comment: Off-road robotics
NavQ: Learning a Q-Model for Foresighted Vision-and-Language Navigation ICCV 2025
In this work we concentrate on the task of goal-oriented Vision-and-Language Navigation (VLN). Existing methods often make decisions based on historical information, overlooking the future implications and long-term outcomes of the actions. In contrast, we aim to develop a foresighted agent. Specifically, we draw upon Q-learning to train a Q-model using large-scale unlabeled trajectory data, in order to learn the general knowledge regarding the layout and object relations within indoor scenes. This model can generate a Q-feature, analogous to the Q-value in traditional Q-network, for each candidate action, which describes the potential future information that may be observed after taking the specific action. Subsequently, a cross-modal future encoder integrates the task-agnostic Q-feature with navigation instructions to produce a set of action scores reflecting future prospects. These scores, when combined with the original scores based on history, facilitate an A*-style searching strategy to effectively explore the regions that are more likely to lead to the destination. Extensive experiments conducted on widely used goal-oriented VLN datasets validate the effectiveness of the proposed method.
comment: ICCV 2025
RefAtomNet++: Advancing Referring Atomic Video Action Recognition using Semantic Retrieval based Multi-Trajectory Mamba ECCV 2024
Referring Atomic Video Action Recognition (RAVAR) aims to recognize fine-grained, atomic-level actions of a specific person of interest conditioned on natural language descriptions. Distinct from conventional action recognition and detection tasks, RAVAR emphasizes precise language-guided action understanding, which is particularly critical for interactive human action analysis in complex multi-person scenarios. In this work, we extend our previously introduced RefAVA dataset to RefAVA++, which comprises >2.9 million frames and >75.1k annotated persons in total. We benchmark this dataset using baselines from multiple related domains, including atomic action localization, video question answering, and text-video retrieval, as well as our earlier model, RefAtomNet. Although RefAtomNet surpasses other baselines by incorporating agent attention to highlight salient features, its ability to align and retrieve cross-modal information remains limited, leading to suboptimal performance in localizing the target person and predicting fine-grained actions. To overcome the aforementioned limitations, we introduce RefAtomNet++, a novel framework that advances cross-modal token aggregation through a multi-hierarchical semantic-aligned cross-attention mechanism combined with multi-trajectory Mamba modeling at the partial-keyword, scene-attribute, and holistic-sentence levels. In particular, scanning trajectories are constructed by dynamically selecting the nearest visual spatial tokens at each timestep for both partial-keyword and scene-attribute levels. Moreover, we design a multi-hierarchical semantic-aligned cross-attention strategy, enabling more effective aggregation of spatial and temporal tokens across different semantic hierarchies. Experiments show that RefAtomNet++ establishes new state-of-the-art results. The dataset and code are released at https://github.com/KPeng9510/refAVA2.
comment: Extended version of ECCV 2024 paper arXiv:2407.01872. The dataset and code are released at https://github.com/KPeng9510/refAVA2
What Questions Should Robots Be Able to Answer? A Dataset of User Questions for Explainable Robotics
With the growing use of large language models and conversational interfaces in human-robot interaction, robots' ability to answer user questions is more important than ever. We therefore introduce a dataset of 1,893 user questions for household robots, collected from 100 participants and organized into 12 categories and 70 subcategories. Most work in explainable robotics focuses on why-questions. In contrast, our dataset provides a wide variety of questions, from questions about simple execution details to questions about how the robot would act in hypothetical scenarios -- thus giving roboticists valuable insights into what questions their robot needs to be able to answer. To collect the dataset, we created 15 video stimuli and 7 text stimuli, depicting robots performing varied household tasks. We then asked participants on Prolific what questions they would want to ask the robot in each portrayed situation. In the final dataset, the most frequent categories are questions about task execution details (22.5%), the robot's capabilities (12.7%), and performance assessments (11.3%). Although questions about how robots would handle potentially difficult scenarios and ensure correct behavior are less frequent, users rank them as the most important for robots to be able to answer. Moreover, we find that users who identify as novices in robotics ask different questions than more experienced users. Novices are more likely to inquire about simple facts, such as what the robot did or the current state of the environment. As robots enter environments shared with humans and language becomes central to giving instructions and interaction, this dataset provides a valuable foundation for (i) identifying the information robots need to log and expose to conversational interfaces, (ii) benchmarking question-answering modules, and (iii) designing explanation strategies that align with user expectations.
Learning to Optimize Edge Robotics: A Fast Integrated Perception-Motion-Communication Approach
Edge robotics involves frequent exchanges of large-volume multi-modal data. Existing methods ignore the interdependency between robotic functionalities and communication conditions, leading to excessive communication overhead. This paper revolutionizes edge robotics systems through integrated perception, motion, and communication (IPMC). As such, robots can dynamically adapt their communication strategies (i.e., compression ratio, transmission frequency, transmit power) by leveraging the knowledge of robotic perception and motion dynamics, thus reducing the need for excessive sensor data uploads. Furthermore, by leveraging the learning to optimize (LTO) paradigm, an imitation learning neural network is designed and implemented, which reduces the computational complexity by over 10x compared to state-of-the art optimization solvers. Experiments demonstrate the superiority of the proposed IPMC and the real-time execution capability of LTO.
Conformal Prediction in The Loop: A Feedback-Based Uncertainty Model for Trajectory Optimization NeurIPS 2025
Conformal Prediction (CP) is a powerful statistical machine learning tool to construct uncertainty sets with coverage guarantees, which has fueled its extensive adoption in generating prediction regions for decision-making tasks, e.g., Trajectory Optimization (TO) in uncertain environments. However, existing methods predominantly employ a sequential scheme, where decisions rely unidirectionally on the prediction regions, and consequently the information from decision-making fails to be fed back to instruct CP. In this paper, we propose a novel Feedback-Based CP (Fb-CP) framework for shrinking-horizon TO with a joint risk constraint over the entire mission time. Specifically, a CP-based posterior risk calculation method is developed by fully leveraging the realized trajectories to adjust the posterior allowable risk, which is then allocated to future times to update prediction regions. In this way, the information in the realized trajectories is continuously fed back to the CP, enabling attractive feedback-based adjustments of the prediction regions and a provable online improvement in trajectory performance. Furthermore, we theoretically prove that such adjustments consistently maintain the coverage guarantees of the prediction regions, thereby ensuring provable safety. Additionally, we develop a decision-focused iterative risk allocation algorithm with theoretical convergence analysis for allocating the posterior allowable risk which closely aligns with Fb-CP. Furthermore, we extend the proposed method to handle distribution shift. The effectiveness and superiority of the proposed method are demonstrated through benchmark experiments.
comment: Accepted by NeurIPS 2025 Main Track
Manual2Skill++: Connector-Aware General Robotic Assembly from Instruction Manuals via Vision-Language Models
Assembly hinges on reliably forming connections between parts; yet most robotic approaches plan assembly sequences and part poses while treating connectors as an afterthought. Connections represent the critical "last mile" of assembly execution, while task planning may sequence operations and motion plan may position parts, the precise establishment of physical connections ultimately determines assembly success or failure. In this paper, we consider connections as first-class primitives in assembly representation, including connector types, specifications, quantities, and placement locations. Drawing inspiration from how humans learn assembly tasks through step-by-step instruction manuals, we present Manual2Skill++, a vision-language framework that automatically extracts structured connection information from assembly manuals. We encode assembly tasks as hierarchical graphs where nodes represent parts and sub-assemblies, and edges explicitly model connection relationships between components. A large-scale vision-language model parses symbolic diagrams and annotations in manuals to instantiate these graphs, leveraging the rich connection knowledge embedded in human-designed instructions. We curate a dataset containing over 20 assembly tasks with diverse connector types to validate our representation extraction approach, and evaluate the complete task understanding-to-execution pipeline across four complex assembly scenarios in simulation, spanning furniture, toys, and manufacturing components with real-world correspondence.
SPOT: Sensing-augmented Trajectory Planning via Obstacle Threat Modeling
UAVs equipped with a single depth camera encounter significant challenges in dynamic obstacle avoidance due to limited field of view and inevitable blind spots. While active vision strategies that steer onboard cameras have been proposed to expand sensing coverage, most existing methods separate motion planning from sensing considerations, resulting in less effective and delayed obstacle response. To address this limitation, we introduce SPOT (Sensing-augmented Planning via Obstacle Threat modeling), a unified planning framework for observation-aware trajectory planning that explicitly incorporates sensing objectives into motion optimization. At the core of our method is a Gaussian Process-based obstacle belief map, which establishes a unified probabilistic representation of both recognized (previously observed) and potential obstacles. This belief is further processed through a collision-aware inference mechanism that transforms spatial uncertainty and trajectory proximity into a time-varying observation urgency map. By integrating urgency values within the current field of view, we define differentiable objectives that enable real-time, observation-aware trajectory planning with computation times under 10 ms. Simulation and real-world experiments in dynamic, cluttered, and occluded environments show that our method detects potential dynamic obstacles 2.8 seconds earlier than baseline approaches, increasing dynamic obstacle visibility by over 500\%, and enabling safe navigation through cluttered, occluded environments.
Do What You Say: Steering Vision-Language-Action Models via Runtime Reasoning-Action Alignment Verification
Reasoning Vision Language Action (VLA) models improve robotic instruction-following by generating step-by-step textual plans before low-level actions, an approach inspired by Chain-of-Thought (CoT) reasoning in language models. Yet even with a correct textual plan, the generated actions can still miss the intended outcomes in the plan, especially in out-of-distribution (OOD) scenarios. We formalize this phenomenon as a lack of embodied CoT faithfulness, and introduce a training-free, runtime policy steering method for reasoning-action alignment. Given a reasoning VLA's intermediate textual plan, our framework samples multiple candidate action sequences from the same model, predicts their outcomes via simulation, and uses a pre-trained Vision-Language Model (VLM) to select the sequence whose outcome best aligns with the VLA's own textual plan. Only executing action sequences that align with the textual reasoning turns our base VLA's natural action diversity from a source of error into a strength, boosting robustness to semantic and visual OOD perturbations and enabling novel behavior composition without costly re-training. We also contribute a reasoning-annotated extension of LIBERO-100, environment variations tailored for OOD evaluation, and demonstrate up to 15% performance gain over prior work on behavior composition tasks and scales with compute and data diversity. Project Website at: https://yilin-wu98.github.io/steering-reasoning-vla/
VT-Refine: Learning Bimanual Assembly with Visuo-Tactile Feedback via Simulation Fine-Tuning
Humans excel at bimanual assembly tasks by adapting to rich tactile feedback -- a capability that remains difficult to replicate in robots through behavioral cloning alone, due to the suboptimality and limited diversity of human demonstrations. In this work, we present VT-Refine, a visuo-tactile policy learning framework that combines real-world demonstrations, high-fidelity tactile simulation, and reinforcement learning to tackle precise, contact-rich bimanual assembly. We begin by training a diffusion policy on a small set of demonstrations using synchronized visual and tactile inputs. This policy is then transferred to a simulated digital twin equipped with simulated tactile sensors and further refined via large-scale reinforcement learning to enhance robustness and generalization. To enable accurate sim-to-real transfer, we leverage high-resolution piezoresistive tactile sensors that provide normal force signals and can be realistically modeled in parallel using GPU-accelerated simulation. Experimental results show that VT-Refine improves assembly performance in both simulation and the real world by increasing data diversity and enabling more effective policy fine-tuning. Our project page is available at https://binghao-huang.github.io/vt_refine/.
comment: Accepted by 9th Conference on Robot Learning (CoRL 2025); Website: https://binghao-huang.github.io/vt_refine/
Context-Based Meta Reinforcement Learning for Robust and Adaptable Peg-in-Hole Assembly Tasks
Autonomous assembly is an essential capability for industrial and service robots, with Peg-in-Hole (PiH) insertion being one of the core tasks. However, PiH assembly in unknown environments is still challenging due to uncertainty in task parameters, such as the hole position and orientation, resulting from sensor noise. Although context-based meta reinforcement learning (RL) methods have been previously presented to adapt to unknown task parameters in PiH assembly tasks, the performance depends on a sample-inefficient procedure or human demonstrations. Thus, to enhance the applicability of meta RL in real-world PiH assembly tasks, we propose to train the agent to use information from the robot's forward kinematics and an uncalibrated camera. Furthermore, we improve the performance by efficiently adapting the meta-trained agent to use data from force/torque sensor. Finally, we propose an adaptation procedure for out-of-distribution tasks whose parameters are different from the training tasks. Experiments on simulated and real robots prove that our modifications enhance the sample efficiency during meta training, real-world adaptation performance, and generalization of the context-based meta RL agent in PiH assembly tasks compared to previous approaches.
Multi-Layered Reasoning from a Single Viewpoint for Learning See-Through Grasping
Sensory substitution enables biological systems to perceive stimuli typically obtained by another organ, which is inspirational for physical agents. Multi-modal perception of intrinsic and extrinsic interactions is critical in building an intelligent robot that learns. This study presents a Vision-based See-Through Perception (VBSeeThruP) architecture that simultaneously perceives multiple intrinsic and extrinsic modalities via a single visual input in a markerless way, all packed within a soft robotic finger using the Soft Polyhedral Network design. It is generally applicable to miniature vision systems placed underneath deformable networks with a see-through design, capturing real-time images of the network's physical interactions induced by contact-based events overlayed on top of the visual scene of the external environment, as demonstrated in the ablation study. We present the VBSeeThruP's capability for learning reactive grasping without using external cameras or dedicated force and torque sensors on the fingertips. Using the inpainted scene and the deformation mask, we further demonstrate the multi-modal performance of the VBSeeThruP architecture to simultaneously achieve various perceptions, including but not limited to scene inpainting, object detection, depth sensing, scene segmentation, masked deformation tracking, 6D force/torque sensing, and contact event detection, all within a single sensory input from the in-finger vision markerlessly.
comment: 23 pages, 13 figures, 2 tables, for supplementary videos, see https://bionicdl.ancorasir.com/?p=1658, for opensourced codes, see https://github.com/ ancorasir/SeeThruFinger
Real-time Spatial-temporal Traversability Assessment via Feature-based Sparse Gaussian Process IROS2025
Terrain analysis is critical for the practical ap- plication of ground mobile robots in real-world tasks, espe- cially in outdoor unstructured environments. In this paper, we propose a novel spatial-temporal traversability assessment method, which aims to enable autonomous robots to effectively navigate through complex terrains. Our approach utilizes sparse Gaussian processes (SGP) to extract geometric features (curvature, gradient, elevation, etc.) directly from point cloud scans. These features are then used to construct a high- resolution local traversability map. Then, we design a spatial- temporal Bayesian Gaussian kernel (BGK) inference method to dynamically evaluate traversability scores, integrating historical and real-time data while considering factors such as slope, flatness, gradient, and uncertainty metrics. GPU acceleration is applied in the feature extraction step, and the system achieves real-time performance. Extensive simulation experiments across diverse terrain scenarios demonstrate that our method outper- forms SOTA approaches in both accuracy and computational efficiency. Additionally, we develop an autonomous navigation framework integrated with the traversability map and validate it with a differential driven vehicle in complex outdoor envi- ronments. Our code will be open-source for further research and development by the community, https://github.com/ZJU-FAST-Lab/FSGP_BGK.
comment: accepted by IROS2025
Whole-Body Model-Predictive Control of Legged Robots with MuJoCo
We demonstrate the surprising real-world effectiveness of a very simple approach to whole-body model-predictive control (MPC) of quadruped and humanoid robots: the iterative LQR (iLQR) algorithm with MuJoCo dynamics and finite-difference approximated derivatives. Building upon the previous success of model-based behavior synthesis and control of locomotion and manipulation tasks with MuJoCo in simulation, we show that these policies can easily generalize to the real world with few sim-to-real considerations. Our baseline method achieves real-time whole-body MPC on a variety of hardware experiments, including dynamic quadruped locomotion, quadruped walking on two legs, and full-sized humanoid bipedal locomotion. We hope this easy-to-reproduce hardware baseline lowers the barrier to entry for real-world whole-body MPC research and contributes to accelerating research velocity in the community. Our code and experiment videos will be available online at:https://johnzhang3.github.io/mujoco_ilqr
comment: under review
Development of a Linear Guide-Rail Testbed for Physically Emulating ISAM Operations
In-Space Servicing, Assembly, and Manufacturing (ISAM) is a set of emerging operations that provides several benefits to improve the longevity, capacity, mo- bility, and expandability of existing and future space assets. Serial robotic ma- nipulators are particularly vital in accomplishing ISAM operations, however, the complex perturbation forces and motions associated with movement of a robotic arm on a free-flying satellite presents a complex controls problem requiring addi- tional study. While many dynamical models are developed, experimentally test- ing and validating these models is challenging given that the models operate in space, where satellites have six-degrees-of-freedom (6-DOF). This paper attempts to resolve those challenges by presenting the design and development of a new hardware-in-the-loop (HIL) experimental testbed utilized to emulate ISAM. This emulation will be accomplished by means of a 6-DOF UR3e robotic arm attached to a satellite bus. This satellite bus is mounted to a 1-DOF guide-rail system, en- abling the satellite bus and robotic arm to move freely in one linear direction. This experimental ISAM emulation system will explore and validate models for space motion, serial robot manipulation, and contact mechanics.
comment: 12 pages, 4 figures, AAS/AIAA Space Flight Mechanics
RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics NeurIPS 2025
Spatial referring is a fundamental capability of embodied robots to interact with the 3D physical world. However, even with the powerful pretrained vision language models (VLMs), recent approaches are still not qualified to accurately understand the complex 3D scenes and dynamically reason about the instruction-indicated locations for interaction. To this end, we propose RoboRefer, a 3D-aware VLM that can first achieve precise spatial understanding by integrating a disentangled but dedicated depth encoder via supervised fine-tuning (SFT). Moreover, RoboRefer advances generalized multi-step spatial reasoning via reinforcement fine-tuning (RFT), with metric-sensitive process reward functions tailored for spatial referring tasks. To support SFT and RFT training, we introduce RefSpatial, a large-scale dataset of 20M QA pairs (2x prior), covering 31 spatial relations (vs. 15 prior) and supporting complex reasoning processes (up to 5 steps). In addition, we introduce RefSpatial-Bench, a challenging benchmark filling the gap in evaluating spatial referring with multi-step reasoning. Experiments show that SFT-trained RoboRefer achieves state-of-the-art spatial understanding, with an average success rate of 89.6%. RFT-trained RoboRefer further outperforms all other baselines by a large margin, even surpassing Gemini-2.5-Pro by 17.4% in average accuracy on RefSpatial-Bench. Notably, RoboRefer can be integrated with various control policies to execute long-horizon, dynamic tasks across diverse robots (e,g., UR5, G1 humanoid) in cluttered real-world scenes. See the project page at https://zhoues.github.io/RoboRefer.
comment: Accepted by NeurIPS 2025. Project page: https://zhoues.github.io/RoboRefer/
GeNIE: A Generalizable Navigation System for In-the-Wild Environments
Reliable navigation in unstructured, real-world environments remains a significant challenge for embodied agents, especially when operating across diverse terrains, weather conditions, and sensor configurations. In this paper, we introduce GeNIE (Generalizable Navigation System for In-the-Wild Environments), a robust navigation framework designed for global deployment. GeNIE integrates a generalizable traversability prediction model built on SAM2 with a novel path fusion strategy that enhances planning stability in noisy and ambiguous settings. We deployed GeNIE in the Earth Rover Challenge (ERC) at ICRA 2025, where it was evaluated across six countries spanning three continents. GeNIE took first place and achieved 79% of the maximum possible score, outperforming the second-best team by 17%, and completed the entire competition without a single human intervention. These results set a new benchmark for robust, generalizable outdoor robot navigation. We will release the codebase, pretrained model weights, and newly curated datasets to support future research in real-world navigation.
comment: Accepted to IEEE Robotics and Automation Letters (RAL), 2025. Jiaming Wang, Diwen Liu, and Jizhuo Chen contributed equally to this work
Guided Multi-Fidelity Bayesian Optimization for Data-driven Controller Tuning with Digital Twins
We propose a \textit{guided multi-fidelity Bayesian optimization} framework for data-efficient controller tuning that integrates corrected digital twin simulations with real-world measurements. The method targets closed-loop systems with limited-fidelity simulations or inexpensive approximations. To address model mismatch, we build a multi-fidelity surrogate with a learned correction model that refines digital twin estimates using real data. An adaptive cost-aware acquisition function balances expected improvement, fidelity, and sampling cost. Our method ensures adaptability as new measurements arrive. The digital twin accuracy is re-estimated, dynamically adapting both cross-source correlations and the acquisition function. This ensures that accurate simulations are used more frequently, while inaccurate simulation data are appropriately downweighted. Experiments on robotic drive hardware and supporting numerical studies demonstrate that our method enhances tuning efficiency compared to standard Bayesian optimization and multi-fidelity methods.
comment: This work has been submitted to IEEE Robotics and Automation Letters (RA-L) for review
Policy Contrastive Decoding for Robotic Foundation Models
Robotic foundation models, or generalist robot policies, hold immense potential to enable flexible, general-purpose and dexterous robotic systems. Despite their advancements, our empirical experiments reveal that existing robot policies are prone to learning spurious correlations from pre-training trajectories, adversely affecting their generalization capabilities beyond the training data. To tackle this, we propose a novel Policy Contrastive Decoding (PCD) approach, which redirects the robot policy's focus toward object-relevant visual clues by contrasting action probability distributions derived from original and object-masked visual inputs. As a training-free method, our PCD can be used as a plugin to improve different types of robot policies without needing to finetune or access model weights. We conduct extensive experiments on top of three open-source robot policies, including the autoregressive policy OpenVLA and the diffusion-based policies Octo and $\pi_0$. The obtained results in both simulation and real-world environments prove PCD's flexibility and effectiveness, e.g., PCD enhances the state-of-the-art policy $\pi_0$ by 8.9% in the simulation environment and by 108% in the real-world environment. Code and demos are publicly available at: https://Koorye.github.io/proj/PCD.
Kinetostatics and Particle-Swarm Optimization of Vehicle-Mounted Underactuated Metamorphic Loading Manipulators
Fixed degree-of-freedom (DoF) loading mechanisms often suffer from excessive actuators, complex control, and limited adaptability to dynamic tasks. This study proposes an innovative mechanism of underactuated metamorphic loading manipulators (UMLM), integrating a metamorphic arm with a passively adaptive gripper. The metamorphic arm exploits geometric constraints, enabling the topology reconfiguration and flexible motion trajectories without additional actuators. The adaptive gripper, driven entirely by the arm, conforms to diverse objects through passive compliance. A structural model is developed, and a kinetostatics analysis is conducted to investigate isomorphic grasping configurations. To optimize performance, Particle-Swarm Optimization (PSO) is utilized to refine the gripper's dimensional parameters, ensuring robust adaptability across various applications. Simulation results validate the UMLM's easily implemented control strategy, operational versatility, and effectiveness in grasping diverse objects in dynamic environments. This work underscores the practical potential of underactuated metamorphic mechanisms in applications requiring efficient and adaptable loading solutions. Beyond the specific design, this generalized modeling and optimization framework extends to a broader class of manipulators, offering a scalable approach to the development of robotic systems that require efficiency, flexibility, and robust performance.
comment: 48 pages, 18 figures
MoReFlow: Motion Retargeting Learning through Unsupervised Flow Matching
Motion retargeting holds a premise of offering a larger set of motion data for characters and robots with different morphologies. Many prior works have approached this problem via either handcrafted constraints or paired motion datasets, limiting their applicability to humanoid characters or narrow behaviors such as locomotion. Moreover, they often assume a fixed notion of retargeting, overlooking domain-specific objectives like style preservation in animation or task-space alignment in robotics. In this work, we propose MoReFlow, Motion Retargeting via Flow Matching, an unsupervised framework that learns correspondences between characters' motion embedding spaces. Our method consists of two stages. First, we train tokenized motion embeddings for each character using a VQ-VAE, yielding compact latent representations. Then, we employ flow matching with conditional coupling to align the latent spaces across characters, which simultaneously learns conditioned and unconditioned matching to achieve robust but flexible retargeting. Once trained, MoReFlow enables flexible and reversible retargeting without requiring paired data. Experiments demonstrate that MoReFlow produces high-quality motions across diverse characters and tasks, offering improved controllability, generalization, and motion realism compared to the baselines.
EDEN: Efficient Dual-Layer Exploration Planning for Fast UAV Autonomous Exploration in Large 3-D Environments
Efficient autonomous exploration in large-scale environments remains challenging due to the high planning computational cost and low-speed maneuvers. In this paper, we propose a fast and computationally efficient dual-layer exploration planning method. The insight of our dual-layer method is efficiently finding an acceptable long-term region routing and greedily exploring the target in the region of the first routing area with high speed. Specifically, the proposed method finds the long-term area routing through an approximate algorithm to ensure real-time planning in large-scale environments. Then, the viewpoint in the first routing region with the lowest curvature-penalized cost, which can effectively reduce decelerations caused by sharp turn motions, will be chosen as the next exploration target. To further speed up the exploration, we adopt an aggressive and safe exploration-oriented trajectory to enhance exploration continuity. The proposed method is compared to state-of-the-art methods in challenging simulation environments. The results show that the proposed method outperforms other methods in terms of exploration efficiency, computational cost, and trajectory speed. We also conduct real-world experiments to validate the effectiveness of the proposed method. The code will be open-sourced.
comment: nothing
EvidMTL: Evidential Multi-Task Learning for Uncertainty-Aware Semantic Surface Mapping from Monocular RGB Images IROS 2025
For scene understanding in unstructured environments, an accurate and uncertainty-aware metric-semantic mapping is required to enable informed action selection by autonomous systems. Existing mapping methods often suffer from overconfident semantic predictions, and sparse and noisy depth sensing, leading to inconsistent map representations. In this paper, we therefore introduce EvidMTL, a multi-task learning framework that uses evidential heads for depth estimation and semantic segmentation, enabling uncertainty-aware inference from monocular RGB images. To enable uncertainty-calibrated evidential multi-task learning, we propose a novel evidential depth loss function that jointly optimizes the belief strength of the depth prediction in conjunction with evidential segmentation loss. Building on this, we present EvidKimera, an uncertainty-aware semantic surface mapping framework, which uses evidential depth and semantics prediction for improved 3D metric-semantic consistency. We train and evaluate EvidMTL on the NYUDepthV2 and assess its zero-shot performance on ScanNetV2, demonstrating superior uncertainty estimation compared to conventional approaches while maintaining comparable depth estimation and semantic segmentation. In zero-shot mapping tests on ScanNetV2, EvidKimera outperforms Kimera in semantic surface mapping accuracy and consistency, highlighting the benefits of uncertainty-aware mapping and underscoring its potential for real-world robotic applications.
comment: Submitted to IROS 2025 Conference
Manual2Skill: Learning to Read Manuals and Acquire Robotic Skills for Furniture Assembly Using Vision-Language Models
Humans possess an extraordinary ability to understand and execute complex manipulation tasks by interpreting abstract instruction manuals. For robots, however, this capability remains a substantial challenge, as they cannot interpret abstract instructions and translate them into executable actions. In this paper, we present Manual2Skill, a novel framework that enables robots to perform complex assembly tasks guided by high-level manual instructions. Our approach leverages a Vision-Language Model (VLM) to extract structured information from instructional images and then uses this information to construct hierarchical assembly graphs. These graphs represent parts, subassemblies, and the relationships between them. To facilitate task execution, a pose estimation model predicts the relative 6D poses of components at each assembly step. At the same time, a motion planning module generates actionable sequences for real-world robotic implementation. We demonstrate the effectiveness of Manual2Skill by successfully assembling several real-world IKEA furniture items. This application highlights its ability to manage long-horizon manipulation tasks with both efficiency and precision, significantly enhancing the practicality of robot learning from instruction manuals. This work marks a step forward in advancing robotic systems capable of understanding and executing complex manipulation tasks in a manner akin to human capabilities.Project Page: https://owensun2004.github.io/Furniture-Assembly-Web/
Auditory Localization and Assessment of Consequential Robot Sounds: A Multi-Method Study in Virtual Reality
Mobile robots increasingly operate alongside humans but are often out of sight, so that humans need to rely on the sounds of the robots to recognize their presence. For successful human-robot interaction (HRI), it is therefore crucial to understand how humans perceive robots by their consequential sounds, i.e., operating noise. Prior research suggests that the sound of a quadruped Go1 is more detectable than that of a wheeled Turtlebot. This study builds on this and examines the human ability to localize consequential sounds of three robots (quadruped Go1, wheeled Turtlebot 2i, wheeled HSR) in Virtual Reality. In a within-subjects design, we assessed participants' localization performance for the robots with and without an acoustic vehicle alerting system (AVAS) for two velocities (0.3, 0.8 m/s) and two trajectories (head-on, radial). In each trial, participants were presented with the sound of a moving robot for 3~s and were tasked to point at its final position (localization task). Localization errors were measured as the absolute angular difference between the participants' estimated and the actual robot position. Results showed that the robot type significantly influenced the localization accuracy and precision, with the sound of the wheeled HSR (especially without AVAS) performing worst under all experimental conditions. Surprisingly, participants rated the HSR sound as more positive, less annoying, and more trustworthy than the Turtlebot and Go1 sound. This reveals a tension between subjective evaluation and objective auditory localization performance. Our findings highlight consequential robot sounds as a critical factor for designing intuitive and effective HRI, with implications for human-centered robot design and social navigation.
Demonstration-Enhanced Adaptable Multi-Objective Robot Navigation
Preference-aligned robot navigation in human environments is typically achieved through learning-based approaches, utilizing user feedback or demonstrations for personalization. However, personal preferences are subject to change and might even be context-dependent. Yet traditional reinforcement learning (RL) approaches with static reward functions often fall short in adapting to evolving user preferences, inevitably reflecting demonstrations once training is completed. This paper introduces a structured framework that combines demonstration-based learning with multi-objective reinforcement learning (MORL). To ensure real-world applicability, our approach allows for dynamic adaptation of the robot navigation policy to changing user preferences without retraining. It fluently modulates the amount of demonstration data reflection and other preference-related objectives. Through rigorous evaluations, including a baseline comparison and sim-to-real transfer on two robots, we demonstrate our framework's capability to adapt to user preferences accurately while achieving high navigational performance in terms of collision avoidance and goal pursuance.
Safe, Task-Consistent Manipulation with Operational Space Control Barrier Functions IROS
Safe real-time control of robotic manipulators in unstructured environments requires handling numerous safety constraints without compromising task performance. Traditional approaches, such as artificial potential fields (APFs), suffer from local minima, oscillations, and limited scalability, while model predictive control (MPC) can be computationally expensive. Control barrier functions (CBFs) offer a promising alternative due to their high level of robustness and low computational cost, but these safety filters must be carefully designed to avoid significant reductions in the overall performance of the manipulator. In this work, we introduce an Operational Space Control Barrier Function (OSCBF) framework that integrates safety constraints while preserving task-consistent behavior. Our approach scales to hundreds of simultaneous constraints while retaining real-time control rates, ensuring collision avoidance, singularity prevention, and workspace containment even in highly cluttered settings or during dynamic motions. By explicitly accounting for the task hierarchy in the CBF objective, we prevent degraded performance across both joint-space and operational-space tasks, when at the limit of safety. We validate performance in both simulation and hardware, and release our open-source high-performance code and media on our project webpage, https://stanfordasl.github.io/oscbf/
comment: To be presented at 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
The Impact of VR and 2D Interfaces on Human Feedback in Preference-Based Robot Learning
Aligning robot navigation with human preferences is essential for ensuring comfortable, and predictable robot movement in shared spaces. While preference-based learning methods, such as reinforcement learning from human feedback (RLHF), enable this alignment, the choice of the preference collection interface may influence the process. Traditional 2D interfaces provide structured views but lack spatial depth, whereas immersive VR offers richer perception, potentially affecting preference articulation. This study systematically examines how the interface modality impacts human preference collection and navigation policy alignment. We introduce a novel dataset of 2,325 human preference queries collected through both VR and 2D interfaces, revealing significant differences in user experience, preference consistency, and policy outcomes. Our findings highlight the trade-offs between immersion, perception, and preference reliability, emphasizing the importance of interface selection in preference-based robot learning. The dataset is available to support future research.
Immersive Explainability: Visualizing Robot Navigation Decisions through XAI Semantic Scene Projections in Virtual Reality
End-to-end robot policies achieve high performance through neural networks trained via reinforcement learning (RL). Yet, their black box nature and abstract reasoning pose challenges for human-robot interaction (HRI), because humans may experience difficulty in understanding and predicting the robot's navigation decisions, hindering trust development. We present a virtual reality (VR) interface that visualizes explainable AI (XAI) outputs and the robot's lidar perception to support intuitive interpretation of RL-based navigation behavior. By visually highlighting objects based on their attribution scores, the interface grounds abstract policy explanations in the scene context. This XAI visualization bridges the gap between obscure numerical XAI attribution scores and a human-centric semantic level of explanation. A within-subjects study with 24 participants evaluated the effectiveness of our interface for four visualization conditions combining XAI and lidar. Participants ranked scene objects across navigation scenarios based on their importance to the robot, followed by a questionnaire assessing subjective understanding and predictability. Results show that semantic projection of attributions significantly enhances non-expert users' objective understanding and subjective awareness of robot behavior. In addition, lidar visualization further improves perceived predictability, underscoring the value of integrating XAI and sensor for transparent, trustworthy HRI.
Multiagent Systems
Unleashing Diverse Thinking Modes in LLMs through Multi-Agent Collaboration
Large Language Models (LLMs) demonstrate strong performance but often lack interpretable reasoning. This paper introduces the Multi-Agent Collaboration Framework for Diverse Thinking Modes (DiMo), which enhances both performance and interpretability by simulating a structured debate among four specialized LLM agents. Each agent embodies a distinct reasoning paradigm, allowing the framework to collaboratively explore diverse cognitive approaches. Through iterative debate, agents challenge and refine initial responses, yielding more robust conclusions and an explicit, auditable reasoning chain. Across six benchmarks and under a unified open-source setup, DiMo improves accuracy over widely used single-model and debate baselines, with the largest gains on math. We position DiMo as a semantics-aware, Web-native multi-agent framework: it models human-machine intelligence with LLM agents that produce semantically typed, URL-annotated evidence chains for explanations and user-friendly interactions. Although our experiments use standard reasoning benchmarks, the framework is designed to be instantiated over Web corpora and knowledge graphs, combining retrieval-augmented reasoning with structured justifications that downstream systems can inspect and reuse.
Prompt Optimization via Retrieved Reasoning Assets and Multi-Agent Analysis
Prompt optimization has emerged as an effective alternative to retraining for improving the performance of Large Language Models (LLMs). However, most existing approaches treat evaluation as a black box, relying solely on numerical scores while offering limited insight into why a prompt succeeds or fails. They also depend heavily on trial-and-error refinements, which are difficult to interpret and control. In this paper, we introduce MA-SAPO, a Multi-Agent framework for Score-Aware Prompt Optimization. Compared to prior methods, MA-SAPO explicitly couples evaluation outcomes with structured reasoning to guide systematic edits. The framework specifically consists of two stages: during the Reasoning Phase, agents collaboratively explain metric scores, diagnose weaknesses, and synthesize targeted refinements that are stored as reusable reasoning assets; during the Test Phase, agents retrieve these assets to analyze optimized prompts and apply only evidence-grounded edits. By turning evaluation signals into interpretable reasoning chains, MA-SAPO produces prompt refinements that are more transparent, auditable, and controllable. Experiments on the HelpSteer1/2 benchmarks demonstrate consistent improvements over single-pass prompting, retrieval-augmented baselines, and prior multi-agent strategies, validating the effectiveness of our approach.
comment: Preprint
Ripple Effect Protocol: Coordinating Agent Populations
Modern AI agents can exchange messages using protocols such as A2A and ACP, yet these mechanisms emphasize communication over coordination. As agent populations grow, this limitation produces brittle collective behavior, where individually smart agents converge on poor group outcomes. We introduce the Ripple Effect Protocol (REP), a coordination protocol in which agents share not only their decisions but also lightweight sensitivities - signals expressing how their choices would change if key environmental variables shifted. These sensitivities ripple through local networks, enabling groups to align faster and more stably than with agent-centric communication alone. We formalize REP's protocol specification, separating required message schemas from optional aggregation rules, and evaluate it across scenarios with varying incentives and network topologies. Benchmarks across three domains: (i) supply chain cascades (Beer Game), (ii) preference aggregation in sparse networks (Movie Scheduling), and (iii) sustainable resource allocation (Fishbanks) show that REP improves coordination accuracy and efficiency over A2A by 41 to 100%, while flexibly handling multimodal sensitivity signals from LLMs. By making coordination a protocol-level capability, REP provides scalable infrastructure for the emerging Internet of Agents
Static Sandboxes Are Inadequate: Modeling Societal Complexity Requires Open-Ended Co-Evolution in LLM-Based Multi-Agent Simulations
What if artificial agents could not just communicate, but also evolve, adapt, and reshape their worlds in ways we cannot fully predict? With llm now powering multi-agent systems and social simulations, we are witnessing new possibilities for modeling open-ended, ever-changing environments. Yet, most current simulations remain constrained within static sandboxes, characterized by predefined tasks, limited dynamics, and rigid evaluation criteria. These limitations prevent them from capturing the complexity of real-world societies. In this paper, we argue that static, task-specific benchmarks are fundamentally inadequate and must be rethought. We critically review emerging architectures that blend llm with multi-agent dynamics, highlight key hurdles such as balancing stability and diversity, evaluating unexpected behaviors, and scaling to greater complexity, and introduce a fresh taxonomy for this rapidly evolving field. Finally, we present a research roadmap centered on open-endedness, continuous co-evolution, and the development of resilient, socially aligned AI ecosystems. We call on the community to move beyond static paradigms and help shape the next generation of adaptive, socially-aware multi-agent simulations.
Agentic System with Modal Logic for Autonomous Diagnostics
The development of intelligent agents, particularly those powered by language models (LMs), has shown a critical role in various environments that require intelligent and autonomous decision-making. Environments are not passive testing grounds, and they represent the data required for agents to learn and exhibit in very challenging conditions that require adaptive, complex, and autonomous capacity to make decisions. While the paradigm of scaling models and datasets has led to remarkable emergent capabilities, we argue that scaling the structure, fidelity, and logical consistency of agent reasoning within these environments is a crucial, yet underexplored, dimension of AI research. This paper introduces a neuro-symbolic multi-agent architecture where the belief states of individual agents are formally represented as Kripke models. This foundational choice enables them to reason about known concepts of \emph{possibility} and \emph{necessity} using the formal language of modal logic. In this work, we use immutable, domain-specific knowledge to make an informed root cause diagnosis, which is encoded as logical constraints essential for proper, reliable, and explainable diagnosis. In the proposed model, we show constraints that actively guide the hypothesis generation of LMs, effectively preventing them from reaching physically or logically untenable conclusions. In a high-fidelity simulated particle accelerator environment, our system successfully diagnoses complex, cascading failures by combining the powerful semantic intuition of LMs with the rigorous, verifiable validation of modal logic and a factual world model and showcasing a viable path toward more robust, reliable, and verifiable autonomous agents.
comment: 10 pages, 1 figure
ReaGAN: Node-as-Agent-Reasoning Graph Agentic Network
Graph Neural Networks (GNNs) have achieved remarkable success in graph-based learning by propagating information among neighbor nodes via predefined aggregation mechanisms. However, such fixed schemes often suffer from two key limitations. First, they cannot handle the imbalance in node informativeness -- some nodes are rich in information, while others remain sparse. Second, predefined message passing primarily leverages local structural similarity while ignoring global semantic relationships across the graph, limiting the model's ability to capture distant but relevant information. We propose Retrieval-augmented Graph Agentic Network (ReaGAN), an agent-based framework that empowers each node with autonomous, node-level decision-making. Each node acts as an agent that independently plans its next action based on its internal memory, enabling node-level planning and adaptive message propagation. Additionally, retrieval-augmented generation (RAG) allows nodes to access semantically relevant content and build global relationships in the graph. ReaGAN achieves competitive performance under few-shot in-context settings using a frozen LLM backbone without fine-tuning, showcasing the potential of agentic planning and local-global retrieval in graph learning.
comment: 11 pages, work in progress
Dominated Actions in Imperfect-Information Games
Dominance is a fundamental concept in game theory. In strategic-form games dominated strategies can be identified in polynomial time. As a consequence, iterative removal of dominated strategies can be performed efficiently as a preprocessing step for reducing the size of a game before computing a Nash equilibrium. For imperfect-information games in extensive form, we could convert the game to strategic form and then iteratively remove dominated strategies in the same way; however, this conversion may cause an exponential blowup in game size. In this paper we define and study the concept of dominated actions in imperfect-information games. Our main result is a polynomial-time algorithm for determining whether an action is dominated (strictly or weakly) by any mixed strategy in n-player games, which can be extended to an algorithm for iteratively removing dominated actions. This allows us to efficiently reduce the size of the game tree as a preprocessing step for Nash equilibrium computation. We explore the role of dominated actions empirically in the "All In or Fold" No-Limit Texas Hold'em poker variant.
Systems and Control (CS)
Robust Dynamic Staffing with Predictions
We consider a natural dynamic staffing problem in which a decision-maker sequentially hires workers over a finite horizon to meet an unknown demand revealed at the end. Predictions about demand arrive over time and become increasingly accurate, while worker availability decreases. This creates a fundamental trade-off between hiring early to avoid understaffing (when workers are more available but forecasts are less reliable) and hiring late to avoid overstaffing (when forecasts are more accurate but availability is lower). This problem is motivated by last-mile delivery operations, where companies such as Amazon rely on gig-economy workers whose availability declines closer to the operating day. To address practical limitations of Bayesian models (in particular, to remain agnostic to the underlying forecasting method), we study this problem under adversarial predictions. In this model, sequential predictions are adversarially chosen uncertainty intervals that (approximately) contain the true demand. The objective is to minimize worst-case staffing imbalance cost. Our main result is a simple and computationally efficient online algorithm that is minimax optimal. We first characterize the minimax cost against a restricted adversary via a polynomial-size linear program, then show how to emulate this solution in the general case. While our base model focuses on a single demand, we extend the framework to multiple demands (with egalitarian/utilitarian objectives), to settings with costly reversals of hiring decisions, and to inconsistent prediction intervals. We also introduce a practical "re-solving" variant of our algorithm, which we prove is also minimax optimal. Finally we conduct numerical experiments showing that our algorithms outperform Bayesian heuristics in both cost and speed, and are competitive with (approximate or exact) Bayesian-optimal policies when those can be computed.
Adversarial Reinforcement Learning for Robust Control of Fixed-Wing Aircraft under Model Uncertainty
This paper presents a reinforcement learning-based path-following controller for a fixed-wing small uncrewed aircraft system (sUAS) that is robust to uncertainties in the aerodynamic model of the sUAS. The controller is trained using the Robust Adversarial Reinforcement Learning framework, where an adversary perturbs the environment (aerodynamic model) to expose the agent (sUAS) to demanding scenarios. In our formulation, the adversary introduces rate-bounded perturbations to the aerodynamic model coefficients. We demonstrate that adversarial training improves robustness compared to controllers trained using stochastic model uncertainty. The learned controller is also benchmarked against a switched uncertain initial condition controller. The effectiveness of the approach is validated through high-fidelity simulations using a realistic six-degree-of-freedom fixed-wing aircraft model, showing accurate and robust path-following performance under a variety of uncertain aerodynamic conditions.
QRTlib: A Library for Fast Quantum Real Transforms
Real-valued transforms such as the discrete cosine, sine, and Hartley transforms play a central role in classical computing, complementing the Fourier transform in applications from signal and image processing to data compression. However, their quantum counterparts have not evolved in parallel, and no unified framework exists for implementing them efficiently on quantum hardware. This article addresses this gap by introducing QRTlib, a library for fast and practical implementations of quantum real transforms, including the quantum Hartley, cosine, and sine transforms of various types. We develop new algorithms and circuit optimizations that make these transforms efficient and suitable for near-term devices. In particular, we present a quantum Hartley transform based on the linear combination of unitaries (LCU) technique, achieving a fourfold reduction in circuit size compared to prior methods, and an improved quantum sine transform of Type I that removes large multi-controlled operations. We also introduce circuit-level optimizations, including two's-complement and or-tree constructions. QRTlib provides the first complete implementations of these quantum real transforms in Qiskit.
Towards Intelligent Traffic Signaling in Dhaka City Based on Vehicle Detection and Congestion Optimization
The vehicular density in urbanizing cities of developing countries such as Dhaka, Bangladesh result in a lot of traffic congestion, causing poor on-road experiences. Traffic signaling is a key component in effective traffic management for such situations, but the advancements in intelligent traffic signaling have been exclusive to developed countries with structured traffic. The non-lane-based, heterogeneous traffic of Dhaka City requires a contextual approach. This study focuses on the development of an intelligent traffic signaling system feasible in the context of developing countries such as Bangladesh. We propose a pipeline leveraging Real Time Streaming Protocol (RTSP) feeds, a low resources system Raspberry Pi 4B processing, and a state of the art YOLO-based object detection model trained on the Non-lane-based and Heterogeneous Traffic (NHT-1071) dataset to detect and classify heterogeneous traffic. A multi-objective optimization algorithm, NSGA-II, then generates optimized signal timings, minimizing waiting time while maximizing vehicle throughput. We test our implementation in a five-road intersection at Palashi, Dhaka, demonstrating the potential to significantly improve traffic management in similar situations. The developed testbed paves the way for more contextual and effective Intelligent Traffic Signaling (ITS) solutions for developing areas with complicated traffic dynamics such as Dhaka City.
comment: 10 pages, Submitted to IEEE Transactions on Intelligent Transportation Systems (T-ITS)
Enhancing Channel Estimation in RIS-aided Systems via Observation Matrix Design
Reconfigurable intelligent surfaces (RISs) have emerged as a promising technology for enhancing wireless communications through dense antenna arrays. Accurate channel estimation is critical to unlocking their full performance potential. To enhance RIS channel estimators, this paper proposes a novel observation matrix design scheme. Bayesian optimization framework is adopted to generate observation matrices that maximize the mutual information between received pilot signals and RIS channels. To solve the formulated problem efficiently, we develop an alternating Riemannian manifold optimization (ARMO) algorithm to alternately update the receiver combiners and RIS phase-shift matrices. An adaptive kernel training strategy is further introduced to iteratively refine the channel covariance matrix without requiring additional pilot resources. Simulation results demonstrate that the proposed ARMO-enhanced estimator achieves substantial gains in estimation accuracy over state-of-the-art methods.
comment: 5 pages, 2 figures
Topology-Aware Hybrid Wi-Fi/BLE Fingerprinting via Evidence-Theoretic Fusion and Persistent Homology
Indoor localization remains challenging in GNSS-denied environments due to multipath, device heterogeneity, and volatile radio conditions. We propose a topology-aware, hybrid Wi-Fi/BLE fingerprinting framework that (i) applies physically consistent RSS normalization (dBm z-scoring or dBm -> linear mW -> z-score), (ii) denoises streams with classical Bayesian filters (KF/UKF/PF), (iii) combines complementary regressors (Random Forest and weighted kNN with a diagonal Mahalanobis metric), (iv) performs evidence-theoretic fusion via Dempster-Shafer theory (DST), and (v) augments each sample with persistent-homology (PH) descriptors. The system outputs both (x, y) estimates and interpretable belief maps, and is engineered for microcontroller-class deployment with per-update cost O(T log M + log M + Mp + S). We evaluate on two heterogeneous datasets, including a new 1,200-sample ESP32 survey, and report ablations, robustness to test-only noise, and significance across 10 stratified splits. Under 10% synthetic RSS noise, the full pipeline attains 3.40 m (Dataset 1) and 2.45 m (Dataset 2) RMSE, improving a strong PF + RF baseline by about 37%. Averaged across splits, it yields 4.993 +/- 0.15 m versus 6.292 +/- 0.13 m (20.6% relative reduction; p < 0.001). In noise-free tests, accuracy tightens to 0.44 m and 0.32 m (up to 56% better). Compared with recent learning-heavy approaches that assume large site-specific datasets and GPU inference, our method delivers competitive accuracy with formal uncertainty quantification and low computational cost suitable for real-time deployment.
SMP-RCR: A Sparse Multipoint Moment Matching Method for RC Reduction
In post--layout circuit simulation, efficient model order reduction (MOR) for many--port resistor--capacitor (RC) circuits remains a crucial issue. The current mainstream MOR methods for such circuits include high--order moment matching methods and elimination methods. High-order moment matching methods--characterized by high accuracy, such as PRIMA and TurboMOR--tend to generate large dense reduced-order systems when the number of ports is large, which impairs the efficiency of MOR. Another common type of MOR method for many--port circuits is based on Gaussian elimination, with the SIP method as a representative. The main limitation of this method lies in the inadequate matching of high--order moments. In this paper, we propose a sparse multipoint moment matching method and present comprehensive theoretical analysis results regarding the multi--frequency high--order moment matching property. Meanwhile, to enhance the algorithm's efficiency, sparse control and deflation techniques are introduced to further optimize the algorithm. Numerical experiments demonstrated that, compared to SIP, the accuracy is improved by more than two orders of magnitude at high frequency points without adding many extra linear components. Compared to TurboMOR methods, our method achieves a speed improvement of more than twice while maintaining the same level of precision.
Small-Signal Stability Analysis of Power Systems by Implicit Multilinear Models
This paper proposes a new approach to perform small-signal stability analysis based on linearization of implicit multilinear models. Multilinear models describe the system dynamics by multilinear functions of state, input, and algebraic variables. Using suitable transformations of variables, they can also represent trigonometric functions, which often occur in power systems modeling. This allows tensor representations of grid-following and grid-forming power converters. This paper introduces small-signal stability analysis of equilibrium points based on implicit multilinear models using generalized eigenvalues. The generalized eigenvalues are computed from linear descriptor models of the linearized implicit multilinear model. The proposed approach is tested using a 3-bus network example, first by comparing time-domain simulations of the implicit multilinear model with those of the nonlinear model, and second by comparing the generalized eigenvalues with those of the linearized nonlinear model. The results show that the decomposed tensor representation of the implicit multilinear model allows for a faster linearization compared to conventional methods in MATLAB Simulink.
Single-Step Digital Backpropagation for O-band Coherent Transmission Systems
We demonstrate digital backpropagation-based compensation of fibre nonlinearities in the near-zero dispersion regime of the O-band. Single-step DBP effectively mitigates self-phase modulation, achieving SNR gains of up to 1.6 dB for 50 Gbaud PDM-256QAM transmission over a 2-span 151 km SMF-28 ULL fibre link.
comment: conference, 3 pages, 2 figures
Stabilization of Nonlinear Systems with State-Dependent Representation: From Model-Based to Direct Data-Driven Control
This paper presents a novel framework for stabilizing nonlinear systems represented in state-dependent form. We first reformulate the nonlinear dynamics as a state-dependent parameter-varying model and synthesize a stabilizing controller offline via tractable linear matrix inequalities (LMIs). The resulting controller guarantees local exponential stability, maintains robustness against disturbances, and provides an estimate of the region of attraction under input saturation. We then extend the formulation to the direct data-driven setting, where a known library of basis functions represents the dynamics with unknown coefficients consistent with noisy experimental data. By leveraging Petersen's lemma, we derive data-dependent LMIs that ensure stability and robustness for all systems compatible with the data. Numerical and physical experimental results validate that our approach achieves rigorous end-to-end guarantees on stability, robustness, and safety directly from finite data without explicit model identification.
AoI-Aware Task Offloading and Transmission Optimization for Industrial IoT Networks: A Branching Deep Reinforcement Learning Approach
In the Industrial Internet of Things (IIoT), the frequent transmission of large amounts of data over wireless networks should meet the stringent timeliness requirements. Particularly, the freshness of packet status updates has a significant impact on the system performance. In this paper, we propose an age-of-information (AoI)-aware multi-base station (BS) real-time monitoring framework to support extensive IIoT deployments. To meet the freshness requirements of IIoT, we formulate a joint task offloading and resource allocation optimization problem with the goal of minimizing long-term average AoI. Tackling the core challenges of combinatorial explosion in multi-BS decision spaces and the stochastic dynamics of IIoT systems is crucial, as these factors render traditional optimization methods intractable. Firstly, an innovative branching-based Dueling Double Deep Q-Network (Branching-D3QN) algorithm is proposed to effectively implement task offloading, which optimizes the convergence performance by reducing the action space complexity from exponential to linear levels. Then, an efficient optimization solution to resource allocation is proposed by proving the semi-definite property of the Hessian matrix of bandwidth and computation resources. Finally, we propose an iterative optimization algorithm for efficient joint task offloading and resource allocation to achieve optimal average AoI performance. Extensive simulations demonstrate that our proposed Branching-D3QN algorithm outperforms both state-of-the-art DRL methods and classical heuristics, achieving up to a 75% enhanced convergence speed and at least a 22% reduction in the long-term average AoI.
comment: 15 pages, 13 figures, submitted to IEEE journal for potential publication
Real-time Measurement-based Optimization for Distribution System Operation Considering Battery Voltage and Thermal Constraints SC
The secure operation of power distribution systems is challenged by the growing integration of distributed energy resources. Leveraging the flexibility of battery storage offers a cost-effective alternative to measures like generation curtailment, which results in energy losses. However, developing an effective operational model for battery storage is hindered by inaccurate grid models, unavailability of load data, nonlinear relationship between power injections and network states, intertemporal constraints, and complex electrochemical and thermal dynamics. To address these challenges, this paper proposes a data-driven operational control scheme for battery storage in distribution systems. Linear and convex quadratic operational constraints are constructed based on real-time distribution system and battery storage measurements. Lyapunov optimization decouples multi-period battery operation, enabling a real-time, forecast-free control strategy with low computational complexity. Numerical studies using nonlinear distribution system and battery storage simulators validate the effectiveness of the approach in ensuring secure distribution system operation and satisfaction of voltage and thermal constraints of battery storage.
comment: 7 pages, submitted to PSCC 2026
Iterative solvers for partial differential equations with dissipative structure: Operator preconditioning and optimal control
This work considers the iterative solution of large-scale problems subject to non-symmetric matrices or operators arising in discretizations of (port-)Hamiltonian partial differential equations. We consider problems governed by an operator $\mathcal{A}=\mathcal{H}+\mathcal{S}$ with symmetric part $\mathcal{H}$ that is positive (semi-)definite and skew-symmetric part $\mathcal{S}$. Prior work has shown that the structure and sparsity of the associated linear system enables Krylov subspace solvers such as the generalized minimal residual method (GMRES) or short recurrence variants such as Widlund's or Rapoport's method using the symmetric part $\mathcal{H}$, or an approximation of it, as preconditioner. In this work, we analyze the resulting condition numbers, which are crucial for fast convergence of these methods, for various partial differential equations (PDEs) arising in diffusion phenomena, fluid dynamics, and elasticity. We show that preconditioning with the symmetric part leads to a condition number uniform in the mesh size in case of elliptic and parabolic PDEs where $\mathcal{H}^{-1}\mathcal{S}$ is a bounded operator. Further, we employ the tailored Krylov subspace methods in optimal control by means of a condensing approach and a constraint preconditioner for the optimality system. We illustrate the results by various large-scale numerical examples and discuss efficient evaluations of the preconditioner, such as incomplete Cholesky factorization or the algebraic multigrid method.
comment: 26 pages, 8 figures
Adaptive Sensing Performance Design for Enhancing Secure Communication in Networked ISAC Systems
The channel state information (CSI) of an eavesdropper is crucial for physical layer security (PLS) design, but it is difficult to obtain due to the passive and non-cooperative nature of the eavesdropper. To this end, integrated sensing and communication (ISAC) offers a novel solution by estimating the CSI of the eavesdropper based on sensing information. However, existing studies normally impose explicit and fixed sensing performance requirement without considering the varying communication conditions, which hinders the system from fully exploiting the synergy between sensing and communication. To address this issue, this paper proposes sensing-enhanced secure communication with adaptive sensing performance. Specifically, we formulate the sensing performance implicitly in the information leakage rate and adaptively optimize it for the minimization of the power consumption, offering enhanced flexibility and adaptability in sensing performance. We consider both centralized and decentralized designs to thoroughly investigate the impact of network structure on system performance and complexity. Specifically, we devise a block coordinate descent (BCD)-based method for centralized design. For decentralized design, we develop an optimization framework based on consensus alternating direction method of multipliers (ADMM) to reduce complexity and information exchange overhead. Experimental results demonstrate the advantage of the proposed implicit sensing performance requirement design due to its capability to adaptively adjust the sensing performance to enhance the system performance for varying system configurations.
comment: 16 pages
Conformal Prediction in The Loop: A Feedback-Based Uncertainty Model for Trajectory Optimization NeurIPS 2025
Conformal Prediction (CP) is a powerful statistical machine learning tool to construct uncertainty sets with coverage guarantees, which has fueled its extensive adoption in generating prediction regions for decision-making tasks, e.g., Trajectory Optimization (TO) in uncertain environments. However, existing methods predominantly employ a sequential scheme, where decisions rely unidirectionally on the prediction regions, and consequently the information from decision-making fails to be fed back to instruct CP. In this paper, we propose a novel Feedback-Based CP (Fb-CP) framework for shrinking-horizon TO with a joint risk constraint over the entire mission time. Specifically, a CP-based posterior risk calculation method is developed by fully leveraging the realized trajectories to adjust the posterior allowable risk, which is then allocated to future times to update prediction regions. In this way, the information in the realized trajectories is continuously fed back to the CP, enabling attractive feedback-based adjustments of the prediction regions and a provable online improvement in trajectory performance. Furthermore, we theoretically prove that such adjustments consistently maintain the coverage guarantees of the prediction regions, thereby ensuring provable safety. Additionally, we develop a decision-focused iterative risk allocation algorithm with theoretical convergence analysis for allocating the posterior allowable risk which closely aligns with Fb-CP. Furthermore, we extend the proposed method to handle distribution shift. The effectiveness and superiority of the proposed method are demonstrated through benchmark experiments.
comment: Accepted by NeurIPS 2025 Main Track
Supervisory Control of Hybrid Power Plants Using Online Feedback Optimization: Designs and Validations with a Hybrid Co-Simulation Engine
This research investigates designing a supervisory feedback controller for a hybrid power plant that coordinates the wind, solar, and battery energy storage plants to meet the desired power demands. We have explored an online feedback control design that does not require detailed knowledge about the models, known as feedback optimization. The control inputs are updated using the gradient information of the cost and the outputs with respect to the input control commands. This enables us to adjust the active power references of wind, solar, and storage plants to meet the power generation requirements set by grid operators. The methodology also ensures robust control performance in the presence of uncertainties in the weather. In this paper, we focus on describing the supervisory feedback optimization formulation and control-oriented modeling for individual renewable and storage components of the hybrid power plant. The proposed supervisory control has been integrated with the hybrid plant co-simulation engine, Hercules, demonstrating its effectiveness in more realistic simulation scenarios.
comment: 20 pages, 9 figures
Predictability of Complex Systems
The study of complex systems has attracted widespread attention from researchers in the fields of natural sciences, social sciences, and engineering. Prediction is one of the central issues in this field. Although most related studies have focused on prediction methods, research on the predictability of complex systems has received increasing attention across disciplines--aiming to provide theories and tools to address a key question: What are the limits of prediction accuracy? Predictability itself can serve as an important feature for characterizing complex systems, and accurate estimation of predictability can provide a benchmark for the study of prediction algorithms. This allows researchers to clearly identify the gap between current prediction accuracy and theoretical limits, thereby helping them determine whether there is still significant room to improve existing algorithms. More importantly, investigating predictability often requires the development of new theories and methods, which can further inspire the design of more effective algorithms. Over the past few decades, this field has undergone significant evolution. In particular, the rapid development of data science has introduced a wealth of data-driven approaches for understanding and quantifying predictability. This review summarizes representative achievements, integrating both data-driven and mechanistic perspectives. After a brief introduction to the significance of the topic in focus, we will explore three core aspects: the predictability of time series, the predictability of network structures, and the predictability of dynamical processes. Finally, we will provide extensive application examples across various fields and outline open challenges for future research.
AC Dynamics-aware Trajectory Optimization with Binary Enforcement for Adaptive UFLS Design
The high penetration of distributed energy resources, resulting in backfeed of power at the transmission and distribution interface, is causing conventional underfrequency load shedding (UFLS) schemes to become nonconforming. Adaptive schemes that update UFLS relay settings recursively in time offer a solution, but existing adaptive techniques that obtain UFLS relay settings with linearized or reduced-order model formulations fail to capture AC nonlinear network behavior. In practice, this will result in relays unable to restore system frequency during adverse disturbances. We formulate an adaptive UFLS problem as a trajectory optimization and include the full AC nonlinear network dynamics to ensure AC feasibility and time-coordinated control actions. We include binary decisions to model relay switching action and time-delayed multi-stage load-shedding. However, this formulation results in an intractable MINLP problem. To enforce model tractability, we relax these binary variables into continuous surrogates and reformulate the MINLP as a sequence of NLPs. We solve the NLPs with a homotopy-driven method that enforces near-integer-feasible solutions. We evaluate the framework on multiple synthetic transmission systems and demonstrate that it scales efficiently to networks exceeding 1500+ nodes with over 170k+ continuous and 73k+ binary decision variables, while successfully recovering binary-feasible solutions that arrest the frequency decline during worst-case disturbance.
Towards Smart Manufacturing Metaverse via Digital Twinning in Extended Reality
The rapid evolution of modern manufacturing systems is driven by the integration of emerging metaverse technologies such as artificial intelligence (AI), digital twin (DT) with different forms of extended reality (XR) like virtual reality (VR), augmented reality (AR), and mixed reality (MR). These advances confront manufacturing workers with complex and evolving environments that demand digital literacy for problem solving in the future workplace. However, manufacturing industry faces a critical shortage of skilled workforce with digital literacy in the world. Further, global pandemic has significantly changed how people work and collaborate digitally and remotely. There is an urgent need to rethink digital platformization and leverage emerging technologies to propel industrial evolution toward human-centered manufacturing metaverse (MfgVerse). This paper presents a forward-looking perspective on the development of smart MfgVerse, highlighting current efforts in learning factory, cognitive digital twinning, and the new sharing economy of manufacturing-as-a-service (MaaS). MfgVerse is converging into multiplex networks, including a social network of human stakeholders, an interconnected network of manufacturing things or agents (e.g., machines, robots, facilities, material handling systems), a network of digital twins of physical things, as well as auxiliary networks of sales, supply chain, logistics, and remanufacturing systems. We also showcase the design and development of a learning factory for workforce training in extended reality. Finally, future directions, challenges, and opportunities are discussed for human-centered manufacturing metaverse. We hope this work helps stimulate more comprehensive studies and in-depth research efforts to advance MfgVerse technologies.
An ANN-Enhanced Approach for Flatness-Based Constrained Control of Nonlinear Systems
Neural networks have proven practical for a synergistic combination of advanced control techniques. This work analyzes the implementation of rectified linear unit neural networks to achieve constrained control in differentially flat systems. Specifically, the class of flat systems enjoys the benefit of feedback linearizability, i.e., the systems can be linearized by means of a proper variable transformation. However, the price for linearizing the dynamics is that the constraint descriptions are distorted geometrically. Our results show that, by using neural networks, these constraints can be represented as a union of polytopes, enabling the use of mixed-integer programming tools to guarantee constraint satisfaction. We further analyze the integration of the characterization into efficient settings such as control Lyapunov function-based and model predictive control (MPC). Interestingly, this description also allows us to explicitly compute the solution of the MPC problem for the nonlinear system. Several examples are provided to illustrate the effectiveness of our framework.
FlipDyn with Control: Resource Takeover Games with Dynamics
We introduce FlipDyn with control, a finite-horizon zero-sum resource takeover game, where a defender and an adversary decide when to takeover and how to control a common resource. At each discrete-time step, the players can take over or retain control, incurring state and control-dependent costs. The system is modeled as a hybrid dynamical system, with a discrete \texttt{FlipDyn} state determining control authority. Our contributions are: (i) For arbitrary non-negative costs, we derive the saddle-point value of the \texttt{FlipDyn} game and the corresponding Nash equilibria (NE) takeover strategies. (ii) For linear dynamical systems with quadratic costs, we establish sufficient conditions under which the game admits an NE. (iii) For scalar linear dynamical systems with quadratic costs, we derive parameterized NE takeover strategies and saddle-point values independent of the continuous state. (iv) For higher-dimensional linear dynamical systems with quadratic costs, we derive approximate NE takeover strategies and control policies, and compute bounds on the saddle-point values. We validate our results through a numerical study on adversarial control of a linear system.
comment: 17 Pages, 2 figures. Under review at IEEE TAC
Physics-Informed Deep B-Spline Networks
Physics-informed machine learning offers a promising framework for solving complex partial differential equations (PDEs) by integrating observational data with governing physical laws. However, learning PDEs with varying parameters and changing initial conditions and boundary conditions (ICBCs) with theoretical guarantees remains an open challenge. In this paper, we propose physics-informed deep B-spline networks, a novel technique that approximates a family of PDEs with different parameters and ICBCs by learning B-spline control points through neural networks. The proposed B-spline representation reduces the learning task from predicting solution values over the entire domain to learning a compact set of control points, enforces strict compliance to initial and Dirichlet boundary conditions by construction, and enables analytical computation of derivatives for incorporating PDE residual losses. While existing approximation and generalization theories are not applicable in this setting - where solutions of parametrized PDE families are represented via B-spline bases - we fill this gap by showing that B-spline networks are universal approximators for such families under mild conditions. We also derive generalization error bounds for physics-informed learning in both elliptic and parabolic PDE settings, establishing new theoretical guarantees. Finally, we demonstrate in experiments that the proposed technique has improved efficiency-accuracy tradeoffs compared to existing techniques in a dynamical system problem with discontinuous ICBCs and can handle nonhomogeneous ICBCs and non-rectangular domains.
Passivity-Based Robust Shape Control of a Cable-Driven Solar Sail Boom for the CABLESSail Concept
Solar sails provide a means of propulsion using solar radiation pressure, which offers the possibility of exciting new spacecraft capabilities. However, solar sails have attitude control challenges because of the significant disturbance torques that they encounter due to imperfections in the sail and its supporting structure, as well as limited actuation capabilities. The Cable-Actuated Bio-inspired Lightweight Elastic Solar Sail (CABLESSail) concept was previously proposed to overcome these challenges by controlling the shape of the sail through cable actuation. The structural flexibility of CABLESSail introduces control challenges, which necessitate the design of a robust feedback controller for this system. The purpose of the proposed research here is to design a robust controller to ensure precise and reliable control of CABLESSail's boom. Taking into account the system dynamics and the dynamic properties of the CABLESSail concept, a passivity-based proportional-derivative (PD) controller for a single boom on the CABLESSail system is designed. To reach the nonzero desired setpoints, a feedforward input is additionally applied to the control law and a time-varying feedforward input is used instead of the constant one to effectively track a time-varying desired boom tip deflection. This control law is assessed by numerical simulations and by tests using a smaller-scale prototype of Solar Cruiser. Both the simulation and the test results show that this PD control with the time-varying feedforward input robustly controls the flexible cable-actuated solar sail.
comment: Submitted to Acta Astronautica
Whole-Body Model-Predictive Control of Legged Robots with MuJoCo
We demonstrate the surprising real-world effectiveness of a very simple approach to whole-body model-predictive control (MPC) of quadruped and humanoid robots: the iterative LQR (iLQR) algorithm with MuJoCo dynamics and finite-difference approximated derivatives. Building upon the previous success of model-based behavior synthesis and control of locomotion and manipulation tasks with MuJoCo in simulation, we show that these policies can easily generalize to the real world with few sim-to-real considerations. Our baseline method achieves real-time whole-body MPC on a variety of hardware experiments, including dynamic quadruped locomotion, quadruped walking on two legs, and full-sized humanoid bipedal locomotion. We hope this easy-to-reproduce hardware baseline lowers the barrier to entry for real-world whole-body MPC research and contributes to accelerating research velocity in the community. Our code and experiment videos will be available online at:https://johnzhang3.github.io/mujoco_ilqr
comment: under review
The impact of large-scale EV charging on the real-time operation of distribution systems: A comprehensive review
With the large-scale integration of electric vehicles (EVs) in the distribution grid, the unpredictable nature of EV charging introduces considerable uncertainties to the grid's real-time operations. This can exacerbate load fluctuations, compromise power quality, and pose risks to the grid's stability and security. However, due to their dual role as controllable loads and energy storage devices, EVs have the potential to mitigate these fluctuations, balance the variability of renewable energy sources, and provide ancillary services that support grid stability. By leveraging the bidirectional flow of information and energy in smart grids, the adverse effects of EV charging can be minimized and even converted into beneficial outcomes through effective real-time management strategies. This paper explores the negative impacts of EV charging on the distribution system's real-time operations and outlines methods to transform these challenges into positive contributions. Additionally, it provides an in-depth analysis of the real-time management system for EV charging, focusing on state estimation and management strategies.
Guided Multi-Fidelity Bayesian Optimization for Data-driven Controller Tuning with Digital Twins
We propose a \textit{guided multi-fidelity Bayesian optimization} framework for data-efficient controller tuning that integrates corrected digital twin simulations with real-world measurements. The method targets closed-loop systems with limited-fidelity simulations or inexpensive approximations. To address model mismatch, we build a multi-fidelity surrogate with a learned correction model that refines digital twin estimates using real data. An adaptive cost-aware acquisition function balances expected improvement, fidelity, and sampling cost. Our method ensures adaptability as new measurements arrive. The digital twin accuracy is re-estimated, dynamically adapting both cross-source correlations and the acquisition function. This ensures that accurate simulations are used more frequently, while inaccurate simulation data are appropriately downweighted. Experiments on robotic drive hardware and supporting numerical studies demonstrate that our method enhances tuning efficiency compared to standard Bayesian optimization and multi-fidelity methods.
comment: This work has been submitted to IEEE Robotics and Automation Letters (RA-L) for review
Axial current as the origin of quantum intrinsic orbital angular momentum
We show that the axial current density is the physical origin (generator) of quantum intrinsic orbital angular momentum (IOAM). Without the axial current, the IOAM of particles vanishes. Broadly speaking, we argue that the spiral or interference characteristics of the axial current density determine the occurrence of nonlinear or tunneling effects in any spacetime-dependent quantum systems. Our findings offer a comprehensive theoretical framework that addresses the limitations of Keldysh's ionization theory and provides new insights into the angular momentum properties of quantum systems, particularly in tunneling-dominated regimes. Using Wigner function methods, fermionic generalized two-level model, and Berry phase simulations, we predict that IOAM effect can persist even in pure quantum tunneling processes. These results open the door for experimental verification of IOAM effects in future high-intensity QED experiments, such as those using X-ray free electron lasers.
comment: 8 pages, 2 figures
Systems and Control (EESS)
Robust Dynamic Staffing with Predictions
We consider a natural dynamic staffing problem in which a decision-maker sequentially hires workers over a finite horizon to meet an unknown demand revealed at the end. Predictions about demand arrive over time and become increasingly accurate, while worker availability decreases. This creates a fundamental trade-off between hiring early to avoid understaffing (when workers are more available but forecasts are less reliable) and hiring late to avoid overstaffing (when forecasts are more accurate but availability is lower). This problem is motivated by last-mile delivery operations, where companies such as Amazon rely on gig-economy workers whose availability declines closer to the operating day. To address practical limitations of Bayesian models (in particular, to remain agnostic to the underlying forecasting method), we study this problem under adversarial predictions. In this model, sequential predictions are adversarially chosen uncertainty intervals that (approximately) contain the true demand. The objective is to minimize worst-case staffing imbalance cost. Our main result is a simple and computationally efficient online algorithm that is minimax optimal. We first characterize the minimax cost against a restricted adversary via a polynomial-size linear program, then show how to emulate this solution in the general case. While our base model focuses on a single demand, we extend the framework to multiple demands (with egalitarian/utilitarian objectives), to settings with costly reversals of hiring decisions, and to inconsistent prediction intervals. We also introduce a practical "re-solving" variant of our algorithm, which we prove is also minimax optimal. Finally we conduct numerical experiments showing that our algorithms outperform Bayesian heuristics in both cost and speed, and are competitive with (approximate or exact) Bayesian-optimal policies when those can be computed.
Adversarial Reinforcement Learning for Robust Control of Fixed-Wing Aircraft under Model Uncertainty
This paper presents a reinforcement learning-based path-following controller for a fixed-wing small uncrewed aircraft system (sUAS) that is robust to uncertainties in the aerodynamic model of the sUAS. The controller is trained using the Robust Adversarial Reinforcement Learning framework, where an adversary perturbs the environment (aerodynamic model) to expose the agent (sUAS) to demanding scenarios. In our formulation, the adversary introduces rate-bounded perturbations to the aerodynamic model coefficients. We demonstrate that adversarial training improves robustness compared to controllers trained using stochastic model uncertainty. The learned controller is also benchmarked against a switched uncertain initial condition controller. The effectiveness of the approach is validated through high-fidelity simulations using a realistic six-degree-of-freedom fixed-wing aircraft model, showing accurate and robust path-following performance under a variety of uncertain aerodynamic conditions.
QRTlib: A Library for Fast Quantum Real Transforms
Real-valued transforms such as the discrete cosine, sine, and Hartley transforms play a central role in classical computing, complementing the Fourier transform in applications from signal and image processing to data compression. However, their quantum counterparts have not evolved in parallel, and no unified framework exists for implementing them efficiently on quantum hardware. This article addresses this gap by introducing QRTlib, a library for fast and practical implementations of quantum real transforms, including the quantum Hartley, cosine, and sine transforms of various types. We develop new algorithms and circuit optimizations that make these transforms efficient and suitable for near-term devices. In particular, we present a quantum Hartley transform based on the linear combination of unitaries (LCU) technique, achieving a fourfold reduction in circuit size compared to prior methods, and an improved quantum sine transform of Type I that removes large multi-controlled operations. We also introduce circuit-level optimizations, including two's-complement and or-tree constructions. QRTlib provides the first complete implementations of these quantum real transforms in Qiskit.
Towards Intelligent Traffic Signaling in Dhaka City Based on Vehicle Detection and Congestion Optimization
The vehicular density in urbanizing cities of developing countries such as Dhaka, Bangladesh result in a lot of traffic congestion, causing poor on-road experiences. Traffic signaling is a key component in effective traffic management for such situations, but the advancements in intelligent traffic signaling have been exclusive to developed countries with structured traffic. The non-lane-based, heterogeneous traffic of Dhaka City requires a contextual approach. This study focuses on the development of an intelligent traffic signaling system feasible in the context of developing countries such as Bangladesh. We propose a pipeline leveraging Real Time Streaming Protocol (RTSP) feeds, a low resources system Raspberry Pi 4B processing, and a state of the art YOLO-based object detection model trained on the Non-lane-based and Heterogeneous Traffic (NHT-1071) dataset to detect and classify heterogeneous traffic. A multi-objective optimization algorithm, NSGA-II, then generates optimized signal timings, minimizing waiting time while maximizing vehicle throughput. We test our implementation in a five-road intersection at Palashi, Dhaka, demonstrating the potential to significantly improve traffic management in similar situations. The developed testbed paves the way for more contextual and effective Intelligent Traffic Signaling (ITS) solutions for developing areas with complicated traffic dynamics such as Dhaka City.
comment: 10 pages, Submitted to IEEE Transactions on Intelligent Transportation Systems (T-ITS)
Enhancing Channel Estimation in RIS-aided Systems via Observation Matrix Design
Reconfigurable intelligent surfaces (RISs) have emerged as a promising technology for enhancing wireless communications through dense antenna arrays. Accurate channel estimation is critical to unlocking their full performance potential. To enhance RIS channel estimators, this paper proposes a novel observation matrix design scheme. Bayesian optimization framework is adopted to generate observation matrices that maximize the mutual information between received pilot signals and RIS channels. To solve the formulated problem efficiently, we develop an alternating Riemannian manifold optimization (ARMO) algorithm to alternately update the receiver combiners and RIS phase-shift matrices. An adaptive kernel training strategy is further introduced to iteratively refine the channel covariance matrix without requiring additional pilot resources. Simulation results demonstrate that the proposed ARMO-enhanced estimator achieves substantial gains in estimation accuracy over state-of-the-art methods.
comment: 5 pages, 2 figures
Topology-Aware Hybrid Wi-Fi/BLE Fingerprinting via Evidence-Theoretic Fusion and Persistent Homology
Indoor localization remains challenging in GNSS-denied environments due to multipath, device heterogeneity, and volatile radio conditions. We propose a topology-aware, hybrid Wi-Fi/BLE fingerprinting framework that (i) applies physically consistent RSS normalization (dBm z-scoring or dBm -> linear mW -> z-score), (ii) denoises streams with classical Bayesian filters (KF/UKF/PF), (iii) combines complementary regressors (Random Forest and weighted kNN with a diagonal Mahalanobis metric), (iv) performs evidence-theoretic fusion via Dempster-Shafer theory (DST), and (v) augments each sample with persistent-homology (PH) descriptors. The system outputs both (x, y) estimates and interpretable belief maps, and is engineered for microcontroller-class deployment with per-update cost O(T log M + log M + Mp + S). We evaluate on two heterogeneous datasets, including a new 1,200-sample ESP32 survey, and report ablations, robustness to test-only noise, and significance across 10 stratified splits. Under 10% synthetic RSS noise, the full pipeline attains 3.40 m (Dataset 1) and 2.45 m (Dataset 2) RMSE, improving a strong PF + RF baseline by about 37%. Averaged across splits, it yields 4.993 +/- 0.15 m versus 6.292 +/- 0.13 m (20.6% relative reduction; p < 0.001). In noise-free tests, accuracy tightens to 0.44 m and 0.32 m (up to 56% better). Compared with recent learning-heavy approaches that assume large site-specific datasets and GPU inference, our method delivers competitive accuracy with formal uncertainty quantification and low computational cost suitable for real-time deployment.
SMP-RCR: A Sparse Multipoint Moment Matching Method for RC Reduction
In post--layout circuit simulation, efficient model order reduction (MOR) for many--port resistor--capacitor (RC) circuits remains a crucial issue. The current mainstream MOR methods for such circuits include high--order moment matching methods and elimination methods. High-order moment matching methods--characterized by high accuracy, such as PRIMA and TurboMOR--tend to generate large dense reduced-order systems when the number of ports is large, which impairs the efficiency of MOR. Another common type of MOR method for many--port circuits is based on Gaussian elimination, with the SIP method as a representative. The main limitation of this method lies in the inadequate matching of high--order moments. In this paper, we propose a sparse multipoint moment matching method and present comprehensive theoretical analysis results regarding the multi--frequency high--order moment matching property. Meanwhile, to enhance the algorithm's efficiency, sparse control and deflation techniques are introduced to further optimize the algorithm. Numerical experiments demonstrated that, compared to SIP, the accuracy is improved by more than two orders of magnitude at high frequency points without adding many extra linear components. Compared to TurboMOR methods, our method achieves a speed improvement of more than twice while maintaining the same level of precision.
Small-Signal Stability Analysis of Power Systems by Implicit Multilinear Models
This paper proposes a new approach to perform small-signal stability analysis based on linearization of implicit multilinear models. Multilinear models describe the system dynamics by multilinear functions of state, input, and algebraic variables. Using suitable transformations of variables, they can also represent trigonometric functions, which often occur in power systems modeling. This allows tensor representations of grid-following and grid-forming power converters. This paper introduces small-signal stability analysis of equilibrium points based on implicit multilinear models using generalized eigenvalues. The generalized eigenvalues are computed from linear descriptor models of the linearized implicit multilinear model. The proposed approach is tested using a 3-bus network example, first by comparing time-domain simulations of the implicit multilinear model with those of the nonlinear model, and second by comparing the generalized eigenvalues with those of the linearized nonlinear model. The results show that the decomposed tensor representation of the implicit multilinear model allows for a faster linearization compared to conventional methods in MATLAB Simulink.
Single-Step Digital Backpropagation for O-band Coherent Transmission Systems
We demonstrate digital backpropagation-based compensation of fibre nonlinearities in the near-zero dispersion regime of the O-band. Single-step DBP effectively mitigates self-phase modulation, achieving SNR gains of up to 1.6 dB for 50 Gbaud PDM-256QAM transmission over a 2-span 151 km SMF-28 ULL fibre link.
comment: conference, 3 pages, 2 figures
Stabilization of Nonlinear Systems with State-Dependent Representation: From Model-Based to Direct Data-Driven Control
This paper presents a novel framework for stabilizing nonlinear systems represented in state-dependent form. We first reformulate the nonlinear dynamics as a state-dependent parameter-varying model and synthesize a stabilizing controller offline via tractable linear matrix inequalities (LMIs). The resulting controller guarantees local exponential stability, maintains robustness against disturbances, and provides an estimate of the region of attraction under input saturation. We then extend the formulation to the direct data-driven setting, where a known library of basis functions represents the dynamics with unknown coefficients consistent with noisy experimental data. By leveraging Petersen's lemma, we derive data-dependent LMIs that ensure stability and robustness for all systems compatible with the data. Numerical and physical experimental results validate that our approach achieves rigorous end-to-end guarantees on stability, robustness, and safety directly from finite data without explicit model identification.
AoI-Aware Task Offloading and Transmission Optimization for Industrial IoT Networks: A Branching Deep Reinforcement Learning Approach
In the Industrial Internet of Things (IIoT), the frequent transmission of large amounts of data over wireless networks should meet the stringent timeliness requirements. Particularly, the freshness of packet status updates has a significant impact on the system performance. In this paper, we propose an age-of-information (AoI)-aware multi-base station (BS) real-time monitoring framework to support extensive IIoT deployments. To meet the freshness requirements of IIoT, we formulate a joint task offloading and resource allocation optimization problem with the goal of minimizing long-term average AoI. Tackling the core challenges of combinatorial explosion in multi-BS decision spaces and the stochastic dynamics of IIoT systems is crucial, as these factors render traditional optimization methods intractable. Firstly, an innovative branching-based Dueling Double Deep Q-Network (Branching-D3QN) algorithm is proposed to effectively implement task offloading, which optimizes the convergence performance by reducing the action space complexity from exponential to linear levels. Then, an efficient optimization solution to resource allocation is proposed by proving the semi-definite property of the Hessian matrix of bandwidth and computation resources. Finally, we propose an iterative optimization algorithm for efficient joint task offloading and resource allocation to achieve optimal average AoI performance. Extensive simulations demonstrate that our proposed Branching-D3QN algorithm outperforms both state-of-the-art DRL methods and classical heuristics, achieving up to a 75% enhanced convergence speed and at least a 22% reduction in the long-term average AoI.
comment: 15 pages, 13 figures, submitted to IEEE journal for potential publication
Real-time Measurement-based Optimization for Distribution System Operation Considering Battery Voltage and Thermal Constraints SC
The secure operation of power distribution systems is challenged by the growing integration of distributed energy resources. Leveraging the flexibility of battery storage offers a cost-effective alternative to measures like generation curtailment, which results in energy losses. However, developing an effective operational model for battery storage is hindered by inaccurate grid models, unavailability of load data, nonlinear relationship between power injections and network states, intertemporal constraints, and complex electrochemical and thermal dynamics. To address these challenges, this paper proposes a data-driven operational control scheme for battery storage in distribution systems. Linear and convex quadratic operational constraints are constructed based on real-time distribution system and battery storage measurements. Lyapunov optimization decouples multi-period battery operation, enabling a real-time, forecast-free control strategy with low computational complexity. Numerical studies using nonlinear distribution system and battery storage simulators validate the effectiveness of the approach in ensuring secure distribution system operation and satisfaction of voltage and thermal constraints of battery storage.
comment: 7 pages, submitted to PSCC 2026
Iterative solvers for partial differential equations with dissipative structure: Operator preconditioning and optimal control
This work considers the iterative solution of large-scale problems subject to non-symmetric matrices or operators arising in discretizations of (port-)Hamiltonian partial differential equations. We consider problems governed by an operator $\mathcal{A}=\mathcal{H}+\mathcal{S}$ with symmetric part $\mathcal{H}$ that is positive (semi-)definite and skew-symmetric part $\mathcal{S}$. Prior work has shown that the structure and sparsity of the associated linear system enables Krylov subspace solvers such as the generalized minimal residual method (GMRES) or short recurrence variants such as Widlund's or Rapoport's method using the symmetric part $\mathcal{H}$, or an approximation of it, as preconditioner. In this work, we analyze the resulting condition numbers, which are crucial for fast convergence of these methods, for various partial differential equations (PDEs) arising in diffusion phenomena, fluid dynamics, and elasticity. We show that preconditioning with the symmetric part leads to a condition number uniform in the mesh size in case of elliptic and parabolic PDEs where $\mathcal{H}^{-1}\mathcal{S}$ is a bounded operator. Further, we employ the tailored Krylov subspace methods in optimal control by means of a condensing approach and a constraint preconditioner for the optimality system. We illustrate the results by various large-scale numerical examples and discuss efficient evaluations of the preconditioner, such as incomplete Cholesky factorization or the algebraic multigrid method.
comment: 26 pages, 8 figures
Adaptive Sensing Performance Design for Enhancing Secure Communication in Networked ISAC Systems
The channel state information (CSI) of an eavesdropper is crucial for physical layer security (PLS) design, but it is difficult to obtain due to the passive and non-cooperative nature of the eavesdropper. To this end, integrated sensing and communication (ISAC) offers a novel solution by estimating the CSI of the eavesdropper based on sensing information. However, existing studies normally impose explicit and fixed sensing performance requirement without considering the varying communication conditions, which hinders the system from fully exploiting the synergy between sensing and communication. To address this issue, this paper proposes sensing-enhanced secure communication with adaptive sensing performance. Specifically, we formulate the sensing performance implicitly in the information leakage rate and adaptively optimize it for the minimization of the power consumption, offering enhanced flexibility and adaptability in sensing performance. We consider both centralized and decentralized designs to thoroughly investigate the impact of network structure on system performance and complexity. Specifically, we devise a block coordinate descent (BCD)-based method for centralized design. For decentralized design, we develop an optimization framework based on consensus alternating direction method of multipliers (ADMM) to reduce complexity and information exchange overhead. Experimental results demonstrate the advantage of the proposed implicit sensing performance requirement design due to its capability to adaptively adjust the sensing performance to enhance the system performance for varying system configurations.
comment: 16 pages
Conformal Prediction in The Loop: A Feedback-Based Uncertainty Model for Trajectory Optimization NeurIPS 2025
Conformal Prediction (CP) is a powerful statistical machine learning tool to construct uncertainty sets with coverage guarantees, which has fueled its extensive adoption in generating prediction regions for decision-making tasks, e.g., Trajectory Optimization (TO) in uncertain environments. However, existing methods predominantly employ a sequential scheme, where decisions rely unidirectionally on the prediction regions, and consequently the information from decision-making fails to be fed back to instruct CP. In this paper, we propose a novel Feedback-Based CP (Fb-CP) framework for shrinking-horizon TO with a joint risk constraint over the entire mission time. Specifically, a CP-based posterior risk calculation method is developed by fully leveraging the realized trajectories to adjust the posterior allowable risk, which is then allocated to future times to update prediction regions. In this way, the information in the realized trajectories is continuously fed back to the CP, enabling attractive feedback-based adjustments of the prediction regions and a provable online improvement in trajectory performance. Furthermore, we theoretically prove that such adjustments consistently maintain the coverage guarantees of the prediction regions, thereby ensuring provable safety. Additionally, we develop a decision-focused iterative risk allocation algorithm with theoretical convergence analysis for allocating the posterior allowable risk which closely aligns with Fb-CP. Furthermore, we extend the proposed method to handle distribution shift. The effectiveness and superiority of the proposed method are demonstrated through benchmark experiments.
comment: Accepted by NeurIPS 2025 Main Track
Supervisory Control of Hybrid Power Plants Using Online Feedback Optimization: Designs and Validations with a Hybrid Co-Simulation Engine
This research investigates designing a supervisory feedback controller for a hybrid power plant that coordinates the wind, solar, and battery energy storage plants to meet the desired power demands. We have explored an online feedback control design that does not require detailed knowledge about the models, known as feedback optimization. The control inputs are updated using the gradient information of the cost and the outputs with respect to the input control commands. This enables us to adjust the active power references of wind, solar, and storage plants to meet the power generation requirements set by grid operators. The methodology also ensures robust control performance in the presence of uncertainties in the weather. In this paper, we focus on describing the supervisory feedback optimization formulation and control-oriented modeling for individual renewable and storage components of the hybrid power plant. The proposed supervisory control has been integrated with the hybrid plant co-simulation engine, Hercules, demonstrating its effectiveness in more realistic simulation scenarios.
comment: 20 pages, 9 figures
Predictability of Complex Systems
The study of complex systems has attracted widespread attention from researchers in the fields of natural sciences, social sciences, and engineering. Prediction is one of the central issues in this field. Although most related studies have focused on prediction methods, research on the predictability of complex systems has received increasing attention across disciplines--aiming to provide theories and tools to address a key question: What are the limits of prediction accuracy? Predictability itself can serve as an important feature for characterizing complex systems, and accurate estimation of predictability can provide a benchmark for the study of prediction algorithms. This allows researchers to clearly identify the gap between current prediction accuracy and theoretical limits, thereby helping them determine whether there is still significant room to improve existing algorithms. More importantly, investigating predictability often requires the development of new theories and methods, which can further inspire the design of more effective algorithms. Over the past few decades, this field has undergone significant evolution. In particular, the rapid development of data science has introduced a wealth of data-driven approaches for understanding and quantifying predictability. This review summarizes representative achievements, integrating both data-driven and mechanistic perspectives. After a brief introduction to the significance of the topic in focus, we will explore three core aspects: the predictability of time series, the predictability of network structures, and the predictability of dynamical processes. Finally, we will provide extensive application examples across various fields and outline open challenges for future research.
AC Dynamics-aware Trajectory Optimization with Binary Enforcement for Adaptive UFLS Design
The high penetration of distributed energy resources, resulting in backfeed of power at the transmission and distribution interface, is causing conventional underfrequency load shedding (UFLS) schemes to become nonconforming. Adaptive schemes that update UFLS relay settings recursively in time offer a solution, but existing adaptive techniques that obtain UFLS relay settings with linearized or reduced-order model formulations fail to capture AC nonlinear network behavior. In practice, this will result in relays unable to restore system frequency during adverse disturbances. We formulate an adaptive UFLS problem as a trajectory optimization and include the full AC nonlinear network dynamics to ensure AC feasibility and time-coordinated control actions. We include binary decisions to model relay switching action and time-delayed multi-stage load-shedding. However, this formulation results in an intractable MINLP problem. To enforce model tractability, we relax these binary variables into continuous surrogates and reformulate the MINLP as a sequence of NLPs. We solve the NLPs with a homotopy-driven method that enforces near-integer-feasible solutions. We evaluate the framework on multiple synthetic transmission systems and demonstrate that it scales efficiently to networks exceeding 1500+ nodes with over 170k+ continuous and 73k+ binary decision variables, while successfully recovering binary-feasible solutions that arrest the frequency decline during worst-case disturbance.
Towards Smart Manufacturing Metaverse via Digital Twinning in Extended Reality
The rapid evolution of modern manufacturing systems is driven by the integration of emerging metaverse technologies such as artificial intelligence (AI), digital twin (DT) with different forms of extended reality (XR) like virtual reality (VR), augmented reality (AR), and mixed reality (MR). These advances confront manufacturing workers with complex and evolving environments that demand digital literacy for problem solving in the future workplace. However, manufacturing industry faces a critical shortage of skilled workforce with digital literacy in the world. Further, global pandemic has significantly changed how people work and collaborate digitally and remotely. There is an urgent need to rethink digital platformization and leverage emerging technologies to propel industrial evolution toward human-centered manufacturing metaverse (MfgVerse). This paper presents a forward-looking perspective on the development of smart MfgVerse, highlighting current efforts in learning factory, cognitive digital twinning, and the new sharing economy of manufacturing-as-a-service (MaaS). MfgVerse is converging into multiplex networks, including a social network of human stakeholders, an interconnected network of manufacturing things or agents (e.g., machines, robots, facilities, material handling systems), a network of digital twins of physical things, as well as auxiliary networks of sales, supply chain, logistics, and remanufacturing systems. We also showcase the design and development of a learning factory for workforce training in extended reality. Finally, future directions, challenges, and opportunities are discussed for human-centered manufacturing metaverse. We hope this work helps stimulate more comprehensive studies and in-depth research efforts to advance MfgVerse technologies.
An ANN-Enhanced Approach for Flatness-Based Constrained Control of Nonlinear Systems
Neural networks have proven practical for a synergistic combination of advanced control techniques. This work analyzes the implementation of rectified linear unit neural networks to achieve constrained control in differentially flat systems. Specifically, the class of flat systems enjoys the benefit of feedback linearizability, i.e., the systems can be linearized by means of a proper variable transformation. However, the price for linearizing the dynamics is that the constraint descriptions are distorted geometrically. Our results show that, by using neural networks, these constraints can be represented as a union of polytopes, enabling the use of mixed-integer programming tools to guarantee constraint satisfaction. We further analyze the integration of the characterization into efficient settings such as control Lyapunov function-based and model predictive control (MPC). Interestingly, this description also allows us to explicitly compute the solution of the MPC problem for the nonlinear system. Several examples are provided to illustrate the effectiveness of our framework.
FlipDyn with Control: Resource Takeover Games with Dynamics
We introduce FlipDyn with control, a finite-horizon zero-sum resource takeover game, where a defender and an adversary decide when to takeover and how to control a common resource. At each discrete-time step, the players can take over or retain control, incurring state and control-dependent costs. The system is modeled as a hybrid dynamical system, with a discrete \texttt{FlipDyn} state determining control authority. Our contributions are: (i) For arbitrary non-negative costs, we derive the saddle-point value of the \texttt{FlipDyn} game and the corresponding Nash equilibria (NE) takeover strategies. (ii) For linear dynamical systems with quadratic costs, we establish sufficient conditions under which the game admits an NE. (iii) For scalar linear dynamical systems with quadratic costs, we derive parameterized NE takeover strategies and saddle-point values independent of the continuous state. (iv) For higher-dimensional linear dynamical systems with quadratic costs, we derive approximate NE takeover strategies and control policies, and compute bounds on the saddle-point values. We validate our results through a numerical study on adversarial control of a linear system.
comment: 17 Pages, 2 figures. Under review at IEEE TAC
Physics-Informed Deep B-Spline Networks
Physics-informed machine learning offers a promising framework for solving complex partial differential equations (PDEs) by integrating observational data with governing physical laws. However, learning PDEs with varying parameters and changing initial conditions and boundary conditions (ICBCs) with theoretical guarantees remains an open challenge. In this paper, we propose physics-informed deep B-spline networks, a novel technique that approximates a family of PDEs with different parameters and ICBCs by learning B-spline control points through neural networks. The proposed B-spline representation reduces the learning task from predicting solution values over the entire domain to learning a compact set of control points, enforces strict compliance to initial and Dirichlet boundary conditions by construction, and enables analytical computation of derivatives for incorporating PDE residual losses. While existing approximation and generalization theories are not applicable in this setting - where solutions of parametrized PDE families are represented via B-spline bases - we fill this gap by showing that B-spline networks are universal approximators for such families under mild conditions. We also derive generalization error bounds for physics-informed learning in both elliptic and parabolic PDE settings, establishing new theoretical guarantees. Finally, we demonstrate in experiments that the proposed technique has improved efficiency-accuracy tradeoffs compared to existing techniques in a dynamical system problem with discontinuous ICBCs and can handle nonhomogeneous ICBCs and non-rectangular domains.
Passivity-Based Robust Shape Control of a Cable-Driven Solar Sail Boom for the CABLESSail Concept
Solar sails provide a means of propulsion using solar radiation pressure, which offers the possibility of exciting new spacecraft capabilities. However, solar sails have attitude control challenges because of the significant disturbance torques that they encounter due to imperfections in the sail and its supporting structure, as well as limited actuation capabilities. The Cable-Actuated Bio-inspired Lightweight Elastic Solar Sail (CABLESSail) concept was previously proposed to overcome these challenges by controlling the shape of the sail through cable actuation. The structural flexibility of CABLESSail introduces control challenges, which necessitate the design of a robust feedback controller for this system. The purpose of the proposed research here is to design a robust controller to ensure precise and reliable control of CABLESSail's boom. Taking into account the system dynamics and the dynamic properties of the CABLESSail concept, a passivity-based proportional-derivative (PD) controller for a single boom on the CABLESSail system is designed. To reach the nonzero desired setpoints, a feedforward input is additionally applied to the control law and a time-varying feedforward input is used instead of the constant one to effectively track a time-varying desired boom tip deflection. This control law is assessed by numerical simulations and by tests using a smaller-scale prototype of Solar Cruiser. Both the simulation and the test results show that this PD control with the time-varying feedforward input robustly controls the flexible cable-actuated solar sail.
comment: Submitted to Acta Astronautica
Whole-Body Model-Predictive Control of Legged Robots with MuJoCo
We demonstrate the surprising real-world effectiveness of a very simple approach to whole-body model-predictive control (MPC) of quadruped and humanoid robots: the iterative LQR (iLQR) algorithm with MuJoCo dynamics and finite-difference approximated derivatives. Building upon the previous success of model-based behavior synthesis and control of locomotion and manipulation tasks with MuJoCo in simulation, we show that these policies can easily generalize to the real world with few sim-to-real considerations. Our baseline method achieves real-time whole-body MPC on a variety of hardware experiments, including dynamic quadruped locomotion, quadruped walking on two legs, and full-sized humanoid bipedal locomotion. We hope this easy-to-reproduce hardware baseline lowers the barrier to entry for real-world whole-body MPC research and contributes to accelerating research velocity in the community. Our code and experiment videos will be available online at:https://johnzhang3.github.io/mujoco_ilqr
comment: under review
The impact of large-scale EV charging on the real-time operation of distribution systems: A comprehensive review
With the large-scale integration of electric vehicles (EVs) in the distribution grid, the unpredictable nature of EV charging introduces considerable uncertainties to the grid's real-time operations. This can exacerbate load fluctuations, compromise power quality, and pose risks to the grid's stability and security. However, due to their dual role as controllable loads and energy storage devices, EVs have the potential to mitigate these fluctuations, balance the variability of renewable energy sources, and provide ancillary services that support grid stability. By leveraging the bidirectional flow of information and energy in smart grids, the adverse effects of EV charging can be minimized and even converted into beneficial outcomes through effective real-time management strategies. This paper explores the negative impacts of EV charging on the distribution system's real-time operations and outlines methods to transform these challenges into positive contributions. Additionally, it provides an in-depth analysis of the real-time management system for EV charging, focusing on state estimation and management strategies.
Guided Multi-Fidelity Bayesian Optimization for Data-driven Controller Tuning with Digital Twins
We propose a \textit{guided multi-fidelity Bayesian optimization} framework for data-efficient controller tuning that integrates corrected digital twin simulations with real-world measurements. The method targets closed-loop systems with limited-fidelity simulations or inexpensive approximations. To address model mismatch, we build a multi-fidelity surrogate with a learned correction model that refines digital twin estimates using real data. An adaptive cost-aware acquisition function balances expected improvement, fidelity, and sampling cost. Our method ensures adaptability as new measurements arrive. The digital twin accuracy is re-estimated, dynamically adapting both cross-source correlations and the acquisition function. This ensures that accurate simulations are used more frequently, while inaccurate simulation data are appropriately downweighted. Experiments on robotic drive hardware and supporting numerical studies demonstrate that our method enhances tuning efficiency compared to standard Bayesian optimization and multi-fidelity methods.
comment: This work has been submitted to IEEE Robotics and Automation Letters (RA-L) for review
Axial current as the origin of quantum intrinsic orbital angular momentum
We show that the axial current density is the physical origin (generator) of quantum intrinsic orbital angular momentum (IOAM). Without the axial current, the IOAM of particles vanishes. Broadly speaking, we argue that the spiral or interference characteristics of the axial current density determine the occurrence of nonlinear or tunneling effects in any spacetime-dependent quantum systems. Our findings offer a comprehensive theoretical framework that addresses the limitations of Keldysh's ionization theory and provides new insights into the angular momentum properties of quantum systems, particularly in tunneling-dominated regimes. Using Wigner function methods, fermionic generalized two-level model, and Berry phase simulations, we predict that IOAM effect can persist even in pure quantum tunneling processes. These results open the door for experimental verification of IOAM effects in future high-intensity QED experiments, such as those using X-ray free electron lasers.
comment: 8 pages, 2 figures
Systems and Control (CS)
Bio-inspired Microgrid Management based on Brain's Sensorimotor Gating
Microgrids are emerging as key enablers of resilient, sustainable, and intelligent power systems, but they continue to face challenges in dynamic disturbance handling, protection coordination, and uncertainty. Recent efforts have explored Brain Emotional Learning (BEL) controllers as bio-inspired solutions for microgrid control. Building on this growing trajectory, this article introduces a new paradigm for Neuro-Microgrids, inspired by the brain's sensorimotor gating mechanisms, specifically the Prepulse Inhibition (PPI) and Prepulse Facilitation (PPF). Sensorimotor gating offers a biological model for selectively suppressing or amplifying responses depending on contextual relevance. By mapping these principles onto the hierarchical control architecture of microgrids, we propose a Sensorimotor Gating-Inspired Neuro-Microgrid (SG-NMG) framework. In this architecture, PPI-like control decisions correspond to protective damping in primary and secondary management of microgrids, whereas PPF-like decisions correspond to adaptive amplification of corrective control actions. The framework is presented through analytical workflow design, neuro-circuitry analogies, and integration with machine learning methods. Finally, open challenges and research directions are outlined, including the mathematical modeling of gating, digital twin validation, and cross-disciplinary collaboration between neuroscience and industrial power systems. The resulting paradigm highlights sensorimotor gating as a promising framework for designing self-protective, adaptive, and resilient microgrids.
Braking within Barriers: Constructive Safety-Critical Control for Input-Constrained Vehicles via the Backup Set Method
This paper presents a safety-critical control framework to maintain bounded lateral motions for vehicles braking on asymmetric surfaces. We synthesize a brake controller that assists drivers and guarantees safety against excessive lateral motions (i.e., prevents the vehicle from spinning out) while minimizing the stopping distance. We address this safety-critical control problem in the presence of input constraints, since braking forces are limited by the available friction on the road. We use backup control barrier functions for safe control design. As this approach requires the construction of a backup set and a backup controller, we propose a novel, systematic method to creating valid backup set-backup controller pairs based on feedback linearization and continuous-time Lyapunov equations. We use simple examples to demonstrate our proposed safety-critical control method. Finally, we implement our approach on a four-wheel vehicle model for braking on asymmetric surfaces and present simulation results.
comment: Submitted to the IEEE Transactions on Automation Science and Engineering. 14 pages, 10 figures
Cavity Duplexer Tuning with 1d Resnet-like Neural Networks
This paper presents machine learning method for tuning of cavity duplexer with a large amount of adjustment screws. After testing we declined conventional reinforcement learning approach and reformulated our task in the supervised learning setup. The suggested neural network architecture includes 1d ResNet-like backbone and processing of some additional information about S-parameters, like the shape of curve and peaks positions and amplitudes. This neural network with external control algorithm is capable to reach almost the tuned state of the duplexer within 4-5 rotations per screw.
Integrating Conductor Health into Dynamic Line Rating and Unit Commitment under Uncertainty
Dynamic line rating (DLR) enables greater utilization of existing transmission lines by leveraging real-time weather data. However, the elevated temperature operation (ETO) of conductors under DLR is often overlooked, despite its long-term impact on conductor health. This paper addresses this issue by 1) quantifying depreciation costs associated with ETO and 2) proposing a Conductor Health-Aware Unit Commitment (CHA-UC) that internalizes these costs in operational decisions. The CHA-UC incorporates a robust linear approximation of conductor temperature and integration of expected depreciation costs due to hourly ETO into the objective function. Case studies on the Texas 123-bus backbone test system using NOAA weather data demonstrate that the proposed CHA-UC model reduces the total cost by 0.8% and renewable curtailment by 84%compared to static line rating (SLR), while conventional DLR operation without risk consideration resulted in higher costs due to excessive ETO. Further analysis of the commitment decisions and the line temperature statistics confirms that the CHA-UC achieves safer line flows by shifting generator commitments. Finally, we examine the emergent correlation between wind generation and DLR forecast errors, and show that CHA-UC adaptively manages this effect by relaxing flows for risk-hedging conditions while tightening flows for risk-amplifying ones.
Sugar Shack 4.0: Practical Demonstration of an IIoT-Based Event-Driven Automation System
This paper presents a practical alternative to programmable-logic-controller-centric automation by implementing an event-driven architecture built with industrial Internet of Things tools. A layered design on a local edge server (i) abstracts actuators, (ii) enforces mutual exclusion of shared physical resources through an interlock with priority queueing, (iii) composes deterministic singular operations, and (iv) orchestrates complete workflows as state machines in Node-RED, with communication over MQTT. The device layer uses low-cost ESP32-based gateways to interface sensors and actuators, while all automation logic is offloaded to the server side. As part of a larger project involving the first scientifically-documented integration of Industry 4.0 technologies in a maple syrup boiling center, this work demonstrates the deployment of the proposed system as a case-study. Evaluation over an entire production season shows median message time of flight around one tenth of a second, command issuance-to-motion latencies of about two to three seconds, and command completion near six seconds dominated by actuator mechanics; operation runtimes span tens of seconds to minutes. These results indicate that network and orchestration overheads are negligible relative to process dynamics, enabling modular, distributed control without compromising determinism or fault isolation. The approach reduces material and integration effort, supports portable containerized deployment, and naturally enables an edge/cloud split in which persistence and analytics are offloaded while automation remains at the edge.
comment: 10 pages, 15 figures
Mitigating Underwater Noise from Offshore Wind Turbines via Individual Pitch Control
This paper proposes a pitch control strategy to mitigate the underwater acoustic footprint of offshore wind turbines, a measure that will soon become necessary to minimize impacts on marine life, which rely on sound for communication, navigation, and survival. First, we quantify the underwater acoustic signature of blade-generated aerodynamic noise from three reference turbines, the NREL 5 MW, DTU 10 MW, and IEA 22 MW, using coupling blade element momentum and coupled air-water acoustic propagation modeling. Second, we propose and implement an open-loop individual pitch control (IPC) strategy that modulates the pitch of the blade at the blade passing frequency to attenuate the overall sound pressure level (OSPL) and the amplitude modulation (AM) of the transmitted noise. Third, we benchmark IPC performance against conventional pitch schemes. The results indicate that up to 5 dB reductions in OSPL and a decrease in AM depth 20% can be achieved with a pitch variation of $\Delta\theta\approx 5^\circ$, with small losses (5-10%) in energy capture. These findings highlight a previously underappreciated noise pathway and demonstrate that targeted blade-pitch modulation can mitigate its impact.
Cross-border offshore hydrogen trade and carbon mitigation for Europe's net zero transition
European countries are ambitious in both the net-zero transition and offshore energy resource development. The Irish and UK governments announced their commitments to offshore wind capacities - 37 and 125 GW, respectively, in 2050, more than two times higher than their projected power demands. While other continental countries, such as Germany, are calling for cleaner fuel resources. Exporting surplus offshore green hydrogen and bridging supply and demand could be pivotal in carbon emission mitigation for Europe. Yet, the potentials of these Island countries, are usually underestimated. This paper developed a bottom-up method to investigate the role of offshore hydrogen from Ireland and the UK in the decarbonisation of the entire Europe. We evaluate the future hydrogen/ammonia trading and the contributions of each country in carbon emission mitigation, considering their relative cost-competitiveness in offshore hydrogen production, domestic hourly power and gas system operation, and international shipping costs. Results indicate that the offshore green hydrogen could reduce 175.16 Mt/year of carbon dioxide emissions in Europe. The UK will be the largest hydrogen supplier from 2030 to 2040, while surpassed by Ireland in 2050, with 161 TWh of hydrogen exports to France and Spain. The offshore green hydrogen can contribute to 175.16 Mt of annual carbon dioxide emission reductions in total. This general flow of hydrogen from the West to the East not only facilitates Europe's net-zero progress, but also reshapes the energy supply structure and helps to ensure energy security across the European continent.
Freehand 3D Ultrasound Imaging: Sim-in-the-Loop Probe Pose Optimization via Visual Servoing
Freehand 3D ultrasound (US) imaging using conventional 2D probes offers flexibility and accessibility for diverse clinical applications but faces challenges in accurate probe pose estimation. Traditional methods depend on costly tracking systems, while neural network-based methods struggle with image noise and error accumulation, compromising reconstruction precision. We propose a cost-effective and versatile solution that leverages lightweight cameras and visual servoing in simulated environments for precise 3D US imaging. These cameras capture visual feedback from a textured planar workspace. To counter occlusions and lighting issues, we introduce an image restoration method that reconstructs occluded regions by matching surrounding texture patterns. For pose estimation, we develop a simulation-in-the-loop approach, which replicates the system setup in simulation and iteratively minimizes pose errors between simulated and real-world observations. A visual servoing controller refines the alignment of camera views, improving translational estimation by optimizing image alignment. Validations on a soft vascular phantom, a 3D-printed conical model, and a human arm demonstrate the robustness and accuracy of our approach, with Hausdorff distances to the reference reconstructions of 0.359 mm, 1.171 mm, and 0.858 mm, respectively. These results confirm the method's potential for reliable freehand 3D US reconstruction.
Adaptive Legged Locomotion via Online Learning for Model Predictive Control
We provide an algorithm for adaptive legged locomotion via online learning and model predictive control. The algorithm is composed of two interacting modules: model predictive control (MPC) and online learning of residual dynamics. The residual dynamics can represent modeling errors and external disturbances. We are motivated by the future of autonomy where quadrupeds will autonomously perform complex tasks despite real-world unknown uncertainty, such as unknown payload and uneven terrains. The algorithm uses random Fourier features to approximate the residual dynamics in reproducing kernel Hilbert spaces. Then, it employs MPC based on the current learned model of the residual dynamics. The model is updated online in a self-supervised manner using least squares based on the data collected while controlling the quadruped. The algorithm enjoys sublinear \textit{dynamic regret}, defined as the suboptimality against an optimal clairvoyant controller that knows how the residual dynamics. We validate our algorithm in Gazebo and MuJoCo simulations, where the quadruped aims to track reference trajectories. The Gazebo simulations include constant unknown external forces up to $12\boldsymbol{g}$, where $\boldsymbol{g}$ is the gravity vector, in flat terrain, slope terrain with $20\degree$ inclination, and rough terrain with $0.25m$ height variation. The MuJoCo simulations include time-varying unknown disturbances with payload up to $8~kg$ and time-varying ground friction coefficients in flat terrain.
comment: 9 pages
A Predictive Flexibility Aggregation Method for Low Voltage Distribution System Control
This paper presents a predictive control strategy to manage low-voltage distribution systems. The proposed approach relies on an aggregate of the flexibility at the residential unit level into a three-dimensional chart that represents the injected active and reactive power, and the flexibility cost. First, this method solves a multiparametric optimization problem offline at the residential unit level to aggregate the flexibility of the assets. Then, a semi-explicit model predictive control problem is solved to account for forecasts. By combining the results of these problems with measurements, the method generates the desired flexibility chart. The proposed approach is compatible with realtime control requirements, as heavy computations are performed offline locally, making it naturally parallelizable. By linking realtime flexibility assessment with energy scheduling, our approach enables efficient, low-cost, and privacy-preserving management of low-voltage distribution systems. We validate this method on a low-voltage network of 5 buses by comparing it with an ideal technique.
comment: 8 pages, 6 figures
Observer Design over Hypercomplex Quaternions
We develop observer design over hypercomplex quaternions in a characteristic-polynomial-free framework. Using the standard right-module convention, we derive a right observable companion form and its companion polynomial that encodes error dynamics via right-eigenvalue similarity classes. The design mirrors the real/complex case - coefficient updates in companion coordinates, followed by a similarity back - yet avoids determinants, characteristic/minimal polynomials, and Cayley-Hamilton identities that do not transfer to quaternions. We also give an Ackermann-type construction for the important case of closed-loop companion polynomials with real coefficients, ensuring similarity-equivariant evaluation. The results yield simple recipes for full-order observers directly over quaternions, clarify the role of right spectra and their similarity classes, and pinpoint when classical one-shot formulas remain valid. Numerical examples illustrate the method and advantages over vectorized or complex-adjoint surrogates.
comment: Accepted for presentation at the 24th European Control Conference (ECC 2026), Reykjavik, Iceland. This work was co-funded by the European Union under the project ROBOPROX (reg. no. CZ.02.01.01/00/22 008/0004590)
Active Inverse Methods in Stackelberg Games with Bounded Rationality
Inverse game theory is utilized to infer the cost functions of all players based on game outcomes. However, existing inverse game theory methods do not consider the learner as an active participant in the game, which could significantly enhance the learning process. In this paper, we extend inverse game theory to active inverse methods. For Stackelberg games with bounded rationality, the leader, acting as a learner, actively chooses actions to better understand the follower's cost functions. First, we develop a method of active learning by leveraging Fisher information to maximize information gain about the unknown parameters and prove the consistency and asymptotic normality. Additionally, when leaders consider its cost, we develop a method of active inverse game to balance exploration and exploitation, and prove the consistency and asymptotic Stackelberg equilibrium with quadratic cost functions. Finally, we verify the properties of these methods through simulations in the quadratic case and demonstrate that the active inverse game method can achieve Stackelberg equilibrium more quickly through active exploration.
Hypergame-based Cognition Modeling and Intention Interpretation for Human-Driven Vehicles in Connected Mixed Traffic
With the practical implementation of connected and autonomous vehicles (CAVs), the traffic system is expected to remain a mix of CAVs and human-driven vehicles (HVs) for the foreseeable future. To enhance safety and traffic efficiency, the trajectory planning strategies of CAVs must account for the influence of HVs, necessitating accurate HV trajectory prediction. Current research often assumes that human drivers have perfect knowledge of all vehicles' objectives, an unrealistic premise. This paper bridges the gap by leveraging hypergame theory to account for cognitive and perception limitations in HVs. We model human bounded rationality without assuming them to be merely passive followers and propose a hierarchical cognition modeling framework that captures cognitive relationships among vehicles. We further analyze the cognitive stability of the system, proving that the strategy profile where all vehicles adopt cognitively equilibrium strategies constitutes a hyper Nash equilibrium when CAVs accurately learn HV parameters. To achieve this, we develop an inverse learning algorithm for distributed intention interpretation via vehicle-to-everything (V2X) communication, which extends the framework to both offline and online scenarios. Additionally, we introduce a distributed trajectory prediction and planning approach for CAVs, leveraging the learned parameters in real time. Simulations in highway lane-changing scenarios demonstrate the proposed method's accuracy in parameter learning, robustness to noisy trajectory observations, and safety in HV trajectory prediction. The results validate the effectiveness of our method in both offline and online implementations.
Hypergraph Contrastive Sensor Fusion for Multimodal Fault Diagnosis in Induction Motors
Reliable induction motor (IM) fault diagnosis is vital for industrial safety and operational continuity, mitigating costly unplanned downtime. Conventional approaches often struggle to capture complex multimodal signal relationships, are constrained to unimodal data or single fault types, and exhibit performance degradation under noisy or cross-domain conditions. This paper proposes the Multimodal Hypergraph Contrastive Attention Network (MM-HCAN), a unified framework for robust fault diagnosis. To the best of our knowledge, MM-HCAN is the first to integrate contrastive learning within a hypergraph topology specifically designed for multimodal sensor fusion, enabling the joint modelling of intra- and inter-modal dependencies and enhancing generalisation beyond Euclidean embedding spaces. The model facilitates simultaneous diagnosis of bearing, stator, and rotor faults, addressing the engineering need for consolidated di- agnostic capabilities. Evaluated on three real-world benchmarks, MM-HCAN achieves up to 99.82% accuracy with strong cross-domain generalisation and resilience to noise, demonstrating its suitability for real-world deployment. An ablation study validates the contribution of each component. MM-HCAN provides a scalable and robust solution for comprehensive multi-fault diagnosis, supporting predictive maintenance and extended asset longevity in industrial environments.
comment: Submitted to IEEE Sensors Journal
A Tsetlin Machine Image Classification Accelerator on a Flexible Substrate
This paper introduces the first implementation of digital Tsetlin Machines (TMs) on flexible integrated circuit (FlexIC) using Pragmatic's 600nm IGZO-based FlexIC technology. TMs, known for their energy efficiency, interpretability, and suitability for edge computing, have previously been limited by the rigidity of conventional silicon-based chips. We develop two TM inference models as FlexICs: one achieving 98.5% accuracy using 6800 NAND2 equivalent logic gates with an area of 8X8 mm2, and a second more compact version achieving slightly lower prediction accuracy of 93% but using only 1420 NAND2 equivalent gates with an area of 4X4 mm2, both of which are custom-designed for an 8X8-pixel handwritten digit recognition dataset. The paper demonstrates the feasibility of deploying flexible TM inference engines into wearable healthcare and edge computing applications.
comment: accepted by International Symposium on the Tsetlin Machine (ISTM) 2025
Modelling-driven requirements for Error Field Control Coil application to initial JT-60SA plasmas
JT-60SA is a large superconducting tokamak built in Naka, Japan. After the successful achievement of its first MA-class plasma, the installation of several additional sub-systems, including a set of non-axisymmetric Error Field Correction Coils (EFCC), is ongoing. Optimization of future JT-60SA plasma scenarios will critically depend on the correct use of EFCC, including careful fulfillment of system specifications. In addition to that, preparation and risk mitigation of early ITER operations will greatly benefit from the experience gained by early EFCC application to JT-60SA experiments, in particular to optimize error field detection and control strategies. In this work, EFCC application in JT-60SA Initial Research Phase I perspective scenarios is modeled including plasma response. Impact of (Resonant) Magnetic Perturbations on the different plasma scenarios is assessed for both core and pedestal regions by the linear resistive MHD code MARS-F. The dominant core response to EFs is discussed case by case and compared to mode locking thresholds from literature. Typical current/voltage amplitudes and wave-forms are then compared to EFCC specifications in order to assess a safe operational space.
Balancing Fairness and Performance in Multi-User Spark Workloads with Dynamic Scheduling (extended version) SoCC'25
Apache Spark is a widely adopted framework for large-scale data processing. However, in industrial analytics environments, Spark's built-in schedulers, such as FIFO and fair scheduling, struggle to maintain both user-level fairness and low mean response time, particularly in long-running shared applications. Existing solutions typically focus on job-level fairness which unintentionally favors users who submit more jobs. Although Spark offers a built-in fair scheduler, it lacks adaptability to dynamic user workloads and may degrade overall job performance. We present the User Weighted Fair Queuing (UWFQ) scheduler, designed to minimize job response times while ensuring equitable resource distribution across users and their respective jobs. UWFQ simulates a virtual fair queuing system and schedules jobs based on their estimated finish times under a bounded fairness model. To further address task skew and reduce priority inversions, which are common in Spark workloads, we introduce runtime partitioning, a method that dynamically refines task granularity based on expected runtime. We implement UWFQ within the Spark framework and evaluate its performance using multi-user synthetic workloads and Google cluster traces. We show that UWFQ reduces the average response time of small jobs by up to 74% compared to existing built-in Spark schedulers and to state-of-the-art fair scheduling algorithms.
comment: This paper is an extended version of a paper accepted at the ACM Symposium on Cloud Computing (SoCC'25) that contains a proof of correctness
Recursive Inference for Heterogeneous Multi-Output GP State-Space Models with Arbitrary Moment Matching
Accurate learning of system dynamics is becoming increasingly crucial for advanced control and decision-making in engineering. However, real-world systems often exhibit multiple channels and highly nonlinear transition dynamics, challenging traditional modeling methods. To enable online learning for these systems, this paper formulates the system as Gaussian process state-space models (GPSSMs) and develops a recursive learning method. The main contributions are threefold. First, a heterogeneous multi-output kernel is designed, allowing each output dimension to adopt distinct kernel types, hyperparameters, and input variables, improving expressiveness in multi-dimensional dynamics learning. Second, an inducing-point management algorithm enhances computational efficiency through independent selection and pruning for each output dimension. Third, a unified recursive inference framework for GPSSMs is derived, supporting general moment matching approaches, including the extended Kalman filter (EKF), unscented Kalman filter (UKF), and assumed density filtering (ADF), enabling accurate learning under strong nonlinearity and significant noise. Experiments on synthetic and real-world datasets show that the proposed method matches the accuracy of SOTA offline GPSSMs with only 1/100 of the runtime, and surpasses SOTA online GPSSMs by around 70% in accuracy under heavy noise while using only 1/20 of the runtime.
TranSimHub:A Unified Air-Ground Simulation Platform for Multi-Modal Perception and Decision-Making
Air-ground collaborative intelligence is becoming a key approach for next-generation urban intelligent transportation management, where aerial and ground systems work together on perception, communication, and decision-making. However, the lack of a unified multi-modal simulation environment has limited progress in studying cross-domain perception, coordination under communication constraints, and joint decision optimization. To address this gap, we present TranSimHub, a unified simulation platform for air-ground collaborative intelligence. TranSimHub offers synchronized multi-view rendering across RGB, depth, and semantic segmentation modalities, ensuring consistent perception between aerial and ground viewpoints. It also supports information exchange between the two domains and includes a causal scene editor that enables controllable scenario creation and counterfactual analysis under diverse conditions such as different weather, emergency events, and dynamic obstacles. We release TranSimHub as an open-source platform that supports end-to-end research on perception, fusion, and control across realistic air and ground traffic scenes. Our code is available at https://github.com/Traffic-Alpha/TranSimHub.
comment: 9 pages, 4 figures
Singularity-free dynamical invariants-based quantum control
State preparation is a cornerstone of quantum technologies, underpinning applications in computation, communication, and sensing. Its importance becomes even more pronounced in non-Markovian open quantum systems, where environmental memory and model uncertainties pose significant challenges to achieving high-fidelity control. Invariant-based inverse engineering provides a principled framework for synthesizing analytic control fields, yet existing parameterizations often lead to experimentally infeasible, singular pulses and are limited to simplified noise models such as those of Lindblad form. Here, we introduce a generalized invariant-based protocol for single-qubit state preparation under arbitrary noise conditions. The control proceeds in two-stages: first, we construct a family of bounded pulses that achieve perfect state preparation in a closed system; second, we identify the optimal member of this family that minimizes the effect of noise. The framework accommodates both (i) characterized noise, enabling noise-aware control synthesis, and (ii) uncharacterized noise, where a noise-agnostic variant preserves robustness without requiring a master-equation description. Numerical simulations demonstrate high-fidelity state preparation across diverse targets while producing smooth, hardware-feasible control fields. This singularity-free framework extends invariant-based control to realistic open-system regimes, providing a versatile route toward robust quantum state engineering on NISQ hardware and other platforms exhibiting non-Markovian dynamics.
Adaptive Cost-Map-based Path Planning in Partially Unknown Environments with Movable Obstacles
Reliable navigation in disaster-response and other unstructured indoor settings requires robots not only to avoid obstacles but also to recognise when those obstacles can be pushed aside. We present an adaptive, LiDAR and odometry-based path-planning framework that embeds this capability into the ROS2 Nav2 stack. A new Movable Obstacles Layer labels all LiDAR returns missing from a prior static map as tentatively movable and assigns a reduced traversal cost. A companion Slow-Pose Progress Checker monitors the ratio of commanded to actual velocity; when the robot slows appreciably, the local cost is raised from light to heavy, and on a stall to lethal, prompting the global planner to back out and re-route. Gazebo evaluations on a Scout Mini, spanning isolated objects and cluttered corridors, show higher goal-reach rates and fewer deadlocks than a no-layer baseline, with traversal times broadly comparable. Because the method relies only on planar scans and CPU-level computation, it suits resource-constrained search and rescue robots and integrates into heterogeneous platforms with minimal engineering. Overall, the results indicate that interaction-aware cost maps are a lightweight, ROS2-native extension for navigating among potentially movable obstacles in unstructured settings. The full implementation will be released as open source athttps://costmap-namo.github.io.
Modeling and Dynamic Simulation of a Hybrid Wind-Wave System on a Hexagonal Semi-Submersible Platform
Offshore renewable energy systems offer promising solutions for sustainable power generation, yet most existing platforms harvest either wind or wave energy in isolation. This study presents a hybrid floating offshore platform that integrates a wind turbine with three oscillating surge wave energy converters (WECs) into a hexagonal semi-submersible structure. In this configuration, the flaps are integrated with the platform geometry to provide both energy extraction and hydrodynamic stability. A modeling and simulation framework was developed using WEC-Sim and benchmarked against the NREL 5 MW semisubmersible reference. Metacentric height analysis confirmed hydrostatic stability across a range of prescribed flap angles. Sensitivity analysis of twelve geometric variables identified flap dimensions and tower length as dominant drivers of stability, energy capture, and tower stress. Time-domain simulations revealed dependence on wave incidence angle, with variations in flap power sharing, capture width ratio (CWR), and platform response. The feasibility of using flap sweeps to modulate pitch motion was also demonstrated. Annual energy production (AEP) estimates based on site-specific data indicate 16.86 GWh from wind and 3.65 GWh from wave energy, with WECs contributing about 18% of the total. These results highlight the potential of integrated wind-wave platforms and point toward future studies on structural modeling and advanced control.
comment: 28 pages, 17 figures
An Iterative Problem-Driven Scenario Reduction Framework for Stochastic Optimization with Conditional Value-at-Risk
Scenario reduction (SR) alleviates the computational complexity of scenario-based stochastic optimization with conditional value-at-risk (SBSO-CVaR) by identifying representative scenarios to depict the underlying uncertainty and tail risks. Existing distribution-driven SR methods emphasize statistical similarity but often exclude extreme scenarios, leading to weak tail-risk awareness and insufficient problem-specific representativeness. Instead, this paper proposes an iterative problem-driven scenario reduction framework. Specifically, we integrate the SBSO-CVaR problem structure into SR process and project the original scenario set from the distribution space onto the problem space. Subsequently, to minimize the SR optimality gap with acceptable computation complexity, we propose a tractable iterative problem-driven scenario reduction (IPDSR) method that selects representative scenarios that best approximate the optimality distribution of the original scenario set while preserving tail risks. Furthermore, the iteration process is rendered as a mixed-integer program to enable scenario partitioning and representative scenarios selection. And ex-post problem-driven evaluation indices are proposed to evaluate the SR performance. Numerical experiments show IPDSR significantly outperforms existing SR methods by achieving an optimality gap of less than 1% within an acceptable computation time.
Comprehensive Dynamic Modeling and Constraint-Aware Air Supply Control for Localized Water Management in Automotive Polymer Electrolyte Membrane Fuel Cells
In this paper, a predictive constraint-aware control scheme is formulated within the Command Governor (CG) framework for localized hydration management of a proton exchange membrane (PEM) fuel cell system. First, a comprehensive nonlinear dynamic model of the fuel cell system is presented which includes a pseudo 2-dimensional (P2D) model of the stack, reactant supply and cooling subsystems. The model captures the couplings among the various subsystems and serves as the basis for designing output feedback controllers to track the optimal set-points of the air supply and cooling systems for power optimization. The closed-loop nonlinear model is then used to analyze the dynamic behavior of membrane hydration near the anode inlet, the driest region of the membrane in a counter-flow configuration, under various operating conditions. A reduced-order linearized model is then derived to approximate hydration behavior with sufficient fidelity for constraint enforcement. This model is used within the CG framework to adjust the air supply set-points when necessary to prevent membrane dry-out. The effectiveness of the proposed approach in maintaining local membrane hydration while closely tracking the requested net power is demonstrated through realistic drive-cycle simulations.
comment: This is a manuscript submitted to Applied Energy
Techno-Economic Feasibility Analysis of Quantum Key Distribution for Power-System Communications
The accelerating digitalization and decentralization of modern power systems expose critical communication infrastructures to escalating cyber risks, particularly under emerging quantum computing threats. This paper presents an integrated techno-economic framework to evaluate the feasibility of Quantum Key Distribution (QKD) for secure power-system communications. A stochastic system model is developed to jointly capture time-varying key demand, QKD supply under optical-loss constraints, station-side buffering, and post-quantum cryptography (PQC) fallback mechanisms. Analytical conditions are derived for service-level assurance, including buffer stability, outage probability, and availability bounds. Building on this, two quantitative metrics, including the Levelized Cost of Security (LCoSec) and Cost of Incremental Security (CIS), are formulated to unify capital, operational, and risk-related expenditures within a discounted net-present-value framework. Using IEEE 118-bus, 123-node, and 39-bus test systems, we conduct discrete-event simulations comparing PQC-only, QKD-only, and Hybrid architectures across multiple topologies and service profiles. Results show that Hybrid architectures dominated by QKD significantly reduce key-outage probability and SLA shortfalls, achieving near-unit availability for real-time and confidentiality-critical services. Economic analyses reveal clear breakeven zones where QKD-enhanced deployments become cost-effective, primarily in metropolitan and distribution-level networks under moderate optical loss and buffer sizing. The proposed framework provides a reproducible, risk-aware decision tool for guiding large-scale, economically justified QKD adoption in future resilient power-system infrastructures.
Quantum-Key-Distribution Authenticated Aggregation and Settlement for Virtual Power Plants
The proliferation of distributed energy resources (DERs) and demand-side flexibility has made virtual power plants (VPPs) central to modern grid operation. Yet their end-to-end business pipeline, covering bidding, dispatch, metering, settlement, and archival, forms a tightly coupled cyber-physical-economic system where secure and timely communication is critical. Under the combined stress of sophisticated cyberattacks and extreme weather shocks, conventional cryptography offers limited long-term protection. Quantum key distribution (QKD), with information-theoretic guarantees, is viewed as a gold standard for securing critical infrastructures. However, limited key generation rates, routing capacity, and system overhead render key allocation a pressing challenge: scarce quantum keys must be scheduled across heterogeneous processes to minimize residual risk while maintaining latency guarantees. This paper introduces a quantum-authenticated aggregation and settlement framework for VPPs. We first develop a system-threat model that connects QKD key generation and routing with business-layer security strategies, authentication strength, refresh frequency, and delay constraints. Building on this, we formulate a key-budgeted risk minimization problem that jointly accounts for economic risk, service-level violations, and key-budget feasibility, and reveal a threshold property linking marginal security value to shadow prices. Case studies on a representative VPP system demonstrate that the proposed approach significantly reduces residual risk and SLA violations, enhances key efficiency and robustness, and aligns observed dynamics with the theoretical shadow price mechanism.
Spatial-to-Spectral Harmonic-Modulated Arrays for 6G Multi-Beam MIMO
This article presents an overview and analysis of spatial-to-spectral harmonic-modulated arrays (SHAs). Compared to traditional analog or digital beamforming arrays, SHAs enable concurrent multi-beamforming without requiring substantial hardware replication. SHAs replace the need for hardware replication with frequency-domain multiplexing. Furthermore, SHAs have the potential to become key contributors to future 6G networks by enabling scalable multi-user communications, joint communication and sensing, and spatial interference mitigation. In addition, an analysis of the SHA's harmonic-modulation waveform and its effects on gain, noise and bandwidth is presented. A comb-like modulation waveform for SHAs that minimizes spectral inefficiency is proposed. Further, an analysis of the SHA's capability to independently steer multiple beams is presented. This capability is quantified in terms of the SHA's spatial-to-spectral degrees of freedom. Lastly, this work introduces a novel SHA architecture that provides three spatial-to-spectral degrees of freedom with minimal hardware replication.
A Motivational Driver Steering Model: Task Difficulty Homeostasis From Control Theory Perspective
A general and psychologically plausible collision avoidance driver model can improve transportation safety significantly. Most computational driver models found in the literature have used control theory methods only, and they are not established based on psychological theories. In this paper, a unified approach is presented based on concepts taken from psychology and control theory. The "task difficulty homeostasis theory", a prominent motivational theory, is combined with the "Lyapunov stability method" in control theory to present a general and psychologically plausible model. This approach is used to model driver steering behavior for collision avoidance. The performance of this model is measured by simulation of two collision avoidance scenarios at a wide range of speeds from 20 km/h to 170 km/h. The model is validated by experiments on a driving simulator. The results demonstrate that the model follows human behavior accurately with a mean error of 7 percent.
comment: Cognitive systems Research
Personalized Collaborative Learning with Affinity-Based Variance Reduction
Multi-agent learning faces a fundamental tension: leveraging distributed collaboration without sacrificing the personalization needed for diverse agents. This tension intensifies when aiming for full personalization while adapting to unknown heterogeneity levels -- gaining collaborative speedup when agents are similar, without performance degradation when they are different. Embracing the challenge, we propose personalized collaborative learning (PCL), a novel framework for heterogeneous agents to collaboratively learn personalized solutions with seamless adaptivity. Through carefully designed bias correction and importance correction mechanisms, our method AffPCL robustly handles both environment and objective heterogeneity. We prove that AffPCL reduces sample complexity over independent learning by a factor of $\max\{n^{-1}, \delta\}$, where $n$ is the number of agents and $\delta\in[0,1]$ measures their heterogeneity. This affinity-based acceleration automatically interpolates between the linear speedup of federated learning in homogeneous settings and the baseline of independent learning, without requiring prior knowledge of the system. Our analysis further reveals that an agent may obtain linear speedup even by collaborating with arbitrarily dissimilar agents, unveiling new insights into personalization and collaboration in the high heterogeneity regime.
DeGrip: A Compact Cable-driven Robotic Gripper for Desktop Disassembly
Intelligent robotic disassembly of end-of-life (EOL) products has been a long-standing challenge in robotics. While machine learning techniques have shown promise, the lack of specialized hardware limits their application in real-world scenarios. We introduce DeGrip, a customized gripper designed for the disassembly of EOL computer desktops. DeGrip provides three degrees of freedom (DOF), enabling arbitrary configurations within the disassembly environment when mounted on a robotic manipulator. It employs a cable-driven transmission mechanism that reduces its overall size and enables operation in confined spaces. The wrist is designed to decouple the actuation of wrist and jaw joints. We also developed an EOL desktop disassembly environment in Isaac Sim to evaluate the effectiveness of DeGrip. The tasks were designed to demonstrate its ability to operate in confined spaces and disassemble components in arbitrary configurations. The evaluation results confirm the capability of DeGrip for EOL desktop disassembly.
Heterogeneous Multi-Agent Task-Assignment with Uncertain Execution Times and Preferences
While sequential task assignment for a single agent has been widely studied, such problems in a multi-agent setting, where the agents have heterogeneous task preferences or capabilities, remain less well-characterized. We study a multi-agent task assignment problem where a central planner assigns recurring tasks to multiple members of a team over a finite time horizon. For any given task, the members have heterogeneous capabilities in terms of task completion times, task resource consumption (which can model variables such as energy or attention), and preferences in terms of the rewards they collect upon task completion. We assume that the reward, execution time, and resource consumption for each member to complete any task are stochastic with unknown distributions. The goal of the planner is to maximize the total expected reward that the team receives over the problem horizon while ensuring that the resource consumption required for any assigned task is within the capability of the agent. We propose and analyze a bandit algorithm for this problem. Since the bandit algorithm relies on solving an optimal task assignment problem repeatedly, we analyze the achievable regret in two cases: when we can solve the optimal task assignment exactly and when we can solve it only approximately.
comment: 14 pages
Explore-then-Commit for Nonstationary Linear Bandits with Latent Dynamics
We study a nonstationary bandit problem where rewards depend on both actions and latent states, the latter governed by unknown linear dynamics. Crucially, the state dynamics also depend on the actions, resulting in tension between short-term and long-term rewards. We propose an explore-then-commit algorithm for a finite horizon $T$. During the exploration phase, random Rademacher actions enable estimation of the Markov parameters of the linear dynamics, which characterize the action-reward relationship. In the commit phase, the algorithm uses the estimated parameters to design an optimized action sequence for long-term reward. Our proposed algorithm achieves $\tilde{\mathcal{O}}(T^{2/3})$ regret. Our analysis handles two key challenges: learning from temporally correlated rewards, and designing action sequences with optimal long-term reward. We address the first challenge by providing near-optimal sample complexity and error bounds for system identification using bilinear rewards. We address the second challenge by proving an equivalence with indefinite quadratic optimization over a hypercube, a known NP-hard problem. We provide a sub-optimality guarantee for this problem, enabling our regret upper bound. Lastly, we propose a semidefinite relaxation with Goemans-Williamson rounding as a practical approach.
Residual Correction Models for AC Optimal Power Flow Using DC Optimal Power Flow Solutions
Solving the nonlinear AC optimal power flow (AC OPF) problem remains a major computational bottleneck for real-time grid operations. In this paper, we propose a residual learning paradigm that uses fast DC optimal power flow (DC OPF) solutions as a baseline, and learns only the nonlinear corrections required to provide the full AC-OPF solution. The method utilizes a topology-aware Graph Neural Network with local attention and two-level DC feature integration, trained using a physics-informed loss that enforces AC power-flow feasibility and operational limits. Evaluations on OPFData for 57-, 118-, and 2000-bus systems show around 25% lower MSE, up to 3X reduction in feasibility error, and up to 13X runtime speedup compared to conventional AC OPF solvers. The model maintains accuracy under N-1 contingencies and scales efficiently to large networks. These results demonstrate that residual learning is a practical and scalable bridge between linear approximations and AC-feasible OPF, enabling near real-time operational decision making.
Learning a Generalized Model for Substation Level Voltage Estimation in Distribution Networks
Accurate voltage estimation in distribution networks is critical for real-time monitoring and increasing the reliability of the grid. As DER penetration and distribution level voltage variability increase, robust distribution system state estimation (DSSE) has become more essential to maintain safe and efficient operations. Traditional DSSE techniques, however, struggle with sparse measurements and the scale of modern feeders, limiting their scalability to large networks. This paper presents a hierarchical graph neural network for substation-level voltage estimation that exploits both electrical topology and physical features, while remaining robust to the low observability levels common to real-world distribution networks. Leveraging the public SMART-DS datasets, the model is trained and evaluated on thousands of buses across multiple substations and DER penetration scenarios. Comprehensive experiments demonstrate that the proposed method achieves up to 2 times lower RMSE than alternative data-driven models, and maintains high accuracy with as little as 1\% measurement coverage. The results highlight the potential of GNNs to enable scalable, reproducible, and data-driven voltage monitoring for distribution systems.
DRL-Based Resource Allocation for Energy-Efficient IRS-Assisted UAV Spectrum Sharing Systems
Intelligent reflecting surface (IRS) assisted unmanned aerial vehicle (UAV) systems provide a new paradigm for reconfigurable and flexible wireless communications. To enable more energy efficient and spectrum efficient IRS assisted UAV wireless communications, this paper introduces a novel IRS-assisted UAV enabled spectrum sharing system with orthogonal frequency division multiplexing (OFDM). The goal is to maximize the energy efficiency (EE) of the secondary network by jointly optimizing the beamforming, subcarrier allocation, IRS phase shifts, and the UAV trajectory subject to practical transmit power and passive reflection constraints as well as UAV physical limitations. A physically grounded propulsion-energy model is adopted, with its tight upper bound used to form a tractable EE lower bound for the spectrum sharing system. To handle highly non convex, time coupled optimization problems with a mixed continuous and discrete policy space, we develop a deep reinforcement learning (DRL) approach based on the actor critic framework. Extended experiments show the significant EE improvement of the proposed DRL-based approach compared to several benchmark schemes, thus demonstrating the effectiveness and robustness of the proposed approach with mobility.
comment: 7 pages, 3 figures, 1 algorithm. LaTeX class: IEEEtran
Through-the-Earth Magnetic Induction Communication and Networking: A Comprehensive Survey
Magnetic induction (MI) communication (MIC) has emerged as a promising candidate for underground communication networks due to its excellent penetration capabilities. Integration with Space-Air-Ground-Underground (SAGUI) networks in next-generation mobile communication systems requires a well-defined network architecture. A recent discovery in MIC research, MI fast fading, remains in its early stages and presents unique challenges. This paper provides a comprehensive survey on through-the-earth (TTE) MIC, covering MI applications, channel modeling, point-to-point MIC design, relay techniques, network frameworks, and emerging technologies. We compare various MIC applications to highlight TTE-specific challenges and review the principles of channel modeling, addressing both MI slow fading and MI fast fading, along with its potential impact on existing MIC theories. We conduct a fine-grained decomposition of MI channel power gain into four distinct physical parameters, and propose a novel geometric model to analyze MI fast fading. We also summarize MI relay techniques, examine crosstalk effects in relay and high-density networks, and explore key research tasks within the OSI framework for a holistic MI network protocol in SAGUI. To bridge the gaps identified, we propose a MIC framework that supports TCP/IP and Linux, enabling full implementation of existing and emerging MIC solutions. This framework empowers researchers to leverage Linux resources and deep learning platforms for accelerated development of MIC in SAGUI networks. Remaining research challenges, open issues, and promising novel techniques are further identified to advance MIC research.
comment: This work has been accepted by the IEEE Communications Surveys & Tutorials (COMST) for publication. The final published version will be available on IEEE Xplore
Kernel-based Koopman approximants for control: Flexible sampling, error analysis, and stability
Data-driven techniques for analysis, modeling, and control of complex dynamical systems are on the uptake. Koopman theory provides the theoretical foundation for the popular kernel extended dynamic mode decomposition (kEDMD). In this work, we propose a novel kEDMD scheme to approximate nonlinear control systems accompanied by an in-depth error analysis. Key features are regularization-based robustness and an adroit decomposition into micro and macro grids enabling flexible sampling. But foremost, we prove proportionality, i.e., explicit dependence on the distance to the (controlled) equilibrium, of the derived bound on the full approximation error. Leveraging this key property, we rigorously show that asymptotic stability of the data-driven surrogate (control) system implies asymptotic stability of the original (control) system and vice versa.
comment: 29 pages, 5 figures
Contact-Aware Safety in Soft Robots Using High-Order Control Barrier and Lyapunov Functions
Robots operating alongside people, particularly in sensitive scenarios such as aiding the elderly with daily tasks or collaborating with workers in manufacturing, must guarantee safety and cultivate user trust. Continuum soft manipulators promise safety through material compliance, but as designs evolve for greater precision, payload capacity, and speed, and increasingly incorporate rigid elements, their injury risk resurfaces. In this letter, we introduce a comprehensive High-Order Control Barrier Function (HOCBF) + High-Order Control Lyapunov Function (HOCLF) framework that enforces strict contact force limits across the entire soft-robot body during environmental interactions. Our approach combines a differentiable Piecewise Cosserat-Segment (PCS) dynamics model with a convex-polygon distance approximation metric, named Differentiable Conservative Separating Axis Theorem (DCSAT), based on the soft robot geometry to enable real-time, whole-body collision detection, resolution, and enforcement of the safety constraints. By embedding HOCBFs into our optimization routine, we guarantee safety, allowing, for instance, safe navigation in operational space under HOCLF-driven motion objectives. Extensive planar simulations demonstrate that our method maintains safety-bounded contacts while achieving precise shape and task-space regulation. This work thus lays a foundation for the deployment of soft robots in human-centric environments with provable safety and performance.
comment: 8 pages
Pseudo-Kinematic Trajectory Control and Planning of Tracked Vehicles
Tracked vehicles distribute their weight continuously over a large surface area (the tracks). This distinctive feature makes them the preferred choice for vehicles required to traverse soft and uneven terrain. From a robotics perspective, however, this flexibility comes at a cost: the complexity of modelling the system and the resulting difficulty in designing theoretically sound navigation solutions. In this paper, we aim to bridge this gap by proposing a framework for the navigation of tracked vehicles, built upon three key pillars. The first pillar comprises two models: a simulation model and a control-oriented model. The simulation model captures the intricate terramechanics dynamics arising from soil-track interaction and is employed to develop faithful digital twins of the system across a wide range of operating conditions. The control-oriented model is pseudo-kinematic and mathematically tractable, enabling the design of efficient and theoretically robust control schemes. The second pillar is a Lyapunov-based feedback trajectory controller that provides certifiable tracking guarantees. The third pillar is a portfolio of motion planning solutions, each offering different complexity-accuracy trade-offs. The various components of the proposed approach are validated through an extensive set of simulation and experimental data.
RadioDiff-$k^2$: Helmholtz Equation Informed Generative Diffusion Model for Multi-Path Aware Radio Map Construction
In this paper, we propose a novel physics-informed generative learning approach, named RadioDiff-$k^2$, for accurate and efficient multipath-aware radio map (RM) construction. As future wireless communication evolves towards environment-aware paradigms, the accurate construction of RMs becomes crucial yet highly challenging. Conventional electromagnetic (EM)-based methods, such as full-wave solvers and ray-tracing approaches, exhibit substantial computational overhead and limited adaptability to dynamic scenarios. Although existing neural network (NN) approaches have efficient inferencing speed, they lack sufficient consideration of the underlying physics of EM wave propagation, limiting their effectiveness in accurately modeling critical EM singularities induced by complex multipath environments. To address these fundamental limitations, we propose a novel physics-inspired RM construction method guided explicitly by the Helmholtz equation, which inherently governs EM wave propagation. Specifically, based on the analysis of partial differential equations (PDEs), we theoretically establish a direct correspondence between EM singularities, which correspond to the critical spatial features influencing wireless propagation, and regions defined by negative wave numbers in the Helmholtz equation. We then design an innovative dual diffusion model (DM)-based large artificial intelligence framework comprising one DM dedicated to accurately inferring EM singularities and another DM responsible for reconstructing the complete RM using these singularities along with environmental contextual information. Experimental results demonstrate that the proposed RadioDiff-$k^2$ framework achieves state-of-the-art (SOTA) performance in both image-level RM construction and localization tasks, while maintaining inference latency within a few hundred milliseconds.
A kernel-based approach to physics-informed nonlinear system identification
This paper presents a kernel-based framework for physics-informed nonlinear system identification. The key contribution is a structured methodology that extends kernel-based techniques to seamlessly embed partially known physics-based models, improving parameter estimation and overall model accuracy. The proposed method enhances traditional modeling approaches by embedding a parametric model, which provides physical interpretability, with a kernel-based function, which accounts for unmodeled dynamics. The two models' components are identified from the data simultaneously, thereby minimizing a suitable cost that balances the relative importance of the physical and the black-box parts of the model. Additionally, nonlinear state smoothing is employed to address scenarios involving state-space models with not fully measurable states. Numerical simulations on an experimental benchmark system demonstrate the effectiveness of the proposed approach, achieving up to 51% reduction in simulation root mean square error compared to physics-only models and 31% performance improvement over state-of-the-art identification techniques.
comment: [Extended version] This work has been submitted to the IEEE for possible publication
Stochastic Model Predictive Control for Sub-Gaussian Noise
We propose a stochastic Model Predictive Control (MPC) framework that ensures closed-loop chance constraint satisfaction for linear systems with general sub-Gaussian process and measurement noise. By considering sub-Gaussian noise, we can provide guarantees for a large class of distributions, including time-varying distributions. Specifically, we first provide a new characterization of sub-Gaussian random vectors using matrix variance proxy, which can more accurately represent the predicted state distribution. We then derive tail bounds under linear propagation for the new characterization, enabling tractable computation of probabilistic reachable sets of linear systems. Lastly, we utilize these probabilistic reachable sets to formulate a stochastic MPC scheme that provides closed-loop guarantees for general sub-Gaussian noise. We further demonstrate our approach in simulations, including a challenging task of surgical planning from image observations.
comment: 15 pages, 6 figures, submitted to Automatica
Multi-stage model predictive control for slug flow crystallizers using uncertainty-aware surrogate models
This paper presents a novel dynamic model for slug flow crystallizers that addresses the challenges of spatial distribution without backmixing or diffusion, potentially enabling advanced model-based control. The developed model can accurately describe the main characteristics of slug flow crystallizers, including slug-to-slug variability but leads to a high computational complexity due to the consideration of partial differential equations and population balance equations. For that reason, the model cannot be directly used for process optimization and control. To solve this challenge, we propose two different approaches, conformalized quantile regression and Bayesian last layer neural networks, to develop surrogate models with uncertainty quantification capabilities. These surrogates output a prediction of the system states together with an uncertainty of these predictions to account for process variability and model uncertainty. We use the uncertainty of the predictions to formulate a robust model predictive control approach, enabling robust real-time advanced control of a slug flow crystallizer.
Decentralized Real-Time Iterations for Distributed NMPC
This article presents a Real-Time Iteration (RTI) scheme for distributed Nonlinear Model Predictive Control (NMPC). The scheme transfers the well-known RTI approach, a key enabler for many industrial real-time NMPC implementations, to the setting of cooperative distributed control. At each sampling instant, one outer iteration of a bi-level decentralized Sequential Quadratic Programming (dSQP) method is applied to a centralized optimal control problem. This ensures that real-time requirements are met and it facilitates cooperation between subsystems. Combining novel dSQP convergence results with RTI stability guarantees, we prove local exponential stability under standard assumptions on the MPC design with and without terminal constraints. The proposed scheme only requires neighbor-to-neighbor communication and avoids a central coordinator. A numerical example with coupled inverted pendulums demonstrates the efficacy of the approach.
A Set-Theoretic Robust Control Approach for Linear Quadratic Games with Unknown Counterparts
Ensuring robust decision-making in multi-agent systems is challenging when agents have distinct, possibly conflicting objectives and lack full knowledge of each other's strategies. This is apparent in safety-critical applications such as human-robot interaction and assisted driving, where uncertainty arises not only from unknown adversary strategies but also from external disturbances. To address this, the paper proposes a robust adaptive control approach based on linear quadratic differential games. Our method allows a controlled agent to iteratively refine its belief about the adversary strategy and disturbances using a set-membership approach, while simultaneously adapting its policy to guarantee robustness against the uncertain adversary policy and improve performance over time. We formally derive theoretical guarantees on the robustness of the proposed control scheme and its convergence to $\epsilon$-Nash strategies. The effectiveness of our approach is demonstrated in a numerical simulation.
comment: Accepted for publication in the Proceedings of the 64th IEEE Conference on Decision and Control
Wearable and Ultra-Low-Power Fusion of EMG and A-Mode US for Hand-Wrist Kinematic Tracking
Hand gesture recognition based on biosignals has shown strong potential for developing intuitive human-machine interaction strategies that closely mimic natural human behavior. In particular, sensor fusion approaches have gained attention for combining complementary information and overcoming the limitations of individual sensing modalities, thereby enabling more robust and reliable systems. Among them, the fusion of surface electromyography (EMG) and A-mode ultrasound (US) is very promising. However, prior solutions rely on power-hungry platforms unsuitable for multi-day use and are limited to discrete gesture classification. In this work, we present an ultra-low-power (sub-50 mW) system for concurrent acquisition of 8-channel EMG and 4-channel A-mode US signals, integrating two state-of-the-art platforms into fully wearable, dry-contact armbands. We propose a framework for continuous tracking of 23 degrees of freedom (DoFs), 20 for the hand and 3 for the wrist, using a kinematic glove for ground-truth labeling. Our method employs lightweight encoder-decoder architectures with multi-task learning to simultaneously estimate hand and wrist joint angles. Experimental results under realistic sensor repositioning conditions demonstrate that EMG-US fusion achieves a root mean squared error of $10.6^\circ\pm2.0^\circ$, compared to $12.0^\circ\pm1^\circ$ for EMG and $13.1^\circ\pm2.6^\circ$ for US, and a R$^2$ score of $0.61\pm0.1$, with $0.54\pm0.03$ for EMG and $0.38\pm0.20$ for US.
comment: 5 pages, 3 figures
VLMLight: Safety-Critical Traffic Signal Control via Vision-Language Meta-Control and Dual-Branch Reasoning Architecture
Traffic signal control (TSC) is a core challenge in urban mobility, where real-time decisions must balance efficiency and safety. Existing methods - ranging from rule-based heuristics to reinforcement learning (RL) - often struggle to generalize to complex, dynamic, and safety-critical scenarios. We introduce VLMLight, a novel TSC framework that integrates vision-language meta-control with dual-branch reasoning. At the core of VLMLight is the first image-based traffic simulator that enables multi-view visual perception at intersections, allowing policies to reason over rich cues such as vehicle type, motion, and spatial density. A large language model (LLM) serves as a safety-prioritized meta-controller, selecting between a fast RL policy for routine traffic and a structured reasoning branch for critical cases. In the latter, multiple LLM agents collaborate to assess traffic phases, prioritize emergency vehicles, and verify rule compliance. Experiments show that VLMLight reduces waiting times for emergency vehicles by up to 65% over RL-only systems, while preserving real-time performance in standard conditions with less than 1% degradation. VLMLight offers a scalable, interpretable, and safety-aware solution for next-generation traffic signal control.
comment: 25 pages, 15 figures
Learning to Capture Rocks using an Excavator: A Reinforcement Learning Approach with Guiding Reward Formulation
Rock capturing with standard excavator buckets is a challenging task typically requiring the expertise of skilled operators. Unlike soil digging, it involves manipulating large, irregular rocks in unstructured environments where complex contact interactions with granular material make model-based control impractical. Existing autonomous excavation methods focus mainly on continuous media or rely on specialized grippers, limiting their applicability to real-world construction sites. This paper introduces a fully data-driven control framework for rock capturing that eliminates the need for explicit modeling of rock or soil properties. A model-free reinforcement learning agent is trained in the AGX Dynamics simulator using the Proximal Policy Optimization (PPO) algorithm and a guiding reward formulation. The learned policy outputs joint velocity commands directly to the boom, arm, and bucket of a CAT365 excavator model. Robustness is enhanced through extensive domain randomization of rock geometry, density, and mass, as well as the initial configurations of the bucket, rock, and goal position. To the best of our knowledge, this is the first study to develop and evaluate an RL-based controller for the rock capturing task. Experimental results show that the policy generalizes well to unseen rocks and varying soil conditions, achieving high success rates comparable to those of human participants while maintaining machine stability. These findings demonstrate the feasibility of learning-based excavation strategies for discrete object manipulation without requiring specialized hardware or detailed material models.
Grid-Aware Real-Time Dispatch of Microgrid with Generalized Energy Storage: A Prediction-Free Online Optimization Approach
This paper proposes a novel prediction-free two-stage coordinated dispatch framework for the real-time dispatch of grid-connected microgrid with generalized energy storages (GES). The proposed framework explicitly addresses grid awareness, non-anticipativity constraints, and the time-coupling characteristics of GES, providing microgrid operators with a near-optimal, reliable, and adaptable dispatch tool. In the offline stage, we generate the hindsight state-of-charge (SoC) trajectories of GES by solving the multi-period economic dispatch with historical scenarios. Subsequently, leveraging this historical information (SoC trajectories, net loads, and electricity prices), we synthesize and dynamically update online references for both SoC and opportunity cost through kernel regression. We propose an adaptive Lagrange multiplier-based online convex optimization algorithm, which innovatively incorporates reference tracking for global vision and expert-tracking for step-size updates. We provide theoretical proof to show that the proposed OCO algorithm achieves a sublinear bound of both dynamic regret and time-varying hard constraint violation. Numerical studies using ground-truth data from the Australian Energy Market Operator demonstrate that the proposed method outperforms state-of-the-art methods, reducing operational costs by 5.0-6.2% and voltage violations by 0.8-9.1%. These improvements mainly result from mitigating myopia by reference tracking and the adaptive capability provided by dynamically updated references and adaptive Lagrange multipliers. Sensitivity analysis demonstrates the robustness, computational efficiency, and scalability of the proposed method.
A Multimodal Lightweight Approach to Fault Diagnosis of Induction Motors in High-Dimensional Dataset
An accurate AI-based diagnostic system for induction motors (IMs) holds the potential to enhance proactive maintenance, mitigating unplanned downtime and curbing overall maintenance costs within an industrial environment. Notably, among the prevalent faults in IMs, a Broken Rotor Bar (BRB) fault is frequently encountered. Researchers have proposed various fault diagnosis approaches using signal processing (SP), machine learning (ML), deep learning (DL), and hybrid architectures for BRB faults. One limitation in the existing literature is the training of these architectures on relatively small datasets, risking overfitting when implementing such systems in industrial environments. This paper addresses this limitation by implementing large-scale data of BRB faults by using a transfer-learning-based lightweight DL model named ShuffleNetV2 for diagnosing one, two, three, and four BRB faults using current and vibration signal data. Spectral images for training and testing are generated using a Short-Time Fourier Transform (STFT). The dataset comprises 57,500 images, with 47,500 used for training and 10,000 for testing. Remarkably, the ShuffleNetV2 model exhibited superior performance, in less computational cost as well as accurately classifying 98.856% of spectral images. To further enhance the visualization of harmonic sidebands resulting from broken bars, Fast Fourier Transform (FFT) is applied to current and vibration data. The paper also provides insights into the training and testing times for each model, contributing to a comprehensive understanding of the proposed fault diagnosis methodology. The findings of our research provide valuable insights into the performance and efficiency of different ML and DL models, offering a foundation for the development of robust fault diagnosis systems for induction motors in industrial settings.
Feedback Stackelberg-Nash equilibria in difference games with quasi-hierarchical interactions and inequality constraints
In this paper, we study a class of two-player deterministic finite-horizon difference games with coupled inequality constraints, where each player has two types of decision variables: one involving sequential interactions and the other simultaneous interactions. We refer to this class of games as quasi-hierarchical dynamic games and define a solution concept called the feedback Stackelberg-Nash (FSN) equilibrium. Under separability assumption on cost functions, we provide a recursive formulation of the FSN solution using dynamic programming. We show that the FSN solution can be derived from the parametric feedback Stackelberg solution of an associated unconstrained game involving only sequential interactions, with a specific choice of the parameters that satisfy certain implicit complementarity conditions. For the linear-quadratic case, we show that an FSN solution is obtained by reformulating these complementarity conditions as a single large-scale linear complementarity problem. Finally, we illustrate our results using a dynamic duopoly game with production constraints.
Real-Time Linear MPC for Quadrotors on SE(3): An Analytical Koopman-based Realization
This letter presents an analytical linear parameter-varying (LPV) representation of quadrotor dynamics utilizing Koopman theory, facilitating computationally efficient linear model predictive control (LMPC) for real-time trajectory tracking. By leveraging carefully designed Koopman observables, the proposed approach enables a compact lifted-space evolution that mitigates the curse of dimensionality while preserving the nonlinear characteristics of the system. Although model predictive control (MPC) is a powerful strategy for quadrotor control, it faces a trade-off between the high computational cost of nonlinear MPC (NMPC) and the reduced accuracy of LMPC. To address this gap, we introduce KQ-LMPC (Koopman Quasilinear LPV MPC), which leverages the Koopman-lifted LPV formulation to enforce constraints, ensure lower computational burden and real-time feasibility, and deliver tracking performance comparable to NMPC. Experimental validation confirms the effectiveness of the framework in reasonably agile flight. To the best of our knowledge, this is the first experimentally validated LMPC for quadrotors that employs analytically derived Koopman observables without requiring training data.
comment: 6 pages, 3 figures, accepted for publication at IEEE Robotics and Automation Letters
Recursive Gaussian Process State Space Model
Learning dynamical models from data is not only fundamental but also holds great promise for advancing principle discovery, time-series prediction, and controller design. Among various approaches, Gaussian Process State-Space Models (GPSSMs) have recently gained significant attention due to their combination of flexibility and interpretability. However, for online learning, the field lacks an efficient method suitable for scenarios where prior information regarding data distribution and model function is limited. To address this issue, this paper proposes a recursive GPSSM method with adaptive capabilities for both operating domains and Gaussian process (GP) hyperparameters. Specifically, we first utilize first-order linearization to derive a Bayesian update equation for the joint distribution between the system state and the GP model, enabling closed-form and domain-independent learning. Second, an online selection algorithm for inducing points is developed based on informative criteria to achieve lightweight learning. Third, to support online hyperparameter optimization, we recover historical measurement information from the current filtering distribution. Comprehensive evaluations on both synthetic and real-world datasets demonstrate the superior accuracy, computational efficiency, and adaptability of our method compared to state-of-the-art online GPSSM techniques.
A Human-Vector Susceptible-Infected-Susceptible Model for Analyzing and Controlling the Spread of Vector-Borne Diseases
We propose an epidemic model for the spread of vector-borne diseases. The model, which is built extending the classical susceptible-infected-susceptible model, accounts for two populations -- humans and vectors -- and for cross-contagion between the two species, whereby humans become infected upon interaction with carrier vectors, and vectors become carriers after interaction with infected humans. We formulate the model as a system of ordinary differential equations and leverage monotone systems theory to rigorously characterize the epidemic dynamics. Specifically, we characterize the global asymptotic behavior of the disease, determining conditions for quick eradication of the disease (i.e., for which all trajectories converge to a disease-free equilibrium), or convergence to a (unique) endemic equilibrium. Then, we incorporate two control actions: namely, vector control and incentives to adopt protection measures. Using the derived mathematical tools, we assess the impact of these two control actions and determine the optimal control policy.
comment: Published in the Proceedings of the 2025 European Control Conference (ECC)
Robust Closed-Form Control for MIMO Nonlinear Systems under Conflicting Time-Varying Hard and Soft Constraints (extended version)
This paper introduces a novel robust closed-form control law to handle time-varying hard and soft constraints in uncertain high-relative-degree nonlinear MIMO systems. These constraints represent spatiotemporal specifications in mechanical systems' operational space, with hard constraints ensuring safety-critical requirements and soft constraints encoding performance or task objectives. Initially, all constraints are consolidated into two separate scalar time-varying hard and soft constraint functions, whose positive level sets define feasible regions. A closed-form control law is developed to enforce these constraints using appropriately designed reciprocal barriers and nonlinear transformation functions. When conflicts between hard and soft constraints arise, the control law prioritizes hard constraints by virtually relaxing soft constraints via a dynamic relaxation law. Notably, the proposed control law maintains low complexity by avoiding approximation schemes for coping with system uncertainties. Simulation results confirm the effectiveness of the proposed method.
comment: 18 pages, 6 figures
Systems and Control (EESS)
Bio-inspired Microgrid Management based on Brain's Sensorimotor Gating
Microgrids are emerging as key enablers of resilient, sustainable, and intelligent power systems, but they continue to face challenges in dynamic disturbance handling, protection coordination, and uncertainty. Recent efforts have explored Brain Emotional Learning (BEL) controllers as bio-inspired solutions for microgrid control. Building on this growing trajectory, this article introduces a new paradigm for Neuro-Microgrids, inspired by the brain's sensorimotor gating mechanisms, specifically the Prepulse Inhibition (PPI) and Prepulse Facilitation (PPF). Sensorimotor gating offers a biological model for selectively suppressing or amplifying responses depending on contextual relevance. By mapping these principles onto the hierarchical control architecture of microgrids, we propose a Sensorimotor Gating-Inspired Neuro-Microgrid (SG-NMG) framework. In this architecture, PPI-like control decisions correspond to protective damping in primary and secondary management of microgrids, whereas PPF-like decisions correspond to adaptive amplification of corrective control actions. The framework is presented through analytical workflow design, neuro-circuitry analogies, and integration with machine learning methods. Finally, open challenges and research directions are outlined, including the mathematical modeling of gating, digital twin validation, and cross-disciplinary collaboration between neuroscience and industrial power systems. The resulting paradigm highlights sensorimotor gating as a promising framework for designing self-protective, adaptive, and resilient microgrids.
Braking within Barriers: Constructive Safety-Critical Control for Input-Constrained Vehicles via the Backup Set Method
This paper presents a safety-critical control framework to maintain bounded lateral motions for vehicles braking on asymmetric surfaces. We synthesize a brake controller that assists drivers and guarantees safety against excessive lateral motions (i.e., prevents the vehicle from spinning out) while minimizing the stopping distance. We address this safety-critical control problem in the presence of input constraints, since braking forces are limited by the available friction on the road. We use backup control barrier functions for safe control design. As this approach requires the construction of a backup set and a backup controller, we propose a novel, systematic method to creating valid backup set-backup controller pairs based on feedback linearization and continuous-time Lyapunov equations. We use simple examples to demonstrate our proposed safety-critical control method. Finally, we implement our approach on a four-wheel vehicle model for braking on asymmetric surfaces and present simulation results.
comment: Submitted to the IEEE Transactions on Automation Science and Engineering. 14 pages, 10 figures
Cavity Duplexer Tuning with 1d Resnet-like Neural Networks
This paper presents machine learning method for tuning of cavity duplexer with a large amount of adjustment screws. After testing we declined conventional reinforcement learning approach and reformulated our task in the supervised learning setup. The suggested neural network architecture includes 1d ResNet-like backbone and processing of some additional information about S-parameters, like the shape of curve and peaks positions and amplitudes. This neural network with external control algorithm is capable to reach almost the tuned state of the duplexer within 4-5 rotations per screw.
Integrating Conductor Health into Dynamic Line Rating and Unit Commitment under Uncertainty
Dynamic line rating (DLR) enables greater utilization of existing transmission lines by leveraging real-time weather data. However, the elevated temperature operation (ETO) of conductors under DLR is often overlooked, despite its long-term impact on conductor health. This paper addresses this issue by 1) quantifying depreciation costs associated with ETO and 2) proposing a Conductor Health-Aware Unit Commitment (CHA-UC) that internalizes these costs in operational decisions. The CHA-UC incorporates a robust linear approximation of conductor temperature and integration of expected depreciation costs due to hourly ETO into the objective function. Case studies on the Texas 123-bus backbone test system using NOAA weather data demonstrate that the proposed CHA-UC model reduces the total cost by 0.8% and renewable curtailment by 84%compared to static line rating (SLR), while conventional DLR operation without risk consideration resulted in higher costs due to excessive ETO. Further analysis of the commitment decisions and the line temperature statistics confirms that the CHA-UC achieves safer line flows by shifting generator commitments. Finally, we examine the emergent correlation between wind generation and DLR forecast errors, and show that CHA-UC adaptively manages this effect by relaxing flows for risk-hedging conditions while tightening flows for risk-amplifying ones.
Sugar Shack 4.0: Practical Demonstration of an IIoT-Based Event-Driven Automation System
This paper presents a practical alternative to programmable-logic-controller-centric automation by implementing an event-driven architecture built with industrial Internet of Things tools. A layered design on a local edge server (i) abstracts actuators, (ii) enforces mutual exclusion of shared physical resources through an interlock with priority queueing, (iii) composes deterministic singular operations, and (iv) orchestrates complete workflows as state machines in Node-RED, with communication over MQTT. The device layer uses low-cost ESP32-based gateways to interface sensors and actuators, while all automation logic is offloaded to the server side. As part of a larger project involving the first scientifically-documented integration of Industry 4.0 technologies in a maple syrup boiling center, this work demonstrates the deployment of the proposed system as a case-study. Evaluation over an entire production season shows median message time of flight around one tenth of a second, command issuance-to-motion latencies of about two to three seconds, and command completion near six seconds dominated by actuator mechanics; operation runtimes span tens of seconds to minutes. These results indicate that network and orchestration overheads are negligible relative to process dynamics, enabling modular, distributed control without compromising determinism or fault isolation. The approach reduces material and integration effort, supports portable containerized deployment, and naturally enables an edge/cloud split in which persistence and analytics are offloaded while automation remains at the edge.
comment: 10 pages, 15 figures
Mitigating Underwater Noise from Offshore Wind Turbines via Individual Pitch Control
This paper proposes a pitch control strategy to mitigate the underwater acoustic footprint of offshore wind turbines, a measure that will soon become necessary to minimize impacts on marine life, which rely on sound for communication, navigation, and survival. First, we quantify the underwater acoustic signature of blade-generated aerodynamic noise from three reference turbines, the NREL 5 MW, DTU 10 MW, and IEA 22 MW, using coupling blade element momentum and coupled air-water acoustic propagation modeling. Second, we propose and implement an open-loop individual pitch control (IPC) strategy that modulates the pitch of the blade at the blade passing frequency to attenuate the overall sound pressure level (OSPL) and the amplitude modulation (AM) of the transmitted noise. Third, we benchmark IPC performance against conventional pitch schemes. The results indicate that up to 5 dB reductions in OSPL and a decrease in AM depth 20% can be achieved with a pitch variation of $\Delta\theta\approx 5^\circ$, with small losses (5-10%) in energy capture. These findings highlight a previously underappreciated noise pathway and demonstrate that targeted blade-pitch modulation can mitigate its impact.
Cross-border offshore hydrogen trade and carbon mitigation for Europe's net zero transition
European countries are ambitious in both the net-zero transition and offshore energy resource development. The Irish and UK governments announced their commitments to offshore wind capacities - 37 and 125 GW, respectively, in 2050, more than two times higher than their projected power demands. While other continental countries, such as Germany, are calling for cleaner fuel resources. Exporting surplus offshore green hydrogen and bridging supply and demand could be pivotal in carbon emission mitigation for Europe. Yet, the potentials of these Island countries, are usually underestimated. This paper developed a bottom-up method to investigate the role of offshore hydrogen from Ireland and the UK in the decarbonisation of the entire Europe. We evaluate the future hydrogen/ammonia trading and the contributions of each country in carbon emission mitigation, considering their relative cost-competitiveness in offshore hydrogen production, domestic hourly power and gas system operation, and international shipping costs. Results indicate that the offshore green hydrogen could reduce 175.16 Mt/year of carbon dioxide emissions in Europe. The UK will be the largest hydrogen supplier from 2030 to 2040, while surpassed by Ireland in 2050, with 161 TWh of hydrogen exports to France and Spain. The offshore green hydrogen can contribute to 175.16 Mt of annual carbon dioxide emission reductions in total. This general flow of hydrogen from the West to the East not only facilitates Europe's net-zero progress, but also reshapes the energy supply structure and helps to ensure energy security across the European continent.
Freehand 3D Ultrasound Imaging: Sim-in-the-Loop Probe Pose Optimization via Visual Servoing
Freehand 3D ultrasound (US) imaging using conventional 2D probes offers flexibility and accessibility for diverse clinical applications but faces challenges in accurate probe pose estimation. Traditional methods depend on costly tracking systems, while neural network-based methods struggle with image noise and error accumulation, compromising reconstruction precision. We propose a cost-effective and versatile solution that leverages lightweight cameras and visual servoing in simulated environments for precise 3D US imaging. These cameras capture visual feedback from a textured planar workspace. To counter occlusions and lighting issues, we introduce an image restoration method that reconstructs occluded regions by matching surrounding texture patterns. For pose estimation, we develop a simulation-in-the-loop approach, which replicates the system setup in simulation and iteratively minimizes pose errors between simulated and real-world observations. A visual servoing controller refines the alignment of camera views, improving translational estimation by optimizing image alignment. Validations on a soft vascular phantom, a 3D-printed conical model, and a human arm demonstrate the robustness and accuracy of our approach, with Hausdorff distances to the reference reconstructions of 0.359 mm, 1.171 mm, and 0.858 mm, respectively. These results confirm the method's potential for reliable freehand 3D US reconstruction.
Adaptive Legged Locomotion via Online Learning for Model Predictive Control
We provide an algorithm for adaptive legged locomotion via online learning and model predictive control. The algorithm is composed of two interacting modules: model predictive control (MPC) and online learning of residual dynamics. The residual dynamics can represent modeling errors and external disturbances. We are motivated by the future of autonomy where quadrupeds will autonomously perform complex tasks despite real-world unknown uncertainty, such as unknown payload and uneven terrains. The algorithm uses random Fourier features to approximate the residual dynamics in reproducing kernel Hilbert spaces. Then, it employs MPC based on the current learned model of the residual dynamics. The model is updated online in a self-supervised manner using least squares based on the data collected while controlling the quadruped. The algorithm enjoys sublinear \textit{dynamic regret}, defined as the suboptimality against an optimal clairvoyant controller that knows how the residual dynamics. We validate our algorithm in Gazebo and MuJoCo simulations, where the quadruped aims to track reference trajectories. The Gazebo simulations include constant unknown external forces up to $12\boldsymbol{g}$, where $\boldsymbol{g}$ is the gravity vector, in flat terrain, slope terrain with $20\degree$ inclination, and rough terrain with $0.25m$ height variation. The MuJoCo simulations include time-varying unknown disturbances with payload up to $8~kg$ and time-varying ground friction coefficients in flat terrain.
comment: 9 pages
A Predictive Flexibility Aggregation Method for Low Voltage Distribution System Control
This paper presents a predictive control strategy to manage low-voltage distribution systems. The proposed approach relies on an aggregate of the flexibility at the residential unit level into a three-dimensional chart that represents the injected active and reactive power, and the flexibility cost. First, this method solves a multiparametric optimization problem offline at the residential unit level to aggregate the flexibility of the assets. Then, a semi-explicit model predictive control problem is solved to account for forecasts. By combining the results of these problems with measurements, the method generates the desired flexibility chart. The proposed approach is compatible with realtime control requirements, as heavy computations are performed offline locally, making it naturally parallelizable. By linking realtime flexibility assessment with energy scheduling, our approach enables efficient, low-cost, and privacy-preserving management of low-voltage distribution systems. We validate this method on a low-voltage network of 5 buses by comparing it with an ideal technique.
comment: 8 pages, 6 figures
Observer Design over Hypercomplex Quaternions
We develop observer design over hypercomplex quaternions in a characteristic-polynomial-free framework. Using the standard right-module convention, we derive a right observable companion form and its companion polynomial that encodes error dynamics via right-eigenvalue similarity classes. The design mirrors the real/complex case - coefficient updates in companion coordinates, followed by a similarity back - yet avoids determinants, characteristic/minimal polynomials, and Cayley-Hamilton identities that do not transfer to quaternions. We also give an Ackermann-type construction for the important case of closed-loop companion polynomials with real coefficients, ensuring similarity-equivariant evaluation. The results yield simple recipes for full-order observers directly over quaternions, clarify the role of right spectra and their similarity classes, and pinpoint when classical one-shot formulas remain valid. Numerical examples illustrate the method and advantages over vectorized or complex-adjoint surrogates.
comment: Accepted for presentation at the 24th European Control Conference (ECC 2026), Reykjavik, Iceland. This work was co-funded by the European Union under the project ROBOPROX (reg. no. CZ.02.01.01/00/22 008/0004590)
Active Inverse Methods in Stackelberg Games with Bounded Rationality
Inverse game theory is utilized to infer the cost functions of all players based on game outcomes. However, existing inverse game theory methods do not consider the learner as an active participant in the game, which could significantly enhance the learning process. In this paper, we extend inverse game theory to active inverse methods. For Stackelberg games with bounded rationality, the leader, acting as a learner, actively chooses actions to better understand the follower's cost functions. First, we develop a method of active learning by leveraging Fisher information to maximize information gain about the unknown parameters and prove the consistency and asymptotic normality. Additionally, when leaders consider its cost, we develop a method of active inverse game to balance exploration and exploitation, and prove the consistency and asymptotic Stackelberg equilibrium with quadratic cost functions. Finally, we verify the properties of these methods through simulations in the quadratic case and demonstrate that the active inverse game method can achieve Stackelberg equilibrium more quickly through active exploration.
Hypergame-based Cognition Modeling and Intention Interpretation for Human-Driven Vehicles in Connected Mixed Traffic
With the practical implementation of connected and autonomous vehicles (CAVs), the traffic system is expected to remain a mix of CAVs and human-driven vehicles (HVs) for the foreseeable future. To enhance safety and traffic efficiency, the trajectory planning strategies of CAVs must account for the influence of HVs, necessitating accurate HV trajectory prediction. Current research often assumes that human drivers have perfect knowledge of all vehicles' objectives, an unrealistic premise. This paper bridges the gap by leveraging hypergame theory to account for cognitive and perception limitations in HVs. We model human bounded rationality without assuming them to be merely passive followers and propose a hierarchical cognition modeling framework that captures cognitive relationships among vehicles. We further analyze the cognitive stability of the system, proving that the strategy profile where all vehicles adopt cognitively equilibrium strategies constitutes a hyper Nash equilibrium when CAVs accurately learn HV parameters. To achieve this, we develop an inverse learning algorithm for distributed intention interpretation via vehicle-to-everything (V2X) communication, which extends the framework to both offline and online scenarios. Additionally, we introduce a distributed trajectory prediction and planning approach for CAVs, leveraging the learned parameters in real time. Simulations in highway lane-changing scenarios demonstrate the proposed method's accuracy in parameter learning, robustness to noisy trajectory observations, and safety in HV trajectory prediction. The results validate the effectiveness of our method in both offline and online implementations.
Hypergraph Contrastive Sensor Fusion for Multimodal Fault Diagnosis in Induction Motors
Reliable induction motor (IM) fault diagnosis is vital for industrial safety and operational continuity, mitigating costly unplanned downtime. Conventional approaches often struggle to capture complex multimodal signal relationships, are constrained to unimodal data or single fault types, and exhibit performance degradation under noisy or cross-domain conditions. This paper proposes the Multimodal Hypergraph Contrastive Attention Network (MM-HCAN), a unified framework for robust fault diagnosis. To the best of our knowledge, MM-HCAN is the first to integrate contrastive learning within a hypergraph topology specifically designed for multimodal sensor fusion, enabling the joint modelling of intra- and inter-modal dependencies and enhancing generalisation beyond Euclidean embedding spaces. The model facilitates simultaneous diagnosis of bearing, stator, and rotor faults, addressing the engineering need for consolidated di- agnostic capabilities. Evaluated on three real-world benchmarks, MM-HCAN achieves up to 99.82% accuracy with strong cross-domain generalisation and resilience to noise, demonstrating its suitability for real-world deployment. An ablation study validates the contribution of each component. MM-HCAN provides a scalable and robust solution for comprehensive multi-fault diagnosis, supporting predictive maintenance and extended asset longevity in industrial environments.
comment: Submitted to IEEE Sensors Journal
A Tsetlin Machine Image Classification Accelerator on a Flexible Substrate
This paper introduces the first implementation of digital Tsetlin Machines (TMs) on flexible integrated circuit (FlexIC) using Pragmatic's 600nm IGZO-based FlexIC technology. TMs, known for their energy efficiency, interpretability, and suitability for edge computing, have previously been limited by the rigidity of conventional silicon-based chips. We develop two TM inference models as FlexICs: one achieving 98.5% accuracy using 6800 NAND2 equivalent logic gates with an area of 8X8 mm2, and a second more compact version achieving slightly lower prediction accuracy of 93% but using only 1420 NAND2 equivalent gates with an area of 4X4 mm2, both of which are custom-designed for an 8X8-pixel handwritten digit recognition dataset. The paper demonstrates the feasibility of deploying flexible TM inference engines into wearable healthcare and edge computing applications.
comment: accepted by International Symposium on the Tsetlin Machine (ISTM) 2025
Modelling-driven requirements for Error Field Control Coil application to initial JT-60SA plasmas
JT-60SA is a large superconducting tokamak built in Naka, Japan. After the successful achievement of its first MA-class plasma, the installation of several additional sub-systems, including a set of non-axisymmetric Error Field Correction Coils (EFCC), is ongoing. Optimization of future JT-60SA plasma scenarios will critically depend on the correct use of EFCC, including careful fulfillment of system specifications. In addition to that, preparation and risk mitigation of early ITER operations will greatly benefit from the experience gained by early EFCC application to JT-60SA experiments, in particular to optimize error field detection and control strategies. In this work, EFCC application in JT-60SA Initial Research Phase I perspective scenarios is modeled including plasma response. Impact of (Resonant) Magnetic Perturbations on the different plasma scenarios is assessed for both core and pedestal regions by the linear resistive MHD code MARS-F. The dominant core response to EFs is discussed case by case and compared to mode locking thresholds from literature. Typical current/voltage amplitudes and wave-forms are then compared to EFCC specifications in order to assess a safe operational space.
Balancing Fairness and Performance in Multi-User Spark Workloads with Dynamic Scheduling (extended version) SoCC'25
Apache Spark is a widely adopted framework for large-scale data processing. However, in industrial analytics environments, Spark's built-in schedulers, such as FIFO and fair scheduling, struggle to maintain both user-level fairness and low mean response time, particularly in long-running shared applications. Existing solutions typically focus on job-level fairness which unintentionally favors users who submit more jobs. Although Spark offers a built-in fair scheduler, it lacks adaptability to dynamic user workloads and may degrade overall job performance. We present the User Weighted Fair Queuing (UWFQ) scheduler, designed to minimize job response times while ensuring equitable resource distribution across users and their respective jobs. UWFQ simulates a virtual fair queuing system and schedules jobs based on their estimated finish times under a bounded fairness model. To further address task skew and reduce priority inversions, which are common in Spark workloads, we introduce runtime partitioning, a method that dynamically refines task granularity based on expected runtime. We implement UWFQ within the Spark framework and evaluate its performance using multi-user synthetic workloads and Google cluster traces. We show that UWFQ reduces the average response time of small jobs by up to 74% compared to existing built-in Spark schedulers and to state-of-the-art fair scheduling algorithms.
comment: This paper is an extended version of a paper accepted at the ACM Symposium on Cloud Computing (SoCC'25) that contains a proof of correctness
Recursive Inference for Heterogeneous Multi-Output GP State-Space Models with Arbitrary Moment Matching
Accurate learning of system dynamics is becoming increasingly crucial for advanced control and decision-making in engineering. However, real-world systems often exhibit multiple channels and highly nonlinear transition dynamics, challenging traditional modeling methods. To enable online learning for these systems, this paper formulates the system as Gaussian process state-space models (GPSSMs) and develops a recursive learning method. The main contributions are threefold. First, a heterogeneous multi-output kernel is designed, allowing each output dimension to adopt distinct kernel types, hyperparameters, and input variables, improving expressiveness in multi-dimensional dynamics learning. Second, an inducing-point management algorithm enhances computational efficiency through independent selection and pruning for each output dimension. Third, a unified recursive inference framework for GPSSMs is derived, supporting general moment matching approaches, including the extended Kalman filter (EKF), unscented Kalman filter (UKF), and assumed density filtering (ADF), enabling accurate learning under strong nonlinearity and significant noise. Experiments on synthetic and real-world datasets show that the proposed method matches the accuracy of SOTA offline GPSSMs with only 1/100 of the runtime, and surpasses SOTA online GPSSMs by around 70% in accuracy under heavy noise while using only 1/20 of the runtime.
TranSimHub:A Unified Air-Ground Simulation Platform for Multi-Modal Perception and Decision-Making
Air-ground collaborative intelligence is becoming a key approach for next-generation urban intelligent transportation management, where aerial and ground systems work together on perception, communication, and decision-making. However, the lack of a unified multi-modal simulation environment has limited progress in studying cross-domain perception, coordination under communication constraints, and joint decision optimization. To address this gap, we present TranSimHub, a unified simulation platform for air-ground collaborative intelligence. TranSimHub offers synchronized multi-view rendering across RGB, depth, and semantic segmentation modalities, ensuring consistent perception between aerial and ground viewpoints. It also supports information exchange between the two domains and includes a causal scene editor that enables controllable scenario creation and counterfactual analysis under diverse conditions such as different weather, emergency events, and dynamic obstacles. We release TranSimHub as an open-source platform that supports end-to-end research on perception, fusion, and control across realistic air and ground traffic scenes. Our code is available at https://github.com/Traffic-Alpha/TranSimHub.
comment: 9 pages, 4 figures
Singularity-free dynamical invariants-based quantum control
State preparation is a cornerstone of quantum technologies, underpinning applications in computation, communication, and sensing. Its importance becomes even more pronounced in non-Markovian open quantum systems, where environmental memory and model uncertainties pose significant challenges to achieving high-fidelity control. Invariant-based inverse engineering provides a principled framework for synthesizing analytic control fields, yet existing parameterizations often lead to experimentally infeasible, singular pulses and are limited to simplified noise models such as those of Lindblad form. Here, we introduce a generalized invariant-based protocol for single-qubit state preparation under arbitrary noise conditions. The control proceeds in two-stages: first, we construct a family of bounded pulses that achieve perfect state preparation in a closed system; second, we identify the optimal member of this family that minimizes the effect of noise. The framework accommodates both (i) characterized noise, enabling noise-aware control synthesis, and (ii) uncharacterized noise, where a noise-agnostic variant preserves robustness without requiring a master-equation description. Numerical simulations demonstrate high-fidelity state preparation across diverse targets while producing smooth, hardware-feasible control fields. This singularity-free framework extends invariant-based control to realistic open-system regimes, providing a versatile route toward robust quantum state engineering on NISQ hardware and other platforms exhibiting non-Markovian dynamics.
Adaptive Cost-Map-based Path Planning in Partially Unknown Environments with Movable Obstacles
Reliable navigation in disaster-response and other unstructured indoor settings requires robots not only to avoid obstacles but also to recognise when those obstacles can be pushed aside. We present an adaptive, LiDAR and odometry-based path-planning framework that embeds this capability into the ROS2 Nav2 stack. A new Movable Obstacles Layer labels all LiDAR returns missing from a prior static map as tentatively movable and assigns a reduced traversal cost. A companion Slow-Pose Progress Checker monitors the ratio of commanded to actual velocity; when the robot slows appreciably, the local cost is raised from light to heavy, and on a stall to lethal, prompting the global planner to back out and re-route. Gazebo evaluations on a Scout Mini, spanning isolated objects and cluttered corridors, show higher goal-reach rates and fewer deadlocks than a no-layer baseline, with traversal times broadly comparable. Because the method relies only on planar scans and CPU-level computation, it suits resource-constrained search and rescue robots and integrates into heterogeneous platforms with minimal engineering. Overall, the results indicate that interaction-aware cost maps are a lightweight, ROS2-native extension for navigating among potentially movable obstacles in unstructured settings. The full implementation will be released as open source athttps://costmap-namo.github.io.
Modeling and Dynamic Simulation of a Hybrid Wind-Wave System on a Hexagonal Semi-Submersible Platform
Offshore renewable energy systems offer promising solutions for sustainable power generation, yet most existing platforms harvest either wind or wave energy in isolation. This study presents a hybrid floating offshore platform that integrates a wind turbine with three oscillating surge wave energy converters (WECs) into a hexagonal semi-submersible structure. In this configuration, the flaps are integrated with the platform geometry to provide both energy extraction and hydrodynamic stability. A modeling and simulation framework was developed using WEC-Sim and benchmarked against the NREL 5 MW semisubmersible reference. Metacentric height analysis confirmed hydrostatic stability across a range of prescribed flap angles. Sensitivity analysis of twelve geometric variables identified flap dimensions and tower length as dominant drivers of stability, energy capture, and tower stress. Time-domain simulations revealed dependence on wave incidence angle, with variations in flap power sharing, capture width ratio (CWR), and platform response. The feasibility of using flap sweeps to modulate pitch motion was also demonstrated. Annual energy production (AEP) estimates based on site-specific data indicate 16.86 GWh from wind and 3.65 GWh from wave energy, with WECs contributing about 18% of the total. These results highlight the potential of integrated wind-wave platforms and point toward future studies on structural modeling and advanced control.
comment: 28 pages, 17 figures
An Iterative Problem-Driven Scenario Reduction Framework for Stochastic Optimization with Conditional Value-at-Risk
Scenario reduction (SR) alleviates the computational complexity of scenario-based stochastic optimization with conditional value-at-risk (SBSO-CVaR) by identifying representative scenarios to depict the underlying uncertainty and tail risks. Existing distribution-driven SR methods emphasize statistical similarity but often exclude extreme scenarios, leading to weak tail-risk awareness and insufficient problem-specific representativeness. Instead, this paper proposes an iterative problem-driven scenario reduction framework. Specifically, we integrate the SBSO-CVaR problem structure into SR process and project the original scenario set from the distribution space onto the problem space. Subsequently, to minimize the SR optimality gap with acceptable computation complexity, we propose a tractable iterative problem-driven scenario reduction (IPDSR) method that selects representative scenarios that best approximate the optimality distribution of the original scenario set while preserving tail risks. Furthermore, the iteration process is rendered as a mixed-integer program to enable scenario partitioning and representative scenarios selection. And ex-post problem-driven evaluation indices are proposed to evaluate the SR performance. Numerical experiments show IPDSR significantly outperforms existing SR methods by achieving an optimality gap of less than 1% within an acceptable computation time.
Comprehensive Dynamic Modeling and Constraint-Aware Air Supply Control for Localized Water Management in Automotive Polymer Electrolyte Membrane Fuel Cells
In this paper, a predictive constraint-aware control scheme is formulated within the Command Governor (CG) framework for localized hydration management of a proton exchange membrane (PEM) fuel cell system. First, a comprehensive nonlinear dynamic model of the fuel cell system is presented which includes a pseudo 2-dimensional (P2D) model of the stack, reactant supply and cooling subsystems. The model captures the couplings among the various subsystems and serves as the basis for designing output feedback controllers to track the optimal set-points of the air supply and cooling systems for power optimization. The closed-loop nonlinear model is then used to analyze the dynamic behavior of membrane hydration near the anode inlet, the driest region of the membrane in a counter-flow configuration, under various operating conditions. A reduced-order linearized model is then derived to approximate hydration behavior with sufficient fidelity for constraint enforcement. This model is used within the CG framework to adjust the air supply set-points when necessary to prevent membrane dry-out. The effectiveness of the proposed approach in maintaining local membrane hydration while closely tracking the requested net power is demonstrated through realistic drive-cycle simulations.
comment: This is a manuscript submitted to Applied Energy
Techno-Economic Feasibility Analysis of Quantum Key Distribution for Power-System Communications
The accelerating digitalization and decentralization of modern power systems expose critical communication infrastructures to escalating cyber risks, particularly under emerging quantum computing threats. This paper presents an integrated techno-economic framework to evaluate the feasibility of Quantum Key Distribution (QKD) for secure power-system communications. A stochastic system model is developed to jointly capture time-varying key demand, QKD supply under optical-loss constraints, station-side buffering, and post-quantum cryptography (PQC) fallback mechanisms. Analytical conditions are derived for service-level assurance, including buffer stability, outage probability, and availability bounds. Building on this, two quantitative metrics, including the Levelized Cost of Security (LCoSec) and Cost of Incremental Security (CIS), are formulated to unify capital, operational, and risk-related expenditures within a discounted net-present-value framework. Using IEEE 118-bus, 123-node, and 39-bus test systems, we conduct discrete-event simulations comparing PQC-only, QKD-only, and Hybrid architectures across multiple topologies and service profiles. Results show that Hybrid architectures dominated by QKD significantly reduce key-outage probability and SLA shortfalls, achieving near-unit availability for real-time and confidentiality-critical services. Economic analyses reveal clear breakeven zones where QKD-enhanced deployments become cost-effective, primarily in metropolitan and distribution-level networks under moderate optical loss and buffer sizing. The proposed framework provides a reproducible, risk-aware decision tool for guiding large-scale, economically justified QKD adoption in future resilient power-system infrastructures.
Quantum-Key-Distribution Authenticated Aggregation and Settlement for Virtual Power Plants
The proliferation of distributed energy resources (DERs) and demand-side flexibility has made virtual power plants (VPPs) central to modern grid operation. Yet their end-to-end business pipeline, covering bidding, dispatch, metering, settlement, and archival, forms a tightly coupled cyber-physical-economic system where secure and timely communication is critical. Under the combined stress of sophisticated cyberattacks and extreme weather shocks, conventional cryptography offers limited long-term protection. Quantum key distribution (QKD), with information-theoretic guarantees, is viewed as a gold standard for securing critical infrastructures. However, limited key generation rates, routing capacity, and system overhead render key allocation a pressing challenge: scarce quantum keys must be scheduled across heterogeneous processes to minimize residual risk while maintaining latency guarantees. This paper introduces a quantum-authenticated aggregation and settlement framework for VPPs. We first develop a system-threat model that connects QKD key generation and routing with business-layer security strategies, authentication strength, refresh frequency, and delay constraints. Building on this, we formulate a key-budgeted risk minimization problem that jointly accounts for economic risk, service-level violations, and key-budget feasibility, and reveal a threshold property linking marginal security value to shadow prices. Case studies on a representative VPP system demonstrate that the proposed approach significantly reduces residual risk and SLA violations, enhances key efficiency and robustness, and aligns observed dynamics with the theoretical shadow price mechanism.
Spatial-to-Spectral Harmonic-Modulated Arrays for 6G Multi-Beam MIMO
This article presents an overview and analysis of spatial-to-spectral harmonic-modulated arrays (SHAs). Compared to traditional analog or digital beamforming arrays, SHAs enable concurrent multi-beamforming without requiring substantial hardware replication. SHAs replace the need for hardware replication with frequency-domain multiplexing. Furthermore, SHAs have the potential to become key contributors to future 6G networks by enabling scalable multi-user communications, joint communication and sensing, and spatial interference mitigation. In addition, an analysis of the SHA's harmonic-modulation waveform and its effects on gain, noise and bandwidth is presented. A comb-like modulation waveform for SHAs that minimizes spectral inefficiency is proposed. Further, an analysis of the SHA's capability to independently steer multiple beams is presented. This capability is quantified in terms of the SHA's spatial-to-spectral degrees of freedom. Lastly, this work introduces a novel SHA architecture that provides three spatial-to-spectral degrees of freedom with minimal hardware replication.
A Motivational Driver Steering Model: Task Difficulty Homeostasis From Control Theory Perspective
A general and psychologically plausible collision avoidance driver model can improve transportation safety significantly. Most computational driver models found in the literature have used control theory methods only, and they are not established based on psychological theories. In this paper, a unified approach is presented based on concepts taken from psychology and control theory. The "task difficulty homeostasis theory", a prominent motivational theory, is combined with the "Lyapunov stability method" in control theory to present a general and psychologically plausible model. This approach is used to model driver steering behavior for collision avoidance. The performance of this model is measured by simulation of two collision avoidance scenarios at a wide range of speeds from 20 km/h to 170 km/h. The model is validated by experiments on a driving simulator. The results demonstrate that the model follows human behavior accurately with a mean error of 7 percent.
comment: Cognitive systems Research
Personalized Collaborative Learning with Affinity-Based Variance Reduction
Multi-agent learning faces a fundamental tension: leveraging distributed collaboration without sacrificing the personalization needed for diverse agents. This tension intensifies when aiming for full personalization while adapting to unknown heterogeneity levels -- gaining collaborative speedup when agents are similar, without performance degradation when they are different. Embracing the challenge, we propose personalized collaborative learning (PCL), a novel framework for heterogeneous agents to collaboratively learn personalized solutions with seamless adaptivity. Through carefully designed bias correction and importance correction mechanisms, our method AffPCL robustly handles both environment and objective heterogeneity. We prove that AffPCL reduces sample complexity over independent learning by a factor of $\max\{n^{-1}, \delta\}$, where $n$ is the number of agents and $\delta\in[0,1]$ measures their heterogeneity. This affinity-based acceleration automatically interpolates between the linear speedup of federated learning in homogeneous settings and the baseline of independent learning, without requiring prior knowledge of the system. Our analysis further reveals that an agent may obtain linear speedup even by collaborating with arbitrarily dissimilar agents, unveiling new insights into personalization and collaboration in the high heterogeneity regime.
DeGrip: A Compact Cable-driven Robotic Gripper for Desktop Disassembly
Intelligent robotic disassembly of end-of-life (EOL) products has been a long-standing challenge in robotics. While machine learning techniques have shown promise, the lack of specialized hardware limits their application in real-world scenarios. We introduce DeGrip, a customized gripper designed for the disassembly of EOL computer desktops. DeGrip provides three degrees of freedom (DOF), enabling arbitrary configurations within the disassembly environment when mounted on a robotic manipulator. It employs a cable-driven transmission mechanism that reduces its overall size and enables operation in confined spaces. The wrist is designed to decouple the actuation of wrist and jaw joints. We also developed an EOL desktop disassembly environment in Isaac Sim to evaluate the effectiveness of DeGrip. The tasks were designed to demonstrate its ability to operate in confined spaces and disassemble components in arbitrary configurations. The evaluation results confirm the capability of DeGrip for EOL desktop disassembly.
Heterogeneous Multi-Agent Task-Assignment with Uncertain Execution Times and Preferences
While sequential task assignment for a single agent has been widely studied, such problems in a multi-agent setting, where the agents have heterogeneous task preferences or capabilities, remain less well-characterized. We study a multi-agent task assignment problem where a central planner assigns recurring tasks to multiple members of a team over a finite time horizon. For any given task, the members have heterogeneous capabilities in terms of task completion times, task resource consumption (which can model variables such as energy or attention), and preferences in terms of the rewards they collect upon task completion. We assume that the reward, execution time, and resource consumption for each member to complete any task are stochastic with unknown distributions. The goal of the planner is to maximize the total expected reward that the team receives over the problem horizon while ensuring that the resource consumption required for any assigned task is within the capability of the agent. We propose and analyze a bandit algorithm for this problem. Since the bandit algorithm relies on solving an optimal task assignment problem repeatedly, we analyze the achievable regret in two cases: when we can solve the optimal task assignment exactly and when we can solve it only approximately.
comment: 14 pages
Explore-then-Commit for Nonstationary Linear Bandits with Latent Dynamics
We study a nonstationary bandit problem where rewards depend on both actions and latent states, the latter governed by unknown linear dynamics. Crucially, the state dynamics also depend on the actions, resulting in tension between short-term and long-term rewards. We propose an explore-then-commit algorithm for a finite horizon $T$. During the exploration phase, random Rademacher actions enable estimation of the Markov parameters of the linear dynamics, which characterize the action-reward relationship. In the commit phase, the algorithm uses the estimated parameters to design an optimized action sequence for long-term reward. Our proposed algorithm achieves $\tilde{\mathcal{O}}(T^{2/3})$ regret. Our analysis handles two key challenges: learning from temporally correlated rewards, and designing action sequences with optimal long-term reward. We address the first challenge by providing near-optimal sample complexity and error bounds for system identification using bilinear rewards. We address the second challenge by proving an equivalence with indefinite quadratic optimization over a hypercube, a known NP-hard problem. We provide a sub-optimality guarantee for this problem, enabling our regret upper bound. Lastly, we propose a semidefinite relaxation with Goemans-Williamson rounding as a practical approach.
Residual Correction Models for AC Optimal Power Flow Using DC Optimal Power Flow Solutions
Solving the nonlinear AC optimal power flow (AC OPF) problem remains a major computational bottleneck for real-time grid operations. In this paper, we propose a residual learning paradigm that uses fast DC optimal power flow (DC OPF) solutions as a baseline, and learns only the nonlinear corrections required to provide the full AC-OPF solution. The method utilizes a topology-aware Graph Neural Network with local attention and two-level DC feature integration, trained using a physics-informed loss that enforces AC power-flow feasibility and operational limits. Evaluations on OPFData for 57-, 118-, and 2000-bus systems show around 25% lower MSE, up to 3X reduction in feasibility error, and up to 13X runtime speedup compared to conventional AC OPF solvers. The model maintains accuracy under N-1 contingencies and scales efficiently to large networks. These results demonstrate that residual learning is a practical and scalable bridge between linear approximations and AC-feasible OPF, enabling near real-time operational decision making.
Learning a Generalized Model for Substation Level Voltage Estimation in Distribution Networks
Accurate voltage estimation in distribution networks is critical for real-time monitoring and increasing the reliability of the grid. As DER penetration and distribution level voltage variability increase, robust distribution system state estimation (DSSE) has become more essential to maintain safe and efficient operations. Traditional DSSE techniques, however, struggle with sparse measurements and the scale of modern feeders, limiting their scalability to large networks. This paper presents a hierarchical graph neural network for substation-level voltage estimation that exploits both electrical topology and physical features, while remaining robust to the low observability levels common to real-world distribution networks. Leveraging the public SMART-DS datasets, the model is trained and evaluated on thousands of buses across multiple substations and DER penetration scenarios. Comprehensive experiments demonstrate that the proposed method achieves up to 2 times lower RMSE than alternative data-driven models, and maintains high accuracy with as little as 1\% measurement coverage. The results highlight the potential of GNNs to enable scalable, reproducible, and data-driven voltage monitoring for distribution systems.
DRL-Based Resource Allocation for Energy-Efficient IRS-Assisted UAV Spectrum Sharing Systems
Intelligent reflecting surface (IRS) assisted unmanned aerial vehicle (UAV) systems provide a new paradigm for reconfigurable and flexible wireless communications. To enable more energy efficient and spectrum efficient IRS assisted UAV wireless communications, this paper introduces a novel IRS-assisted UAV enabled spectrum sharing system with orthogonal frequency division multiplexing (OFDM). The goal is to maximize the energy efficiency (EE) of the secondary network by jointly optimizing the beamforming, subcarrier allocation, IRS phase shifts, and the UAV trajectory subject to practical transmit power and passive reflection constraints as well as UAV physical limitations. A physically grounded propulsion-energy model is adopted, with its tight upper bound used to form a tractable EE lower bound for the spectrum sharing system. To handle highly non convex, time coupled optimization problems with a mixed continuous and discrete policy space, we develop a deep reinforcement learning (DRL) approach based on the actor critic framework. Extended experiments show the significant EE improvement of the proposed DRL-based approach compared to several benchmark schemes, thus demonstrating the effectiveness and robustness of the proposed approach with mobility.
comment: 7 pages, 3 figures, 1 algorithm. LaTeX class: IEEEtran
Through-the-Earth Magnetic Induction Communication and Networking: A Comprehensive Survey
Magnetic induction (MI) communication (MIC) has emerged as a promising candidate for underground communication networks due to its excellent penetration capabilities. Integration with Space-Air-Ground-Underground (SAGUI) networks in next-generation mobile communication systems requires a well-defined network architecture. A recent discovery in MIC research, MI fast fading, remains in its early stages and presents unique challenges. This paper provides a comprehensive survey on through-the-earth (TTE) MIC, covering MI applications, channel modeling, point-to-point MIC design, relay techniques, network frameworks, and emerging technologies. We compare various MIC applications to highlight TTE-specific challenges and review the principles of channel modeling, addressing both MI slow fading and MI fast fading, along with its potential impact on existing MIC theories. We conduct a fine-grained decomposition of MI channel power gain into four distinct physical parameters, and propose a novel geometric model to analyze MI fast fading. We also summarize MI relay techniques, examine crosstalk effects in relay and high-density networks, and explore key research tasks within the OSI framework for a holistic MI network protocol in SAGUI. To bridge the gaps identified, we propose a MIC framework that supports TCP/IP and Linux, enabling full implementation of existing and emerging MIC solutions. This framework empowers researchers to leverage Linux resources and deep learning platforms for accelerated development of MIC in SAGUI networks. Remaining research challenges, open issues, and promising novel techniques are further identified to advance MIC research.
comment: This work has been accepted by the IEEE Communications Surveys & Tutorials (COMST) for publication. The final published version will be available on IEEE Xplore
Kernel-based Koopman approximants for control: Flexible sampling, error analysis, and stability
Data-driven techniques for analysis, modeling, and control of complex dynamical systems are on the uptake. Koopman theory provides the theoretical foundation for the popular kernel extended dynamic mode decomposition (kEDMD). In this work, we propose a novel kEDMD scheme to approximate nonlinear control systems accompanied by an in-depth error analysis. Key features are regularization-based robustness and an adroit decomposition into micro and macro grids enabling flexible sampling. But foremost, we prove proportionality, i.e., explicit dependence on the distance to the (controlled) equilibrium, of the derived bound on the full approximation error. Leveraging this key property, we rigorously show that asymptotic stability of the data-driven surrogate (control) system implies asymptotic stability of the original (control) system and vice versa.
comment: 29 pages, 5 figures
Contact-Aware Safety in Soft Robots Using High-Order Control Barrier and Lyapunov Functions
Robots operating alongside people, particularly in sensitive scenarios such as aiding the elderly with daily tasks or collaborating with workers in manufacturing, must guarantee safety and cultivate user trust. Continuum soft manipulators promise safety through material compliance, but as designs evolve for greater precision, payload capacity, and speed, and increasingly incorporate rigid elements, their injury risk resurfaces. In this letter, we introduce a comprehensive High-Order Control Barrier Function (HOCBF) + High-Order Control Lyapunov Function (HOCLF) framework that enforces strict contact force limits across the entire soft-robot body during environmental interactions. Our approach combines a differentiable Piecewise Cosserat-Segment (PCS) dynamics model with a convex-polygon distance approximation metric, named Differentiable Conservative Separating Axis Theorem (DCSAT), based on the soft robot geometry to enable real-time, whole-body collision detection, resolution, and enforcement of the safety constraints. By embedding HOCBFs into our optimization routine, we guarantee safety, allowing, for instance, safe navigation in operational space under HOCLF-driven motion objectives. Extensive planar simulations demonstrate that our method maintains safety-bounded contacts while achieving precise shape and task-space regulation. This work thus lays a foundation for the deployment of soft robots in human-centric environments with provable safety and performance.
comment: 8 pages
Pseudo-Kinematic Trajectory Control and Planning of Tracked Vehicles
Tracked vehicles distribute their weight continuously over a large surface area (the tracks). This distinctive feature makes them the preferred choice for vehicles required to traverse soft and uneven terrain. From a robotics perspective, however, this flexibility comes at a cost: the complexity of modelling the system and the resulting difficulty in designing theoretically sound navigation solutions. In this paper, we aim to bridge this gap by proposing a framework for the navigation of tracked vehicles, built upon three key pillars. The first pillar comprises two models: a simulation model and a control-oriented model. The simulation model captures the intricate terramechanics dynamics arising from soil-track interaction and is employed to develop faithful digital twins of the system across a wide range of operating conditions. The control-oriented model is pseudo-kinematic and mathematically tractable, enabling the design of efficient and theoretically robust control schemes. The second pillar is a Lyapunov-based feedback trajectory controller that provides certifiable tracking guarantees. The third pillar is a portfolio of motion planning solutions, each offering different complexity-accuracy trade-offs. The various components of the proposed approach are validated through an extensive set of simulation and experimental data.
RadioDiff-$k^2$: Helmholtz Equation Informed Generative Diffusion Model for Multi-Path Aware Radio Map Construction
In this paper, we propose a novel physics-informed generative learning approach, named RadioDiff-$k^2$, for accurate and efficient multipath-aware radio map (RM) construction. As future wireless communication evolves towards environment-aware paradigms, the accurate construction of RMs becomes crucial yet highly challenging. Conventional electromagnetic (EM)-based methods, such as full-wave solvers and ray-tracing approaches, exhibit substantial computational overhead and limited adaptability to dynamic scenarios. Although existing neural network (NN) approaches have efficient inferencing speed, they lack sufficient consideration of the underlying physics of EM wave propagation, limiting their effectiveness in accurately modeling critical EM singularities induced by complex multipath environments. To address these fundamental limitations, we propose a novel physics-inspired RM construction method guided explicitly by the Helmholtz equation, which inherently governs EM wave propagation. Specifically, based on the analysis of partial differential equations (PDEs), we theoretically establish a direct correspondence between EM singularities, which correspond to the critical spatial features influencing wireless propagation, and regions defined by negative wave numbers in the Helmholtz equation. We then design an innovative dual diffusion model (DM)-based large artificial intelligence framework comprising one DM dedicated to accurately inferring EM singularities and another DM responsible for reconstructing the complete RM using these singularities along with environmental contextual information. Experimental results demonstrate that the proposed RadioDiff-$k^2$ framework achieves state-of-the-art (SOTA) performance in both image-level RM construction and localization tasks, while maintaining inference latency within a few hundred milliseconds.
A kernel-based approach to physics-informed nonlinear system identification
This paper presents a kernel-based framework for physics-informed nonlinear system identification. The key contribution is a structured methodology that extends kernel-based techniques to seamlessly embed partially known physics-based models, improving parameter estimation and overall model accuracy. The proposed method enhances traditional modeling approaches by embedding a parametric model, which provides physical interpretability, with a kernel-based function, which accounts for unmodeled dynamics. The two models' components are identified from the data simultaneously, thereby minimizing a suitable cost that balances the relative importance of the physical and the black-box parts of the model. Additionally, nonlinear state smoothing is employed to address scenarios involving state-space models with not fully measurable states. Numerical simulations on an experimental benchmark system demonstrate the effectiveness of the proposed approach, achieving up to 51% reduction in simulation root mean square error compared to physics-only models and 31% performance improvement over state-of-the-art identification techniques.
comment: [Extended version] This work has been submitted to the IEEE for possible publication
Stochastic Model Predictive Control for Sub-Gaussian Noise
We propose a stochastic Model Predictive Control (MPC) framework that ensures closed-loop chance constraint satisfaction for linear systems with general sub-Gaussian process and measurement noise. By considering sub-Gaussian noise, we can provide guarantees for a large class of distributions, including time-varying distributions. Specifically, we first provide a new characterization of sub-Gaussian random vectors using matrix variance proxy, which can more accurately represent the predicted state distribution. We then derive tail bounds under linear propagation for the new characterization, enabling tractable computation of probabilistic reachable sets of linear systems. Lastly, we utilize these probabilistic reachable sets to formulate a stochastic MPC scheme that provides closed-loop guarantees for general sub-Gaussian noise. We further demonstrate our approach in simulations, including a challenging task of surgical planning from image observations.
comment: 15 pages, 6 figures, submitted to Automatica
Multi-stage model predictive control for slug flow crystallizers using uncertainty-aware surrogate models
This paper presents a novel dynamic model for slug flow crystallizers that addresses the challenges of spatial distribution without backmixing or diffusion, potentially enabling advanced model-based control. The developed model can accurately describe the main characteristics of slug flow crystallizers, including slug-to-slug variability but leads to a high computational complexity due to the consideration of partial differential equations and population balance equations. For that reason, the model cannot be directly used for process optimization and control. To solve this challenge, we propose two different approaches, conformalized quantile regression and Bayesian last layer neural networks, to develop surrogate models with uncertainty quantification capabilities. These surrogates output a prediction of the system states together with an uncertainty of these predictions to account for process variability and model uncertainty. We use the uncertainty of the predictions to formulate a robust model predictive control approach, enabling robust real-time advanced control of a slug flow crystallizer.
Decentralized Real-Time Iterations for Distributed NMPC
This article presents a Real-Time Iteration (RTI) scheme for distributed Nonlinear Model Predictive Control (NMPC). The scheme transfers the well-known RTI approach, a key enabler for many industrial real-time NMPC implementations, to the setting of cooperative distributed control. At each sampling instant, one outer iteration of a bi-level decentralized Sequential Quadratic Programming (dSQP) method is applied to a centralized optimal control problem. This ensures that real-time requirements are met and it facilitates cooperation between subsystems. Combining novel dSQP convergence results with RTI stability guarantees, we prove local exponential stability under standard assumptions on the MPC design with and without terminal constraints. The proposed scheme only requires neighbor-to-neighbor communication and avoids a central coordinator. A numerical example with coupled inverted pendulums demonstrates the efficacy of the approach.
A Set-Theoretic Robust Control Approach for Linear Quadratic Games with Unknown Counterparts
Ensuring robust decision-making in multi-agent systems is challenging when agents have distinct, possibly conflicting objectives and lack full knowledge of each other's strategies. This is apparent in safety-critical applications such as human-robot interaction and assisted driving, where uncertainty arises not only from unknown adversary strategies but also from external disturbances. To address this, the paper proposes a robust adaptive control approach based on linear quadratic differential games. Our method allows a controlled agent to iteratively refine its belief about the adversary strategy and disturbances using a set-membership approach, while simultaneously adapting its policy to guarantee robustness against the uncertain adversary policy and improve performance over time. We formally derive theoretical guarantees on the robustness of the proposed control scheme and its convergence to $\epsilon$-Nash strategies. The effectiveness of our approach is demonstrated in a numerical simulation.
comment: Accepted for publication in the Proceedings of the 64th IEEE Conference on Decision and Control
Wearable and Ultra-Low-Power Fusion of EMG and A-Mode US for Hand-Wrist Kinematic Tracking
Hand gesture recognition based on biosignals has shown strong potential for developing intuitive human-machine interaction strategies that closely mimic natural human behavior. In particular, sensor fusion approaches have gained attention for combining complementary information and overcoming the limitations of individual sensing modalities, thereby enabling more robust and reliable systems. Among them, the fusion of surface electromyography (EMG) and A-mode ultrasound (US) is very promising. However, prior solutions rely on power-hungry platforms unsuitable for multi-day use and are limited to discrete gesture classification. In this work, we present an ultra-low-power (sub-50 mW) system for concurrent acquisition of 8-channel EMG and 4-channel A-mode US signals, integrating two state-of-the-art platforms into fully wearable, dry-contact armbands. We propose a framework for continuous tracking of 23 degrees of freedom (DoFs), 20 for the hand and 3 for the wrist, using a kinematic glove for ground-truth labeling. Our method employs lightweight encoder-decoder architectures with multi-task learning to simultaneously estimate hand and wrist joint angles. Experimental results under realistic sensor repositioning conditions demonstrate that EMG-US fusion achieves a root mean squared error of $10.6^\circ\pm2.0^\circ$, compared to $12.0^\circ\pm1^\circ$ for EMG and $13.1^\circ\pm2.6^\circ$ for US, and a R$^2$ score of $0.61\pm0.1$, with $0.54\pm0.03$ for EMG and $0.38\pm0.20$ for US.
comment: 5 pages, 3 figures
VLMLight: Safety-Critical Traffic Signal Control via Vision-Language Meta-Control and Dual-Branch Reasoning Architecture
Traffic signal control (TSC) is a core challenge in urban mobility, where real-time decisions must balance efficiency and safety. Existing methods - ranging from rule-based heuristics to reinforcement learning (RL) - often struggle to generalize to complex, dynamic, and safety-critical scenarios. We introduce VLMLight, a novel TSC framework that integrates vision-language meta-control with dual-branch reasoning. At the core of VLMLight is the first image-based traffic simulator that enables multi-view visual perception at intersections, allowing policies to reason over rich cues such as vehicle type, motion, and spatial density. A large language model (LLM) serves as a safety-prioritized meta-controller, selecting between a fast RL policy for routine traffic and a structured reasoning branch for critical cases. In the latter, multiple LLM agents collaborate to assess traffic phases, prioritize emergency vehicles, and verify rule compliance. Experiments show that VLMLight reduces waiting times for emergency vehicles by up to 65% over RL-only systems, while preserving real-time performance in standard conditions with less than 1% degradation. VLMLight offers a scalable, interpretable, and safety-aware solution for next-generation traffic signal control.
comment: 25 pages, 15 figures
Learning to Capture Rocks using an Excavator: A Reinforcement Learning Approach with Guiding Reward Formulation
Rock capturing with standard excavator buckets is a challenging task typically requiring the expertise of skilled operators. Unlike soil digging, it involves manipulating large, irregular rocks in unstructured environments where complex contact interactions with granular material make model-based control impractical. Existing autonomous excavation methods focus mainly on continuous media or rely on specialized grippers, limiting their applicability to real-world construction sites. This paper introduces a fully data-driven control framework for rock capturing that eliminates the need for explicit modeling of rock or soil properties. A model-free reinforcement learning agent is trained in the AGX Dynamics simulator using the Proximal Policy Optimization (PPO) algorithm and a guiding reward formulation. The learned policy outputs joint velocity commands directly to the boom, arm, and bucket of a CAT365 excavator model. Robustness is enhanced through extensive domain randomization of rock geometry, density, and mass, as well as the initial configurations of the bucket, rock, and goal position. To the best of our knowledge, this is the first study to develop and evaluate an RL-based controller for the rock capturing task. Experimental results show that the policy generalizes well to unseen rocks and varying soil conditions, achieving high success rates comparable to those of human participants while maintaining machine stability. These findings demonstrate the feasibility of learning-based excavation strategies for discrete object manipulation without requiring specialized hardware or detailed material models.
Grid-Aware Real-Time Dispatch of Microgrid with Generalized Energy Storage: A Prediction-Free Online Optimization Approach
This paper proposes a novel prediction-free two-stage coordinated dispatch framework for the real-time dispatch of grid-connected microgrid with generalized energy storages (GES). The proposed framework explicitly addresses grid awareness, non-anticipativity constraints, and the time-coupling characteristics of GES, providing microgrid operators with a near-optimal, reliable, and adaptable dispatch tool. In the offline stage, we generate the hindsight state-of-charge (SoC) trajectories of GES by solving the multi-period economic dispatch with historical scenarios. Subsequently, leveraging this historical information (SoC trajectories, net loads, and electricity prices), we synthesize and dynamically update online references for both SoC and opportunity cost through kernel regression. We propose an adaptive Lagrange multiplier-based online convex optimization algorithm, which innovatively incorporates reference tracking for global vision and expert-tracking for step-size updates. We provide theoretical proof to show that the proposed OCO algorithm achieves a sublinear bound of both dynamic regret and time-varying hard constraint violation. Numerical studies using ground-truth data from the Australian Energy Market Operator demonstrate that the proposed method outperforms state-of-the-art methods, reducing operational costs by 5.0-6.2% and voltage violations by 0.8-9.1%. These improvements mainly result from mitigating myopia by reference tracking and the adaptive capability provided by dynamically updated references and adaptive Lagrange multipliers. Sensitivity analysis demonstrates the robustness, computational efficiency, and scalability of the proposed method.
A Multimodal Lightweight Approach to Fault Diagnosis of Induction Motors in High-Dimensional Dataset
An accurate AI-based diagnostic system for induction motors (IMs) holds the potential to enhance proactive maintenance, mitigating unplanned downtime and curbing overall maintenance costs within an industrial environment. Notably, among the prevalent faults in IMs, a Broken Rotor Bar (BRB) fault is frequently encountered. Researchers have proposed various fault diagnosis approaches using signal processing (SP), machine learning (ML), deep learning (DL), and hybrid architectures for BRB faults. One limitation in the existing literature is the training of these architectures on relatively small datasets, risking overfitting when implementing such systems in industrial environments. This paper addresses this limitation by implementing large-scale data of BRB faults by using a transfer-learning-based lightweight DL model named ShuffleNetV2 for diagnosing one, two, three, and four BRB faults using current and vibration signal data. Spectral images for training and testing are generated using a Short-Time Fourier Transform (STFT). The dataset comprises 57,500 images, with 47,500 used for training and 10,000 for testing. Remarkably, the ShuffleNetV2 model exhibited superior performance, in less computational cost as well as accurately classifying 98.856% of spectral images. To further enhance the visualization of harmonic sidebands resulting from broken bars, Fast Fourier Transform (FFT) is applied to current and vibration data. The paper also provides insights into the training and testing times for each model, contributing to a comprehensive understanding of the proposed fault diagnosis methodology. The findings of our research provide valuable insights into the performance and efficiency of different ML and DL models, offering a foundation for the development of robust fault diagnosis systems for induction motors in industrial settings.
Feedback Stackelberg-Nash equilibria in difference games with quasi-hierarchical interactions and inequality constraints
In this paper, we study a class of two-player deterministic finite-horizon difference games with coupled inequality constraints, where each player has two types of decision variables: one involving sequential interactions and the other simultaneous interactions. We refer to this class of games as quasi-hierarchical dynamic games and define a solution concept called the feedback Stackelberg-Nash (FSN) equilibrium. Under separability assumption on cost functions, we provide a recursive formulation of the FSN solution using dynamic programming. We show that the FSN solution can be derived from the parametric feedback Stackelberg solution of an associated unconstrained game involving only sequential interactions, with a specific choice of the parameters that satisfy certain implicit complementarity conditions. For the linear-quadratic case, we show that an FSN solution is obtained by reformulating these complementarity conditions as a single large-scale linear complementarity problem. Finally, we illustrate our results using a dynamic duopoly game with production constraints.
Real-Time Linear MPC for Quadrotors on SE(3): An Analytical Koopman-based Realization
This letter presents an analytical linear parameter-varying (LPV) representation of quadrotor dynamics utilizing Koopman theory, facilitating computationally efficient linear model predictive control (LMPC) for real-time trajectory tracking. By leveraging carefully designed Koopman observables, the proposed approach enables a compact lifted-space evolution that mitigates the curse of dimensionality while preserving the nonlinear characteristics of the system. Although model predictive control (MPC) is a powerful strategy for quadrotor control, it faces a trade-off between the high computational cost of nonlinear MPC (NMPC) and the reduced accuracy of LMPC. To address this gap, we introduce KQ-LMPC (Koopman Quasilinear LPV MPC), which leverages the Koopman-lifted LPV formulation to enforce constraints, ensure lower computational burden and real-time feasibility, and deliver tracking performance comparable to NMPC. Experimental validation confirms the effectiveness of the framework in reasonably agile flight. To the best of our knowledge, this is the first experimentally validated LMPC for quadrotors that employs analytically derived Koopman observables without requiring training data.
comment: 6 pages, 3 figures, accepted for publication at IEEE Robotics and Automation Letters
Recursive Gaussian Process State Space Model
Learning dynamical models from data is not only fundamental but also holds great promise for advancing principle discovery, time-series prediction, and controller design. Among various approaches, Gaussian Process State-Space Models (GPSSMs) have recently gained significant attention due to their combination of flexibility and interpretability. However, for online learning, the field lacks an efficient method suitable for scenarios where prior information regarding data distribution and model function is limited. To address this issue, this paper proposes a recursive GPSSM method with adaptive capabilities for both operating domains and Gaussian process (GP) hyperparameters. Specifically, we first utilize first-order linearization to derive a Bayesian update equation for the joint distribution between the system state and the GP model, enabling closed-form and domain-independent learning. Second, an online selection algorithm for inducing points is developed based on informative criteria to achieve lightweight learning. Third, to support online hyperparameter optimization, we recover historical measurement information from the current filtering distribution. Comprehensive evaluations on both synthetic and real-world datasets demonstrate the superior accuracy, computational efficiency, and adaptability of our method compared to state-of-the-art online GPSSM techniques.
A Human-Vector Susceptible-Infected-Susceptible Model for Analyzing and Controlling the Spread of Vector-Borne Diseases
We propose an epidemic model for the spread of vector-borne diseases. The model, which is built extending the classical susceptible-infected-susceptible model, accounts for two populations -- humans and vectors -- and for cross-contagion between the two species, whereby humans become infected upon interaction with carrier vectors, and vectors become carriers after interaction with infected humans. We formulate the model as a system of ordinary differential equations and leverage monotone systems theory to rigorously characterize the epidemic dynamics. Specifically, we characterize the global asymptotic behavior of the disease, determining conditions for quick eradication of the disease (i.e., for which all trajectories converge to a disease-free equilibrium), or convergence to a (unique) endemic equilibrium. Then, we incorporate two control actions: namely, vector control and incentives to adopt protection measures. Using the derived mathematical tools, we assess the impact of these two control actions and determine the optimal control policy.
comment: Published in the Proceedings of the 2025 European Control Conference (ECC)
Robust Closed-Form Control for MIMO Nonlinear Systems under Conflicting Time-Varying Hard and Soft Constraints (extended version)
This paper introduces a novel robust closed-form control law to handle time-varying hard and soft constraints in uncertain high-relative-degree nonlinear MIMO systems. These constraints represent spatiotemporal specifications in mechanical systems' operational space, with hard constraints ensuring safety-critical requirements and soft constraints encoding performance or task objectives. Initially, all constraints are consolidated into two separate scalar time-varying hard and soft constraint functions, whose positive level sets define feasible regions. A closed-form control law is developed to enforce these constraints using appropriately designed reciprocal barriers and nonlinear transformation functions. When conflicts between hard and soft constraints arise, the control law prioritizes hard constraints by virtually relaxing soft constraints via a dynamic relaxation law. Notably, the proposed control law maintains low complexity by avoiding approximation schemes for coping with system uncertainties. Simulation results confirm the effectiveness of the proposed method.
comment: 18 pages, 6 figures
Robotics
Dynamic Recalibration in LiDAR SLAM: Integrating AI and Geometric Methods with Real-Time Feedback Using INAF Fusion
This paper presents a novel fusion technique for LiDAR Simultaneous Localization and Mapping (SLAM), aimed at improving localization and 3D mapping using LiDAR sensor. Our approach centers on the Inferred Attention Fusion (INAF) module, which integrates AI with geometric odometry. Utilizing the KITTI dataset's LiDAR data, INAF dynamically adjusts attention weights based on environmental feedback, enhancing the system's adaptability and measurement accuracy. This method advances the precision of both localization and 3D mapping, demonstrating the potential of our fusion technique to enhance autonomous navigation systems in complex scenarios.
comment: 9 pages, 9 figures
DexCanvas: Bridging Human Demonstrations and Robot Learning for Dexterous Manipulation
We present DexCanvas, a large-scale hybrid real-synthetic human manipulation dataset containing 7,000 hours of dexterous hand-object interactions seeded from 70 hours of real human demonstrations, organized across 21 fundamental manipulation types based on the Cutkosky taxonomy. Each entry combines synchronized multi-view RGB-D, high-precision mocap with MANO hand parameters, and per-frame contact points with physically consistent force profiles. Our real-to-sim pipeline uses reinforcement learning to train policies that control an actuated MANO hand in physics simulation, reproducing human demonstrations while discovering the underlying contact forces that generate the observed object motion. DexCanvas is the first manipulation dataset to combine large-scale real demonstrations, systematic skill coverage based on established taxonomies, and physics-validated contact annotations. The dataset can facilitate research in robotic manipulation learning, contact-rich control, and skill transfer across different hand morphologies.
Few-Shot Demonstration-Driven Task Coordination and Trajectory Execution for Multi-Robot Systems
In this paper, we propose a novel few-shot learning framework for multi-robot systems that integrate both spatial and temporal elements: Few-Shot Demonstration-Driven Task Coordination and Trajectory Execution (DDACE). Our approach leverages temporal graph networks for learning task-agnostic temporal sequencing and Gaussian Processes for spatial trajectory modeling, ensuring modularity and generalization across various tasks. By decoupling temporal and spatial aspects, DDACE requires only a small number of demonstrations, significantly reducing data requirements compared to traditional learning from demonstration approaches. To validate our proposed framework, we conducted extensive experiments in task environments designed to assess various aspects of multi-robot coordination-such as multi-sequence execution, multi-action dynamics, complex trajectory generation, and heterogeneous configurations. The experimental results demonstrate that our approach successfully achieves task execution under few-shot learning conditions and generalizes effectively across dynamic and diverse settings. This work underscores the potential of modular architectures in enhancing the practicality and scalability of multi-robot systems in real-world applications. Additional materials are available at https://sites.google.com/view/ddace.
HEADER: Hierarchical Robot Exploration via Attention-Based Deep Reinforcement Learning with Expert-Guided Reward
This work pushes the boundaries of learning-based methods in autonomous robot exploration in terms of environmental scale and exploration efficiency. We present HEADER, an attention-based reinforcement learning approach with hierarchical graphs for efficient exploration in large-scale environments. HEADER follows existing conventional methods to construct hierarchical representations for the robot belief/map, but further designs a novel community-based algorithm to construct and update a global graph, which remains fully incremental, shape-adaptive, and operates with linear complexity. Building upon attention-based networks, our planner finely reasons about the nearby belief within the local range while coarsely leveraging distant information at the global scale, enabling next-best-viewpoint decisions that consider multi-scale spatial dependencies. Beyond novel map representation, we introduce a parameter-free privileged reward that significantly improves model performance and produces near-optimal exploration behaviors, by avoiding training objective bias caused by handcrafted reward shaping. In simulated challenging, large-scale exploration scenarios, HEADER demonstrates better scalability than most existing learning and non-learning methods, while achieving a significant improvement in exploration efficiency (up to 20%) over state-of-the-art baselines. We also deploy HEADER on hardware and validate it in complex, large-scale real-life scenarios, including a 300m*230m campus environment.
Freehand 3D Ultrasound Imaging: Sim-in-the-Loop Probe Pose Optimization via Visual Servoing
Freehand 3D ultrasound (US) imaging using conventional 2D probes offers flexibility and accessibility for diverse clinical applications but faces challenges in accurate probe pose estimation. Traditional methods depend on costly tracking systems, while neural network-based methods struggle with image noise and error accumulation, compromising reconstruction precision. We propose a cost-effective and versatile solution that leverages lightweight cameras and visual servoing in simulated environments for precise 3D US imaging. These cameras capture visual feedback from a textured planar workspace. To counter occlusions and lighting issues, we introduce an image restoration method that reconstructs occluded regions by matching surrounding texture patterns. For pose estimation, we develop a simulation-in-the-loop approach, which replicates the system setup in simulation and iteratively minimizes pose errors between simulated and real-world observations. A visual servoing controller refines the alignment of camera views, improving translational estimation by optimizing image alignment. Validations on a soft vascular phantom, a 3D-printed conical model, and a human arm demonstrate the robustness and accuracy of our approach, with Hausdorff distances to the reference reconstructions of 0.359 mm, 1.171 mm, and 0.858 mm, respectively. These results confirm the method's potential for reliable freehand 3D US reconstruction.
Integration of a Variable Stiffness Link for Long-Reach Aerial Manipulation
This paper presents the integration of a Variable Stiffness Link (VSL) for long-reach aerial manipulation, enabling adaptable mechanical coupling between an aerial multirotor platform and a dual-arm manipulator. Conventional long-reach manipulation systems rely on rigid or cable connections, which limit precision or transmit disturbances to the aerial vehicle. The proposed VSL introduces an adjustable stiffness mechanism that allows the link to behave either as a flexible rope or as a rigid rod, depending on task requirements. The system is mounted on a quadrotor equipped with the LiCAS dual-arm manipulator and evaluated through teleoperated experiments, involving external disturbances and parcel transportation tasks. Results demonstrate that varying the link stiffness significantly modifies the dynamic interaction between the UAV and the payload. The flexible configuration attenuates external impacts and aerodynamic perturbations, while the rigid configuration improves positional accuracy during manipulation phases. These results confirm that VSL enhances versatility and safety, providing a controllable trade-off between compliance and precision. Future work will focus on autonomous stiffness regulation, multi-rope configurations, cooperative aerial manipulation and user studies to further assess its impact on teleoperated and semi-autonomous aerial tasks.
Educational SoftHand-A: Building an Anthropomorphic Hand with Soft Synergies using LEGO MINDSTORMS IROS 2025
This paper introduces an anthropomorphic robot hand built entirely using LEGO MINDSTORMS: the Educational SoftHand-A, a tendon-driven, highly-underactuated robot hand based on the Pisa/IIT SoftHand and related hands. To be suitable for an educational context, the design is constrained to use only standard LEGO pieces with tests using common equipment available at home. The hand features dual motors driving an agonist/antagonist opposing pair of tendons on each finger, which are shown to result in reactive fine control. The finger motions are synchonized through soft synergies, implemented with a differential mechanism using clutch gears. Altogether, this design results in an anthropomorphic hand that can adaptively grasp a broad range of objects using a simple actuation and control mechanism. Since the hand can be constructed from LEGO pieces and uses state-of-the-art design concepts for robotic hands, it has the potential to educate and inspire children to learn about the frontiers of modern robotics.
comment: 6 pages. Accepted at IROS 2025
Adaptive Legged Locomotion via Online Learning for Model Predictive Control
We provide an algorithm for adaptive legged locomotion via online learning and model predictive control. The algorithm is composed of two interacting modules: model predictive control (MPC) and online learning of residual dynamics. The residual dynamics can represent modeling errors and external disturbances. We are motivated by the future of autonomy where quadrupeds will autonomously perform complex tasks despite real-world unknown uncertainty, such as unknown payload and uneven terrains. The algorithm uses random Fourier features to approximate the residual dynamics in reproducing kernel Hilbert spaces. Then, it employs MPC based on the current learned model of the residual dynamics. The model is updated online in a self-supervised manner using least squares based on the data collected while controlling the quadruped. The algorithm enjoys sublinear \textit{dynamic regret}, defined as the suboptimality against an optimal clairvoyant controller that knows how the residual dynamics. We validate our algorithm in Gazebo and MuJoCo simulations, where the quadruped aims to track reference trajectories. The Gazebo simulations include constant unknown external forces up to $12\boldsymbol{g}$, where $\boldsymbol{g}$ is the gravity vector, in flat terrain, slope terrain with $20\degree$ inclination, and rough terrain with $0.25m$ height variation. The MuJoCo simulations include time-varying unknown disturbances with payload up to $8~kg$ and time-varying ground friction coefficients in flat terrain.
comment: 9 pages
Improved Extended Kalman Filter-Based Disturbance Observers for Exoskeletons
The nominal performance of mechanical systems is often degraded by unknown disturbances. A two-degree-of-freedom control structure can decouple nominal performance from disturbance rejection. However, perfect disturbance rejection is unattainable when the disturbance dynamic is unknown. In this work, we reveal an inherent trade-off in disturbance estimation subject to tracking speed and tracking uncertainty. Then, we propose two novel methods to enhance disturbance estimation: an interacting multiple model extended Kalman filter-based disturbance observer and a multi-kernel correntropy extended Kalman filter-based disturbance observer. Experiments on an exoskeleton verify that the proposed two methods improve the tracking accuracy $36.3\%$ and $16.2\%$ in hip joint error, and $46.3\%$ and $24.4\%$ in knee joint error, respectively, compared to the extended Kalman filter-based disturbance observer, in a time-varying interaction force scenario, demonstrating the superiority of the proposed method.
VO-DP: Semantic-Geometric Adaptive Diffusion Policy for Vision-Only Robotic Manipulation
In the context of imitation learning, visuomotor-based diffusion policy learning is one of the main directions in robotic manipulation. Most of these approaches rely on point clouds as observation inputs and construct scene representations through point clouds feature learning, which enables them to achieve remarkable accuracy. However, the existing literature lacks an in-depth exploration of vision-only solutions that have significant potential. In this paper, we propose a Vision-Only and single-view Diffusion Policy learning method (VO-DP) that leverages pretrained visual foundation models to achieve effective fusion of semantic and geometric features. We utilize intermediate features from VGGT incorporating semantic features from DINOv2 and geometric features from Alternating Attention blocks. Features are fused via cross-attention and spatially compressed with a CNN to form the input to the policy head. Extensive experiments demonstrate that VO-DP not only outperforms the vision-only baseline DP significantly but also exhibits distinct performance trends against the point cloud-based method DP3: in simulation tasks, VO-DP achieves an average success rate of 64.6% on par with DP3 64.0% and far higher than DP 34.8%, while in real-world tasks, it reaches 87.9%, outperforming both DP3 67.5% and DP 11.2% by a notable margin. Further robustness evaluations confirm that VO-DP remains highly stable under varying conditions including color, size, background, and lighting. Lastly, we open-source a training library for robotic manipulation. Built on Accelerate, this library supports multi-machine and multi-GPU parallel training, as well as mixed precision training. It is compatible with visuomotor policies such as DP, DP3 and VO-DP, and also supports the RoboTwin simulator.
Exploring Conditions for Diffusion models in Robotic Control
While pre-trained visual representations have significantly advanced imitation learning, they are often task-agnostic as they remain frozen during policy learning. In this work, we explore leveraging pre-trained text-to-image diffusion models to obtain task-adaptive visual representations for robotic control, without fine-tuning the model itself. However, we find that naively applying textual conditions - a successful strategy in other vision domains - yields minimal or even negative gains in control tasks. We attribute this to the domain gap between the diffusion model's training data and robotic control environments, leading us to argue for conditions that consider the specific, dynamic visual information required for control. To this end, we propose ORCA, which introduces learnable task prompts that adapt to the control environment and visual prompts that capture fine-grained, frame-specific details. Through facilitating task-adaptive representations with our newly devised conditions, our approach achieves state-of-the-art performance on various robotic control benchmarks, significantly surpassing prior methods.
comment: Project page: https://orca-rc.github.io/
Perfect Prediction or Plenty of Proposals? What Matters Most in Planning for Autonomous Driving
Traditionally, prediction and planning in autonomous driving (AD) have been treated as separate, sequential modules. Recently, there has been a growing shift towards tighter integration of these components, known as Integrated Prediction and Planning (IPP), with the aim of enabling more informed and adaptive decision-making. However, it remains unclear to what extent this integration actually improves planning performance. In this work, we investigate the role of prediction in IPP approaches, drawing on the widely adopted Val14 benchmark, which encompasses more common driving scenarios with relatively low interaction complexity, and the interPlan benchmark, which includes highly interactive and out-of-distribution driving situations. Our analysis reveals that even access to perfect future predictions does not lead to better planning outcomes, indicating that current IPP methods often fail to fully exploit future behavior information. Instead, we focus on high-quality proposal generation, while using predictions primarily for collision checks. We find that many imitation learning-based planners struggle to generate realistic and plausible proposals, performing worse than PDM - a simple lane-following approach. Motivated by this observation, we build on PDM with an enhanced proposal generation method, shifting the emphasis towards producing diverse but realistic and high-quality proposals. This proposal-centric approach significantly outperforms existing methods, especially in out-of-distribution and highly interactive settings, where it sets new state-of-the-art results.
comment: 8 pages, 5 figures
VDRive: Leveraging Reinforced VLA and Diffusion Policy for End-to-end Autonomous Driving
In autonomous driving, dynamic environment and corner cases pose significant challenges to the robustness of ego vehicle's state understanding and decision making. We introduce VDRive, a novel pipeline for end-to-end autonomous driving that explicitly models state-action mapping to address these challenges, enabling interpretable and robust decision making. By leveraging the advancement of the state understanding of the Vision Language Action Model (VLA) with generative diffusion policy-based action head, our VDRive guides the driving contextually and geometrically. Contextually, VLA predicts future observations through token generation pre-training, where the observations are represented as discrete codes by a Conditional Vector Quantized Variational Autoencoder (CVQ-VAE). Geometrically, we perform reinforcement learning fine-tuning of the VLA to predict future trajectories and actions based on current driving conditions. VLA supplies the current state tokens and predicted state tokens for the action policy head to generate hierarchical actions and trajectories. During policy training, a learned critic evaluates the actions generated by the policy and provides gradient-based feedback, forming an actor-critic framework that enables a reinforcement-based policy learning pipeline. Experiments show that our VDRive achieves state-of-the-art performance in the Bench2Drive closed-loop benchmark and nuScenes open-loop planning.
comment: 1st version
Towards Robust Zero-Shot Reinforcement Learning
The recent development of zero-shot reinforcement learning (RL) has opened a new avenue for learning pre-trained generalist policies that can adapt to arbitrary new tasks in a zero-shot manner. While the popular Forward-Backward representations (FB) and related methods have shown promise in zero-shot RL, we empirically found that their modeling lacks expressivity and that extrapolation errors caused by out-of-distribution (OOD) actions during offline learning sometimes lead to biased representations, ultimately resulting in suboptimal performance. To address these issues, we propose Behavior-REgularizEd Zero-shot RL with Expressivity enhancement (BREEZE), an upgraded FB-based framework that simultaneously enhances learning stability, policy extraction capability, and representation learning quality. BREEZE introduces behavioral regularization in zero-shot RL policy learning, transforming policy optimization into a stable in-sample learning paradigm. Additionally, BREEZE extracts the policy using a task-conditioned diffusion model, enabling the generation of high-quality and multimodal action distributions in zero-shot RL settings. Moreover, BREEZE employs expressive attention-based architectures for representation modeling to capture the complex relationships between environmental dynamics. Extensive experiments on ExORL and D4RL Kitchen demonstrate that BREEZE achieves the best or near-the-best performance while exhibiting superior robustness compared to prior offline zero-shot RL methods. The official implementation is available at: https://github.com/Whiterrrrr/BREEZE.
comment: Neurips 2025, 36 pages, 18 figures
Towards Automated Chicken Deboning via Learning-based Dynamically-Adaptive 6-DoF Multi-Material Cutting
Automating chicken shoulder deboning requires precise 6-DoF cutting through a partially occluded, deformable, multi-material joint, since contact with the bones presents serious health and safety risks. Our work makes both systems-level and algorithmic contributions to train and deploy a reactive force-feedback cutting policy that dynamically adapts a nominal trajectory and enables full 6-DoF knife control to traverse the narrow joint gap while avoiding contact with the bones. First, we introduce an open-source custom-built simulator for multi-material cutting that models coupling, fracture, and cutting forces, and supports reinforcement learning, enabling efficient training and rapid prototyping. Second, we design a reusable physical testbed to emulate the chicken shoulder: two rigid "bone" spheres with controllable pose embedded in a softer block, enabling rigorous and repeatable evaluation while preserving essential multi-material characteristics of the target problem. Third, we train and deploy a residual RL policy, with discretized force observations and domain randomization, enabling robust zero-shot sim-to-real transfer and the first demonstration of a learned policy that debones a real chicken shoulder. Our experiments in our simulator, on our physical testbed, and on real chicken shoulders show that our learned policy reliably navigates the joint gap and reduces undesired bone/cartilage contact, resulting in up to a 4x improvement over existing open-loop cutting baselines in terms of success rate and bone avoidance. Our results also illustrate the necessity of force feedback for safe and effective multi-material cutting. The project website is at https://sites.google.com/view/chickendeboning-2026.
comment: 8 Pages, 8 figures
GaussGym: An open-source real-to-sim framework for learning locomotion from pixels
We present a novel approach for photorealistic robot simulation that integrates 3D Gaussian Splatting as a drop-in renderer within vectorized physics simulators such as IsaacGym. This enables unprecedented speed -- exceeding 100,000 steps per second on consumer GPUs -- while maintaining high visual fidelity, which we showcase across diverse tasks. We additionally demonstrate its applicability in a sim-to-real robotics setting. Beyond depth-based sensing, our results highlight how rich visual semantics improve navigation and decision-making, such as avoiding undesirable regions. We further showcase the ease of incorporating thousands of environments from iPhone scans, large-scale scene datasets (e.g., GrandTour, ARKit), and outputs from generative video models like Veo, enabling rapid creation of realistic training worlds. This work bridges high-throughput simulation and high-fidelity perception, advancing scalable and generalizable robot learning. All code and data will be open-sourced for the community to build upon. Videos, code, and data available at https://escontrela.me/gauss_gym/.
Nauplius Optimisation for Autonomous Hydrodynamics
Autonomous Underwater vehicles must operate in strong currents, limited acoustic bandwidth, and persistent sensing requirements where conventional swarm optimisation methods are unreliable. This paper presents NOAH, a novel nature-inspired swarm optimisation algorithm that combines current-aware drift, irreversible settlement in persistent sensing nodes, and colony-based communication. Drawing inspiration from the behaviour of barnacle nauplii, NOAH addresses the critical limitations of existing swarm algorithms by providing hydrodynamic awareness, irreversible anchoring mechanisms, and colony-based communication capabilities essential for underwater exploration missions. The algorithm establishes a comprehensive foundation for scalable and energy-efficient underwater swarm robotics with validated performance analysis. Validation studies demonstrate an 86% success rate for permanent anchoring scenarios, providing a unified formulation for hydrodynamic constraints and irreversible settlement behaviours with an empirical study under flow.
Adaptive Cost-Map-based Path Planning in Partially Unknown Environments with Movable Obstacles
Reliable navigation in disaster-response and other unstructured indoor settings requires robots not only to avoid obstacles but also to recognise when those obstacles can be pushed aside. We present an adaptive, LiDAR and odometry-based path-planning framework that embeds this capability into the ROS2 Nav2 stack. A new Movable Obstacles Layer labels all LiDAR returns missing from a prior static map as tentatively movable and assigns a reduced traversal cost. A companion Slow-Pose Progress Checker monitors the ratio of commanded to actual velocity; when the robot slows appreciably, the local cost is raised from light to heavy, and on a stall to lethal, prompting the global planner to back out and re-route. Gazebo evaluations on a Scout Mini, spanning isolated objects and cluttered corridors, show higher goal-reach rates and fewer deadlocks than a no-layer baseline, with traversal times broadly comparable. Because the method relies only on planar scans and CPU-level computation, it suits resource-constrained search and rescue robots and integrates into heterogeneous platforms with minimal engineering. Overall, the results indicate that interaction-aware cost maps are a lightweight, ROS2-native extension for navigating among potentially movable obstacles in unstructured settings. The full implementation will be released as open source athttps://costmap-namo.github.io.
ASBI: Leveraging Informative Real-World Data for Active Black-Box Simulator Tuning
Black-box simulators are widely used in robotics, but optimizing their parameters remains challenging due to inaccessible likelihoods. Simulation-Based Inference (SBI) tackles this issue using simulation-driven approaches, estimating the posterior from offline real observations and forward simulations. However, in black-box scenarios, preparing observations that contain sufficient information for parameter estimation is difficult due to the unknown relationship between parameters and observations. In this work, we present Active Simulation-Based Inference (ASBI), a parameter estimation framework that uses robots to actively collect real-world online data to achieve accurate black-box simulator tuning. Our framework optimizes robot actions to collect informative observations by maximizing information gain, which is defined as the expected reduction in Shannon entropy between the posterior and the prior. While calculating information gain requires the likelihood, which is inaccessible in black-box simulators, our method solves this problem by leveraging Neural Posterior Estimation (NPE), which leverages a neural network to learn the posterior estimator. Three simulation experiments quantitatively verify that our method achieves accurate parameter estimation, with posteriors sharply concentrated around the true parameters. Moreover, we show a practical application using a real robot to estimate the simulation parameters of cubic particles corresponding to two real objects, beads and gravel, with a bucket pouring action.
Traversability-aware Consistent Situational Graphs for Indoor Localization and Mapping
Scene graphs enhance 3D mapping capabilities in robotics by understanding the relationships between different spatial elements, such as rooms and objects. Recent research extends scene graphs to hierarchical layers, adding and leveraging constraints across these levels. This approach is tightly integrated with pose-graph optimization, improving both localization and mapping accuracy simultaneously. However, when segmenting spatial characteristics, consistently recognizing rooms becomes challenging due to variations in viewpoints and limited field of view (FOV) of sensors. For example, existing real-time approaches often over-segment large rooms into smaller, non-functional spaces that are not useful for localization and mapping due to the time-dependent method. Conversely, their voxel-based room segmentation method often under-segment in complex cases like not fully enclosed 3D space that are non-traversable for ground robots or humans, leading to false constraints in pose-graph optimization. We propose a traversability-aware room segmentation method that considers the interaction between robots and surroundings, with consistent feasibility of traversability information. This enhances both the semantic coherence and computational efficiency of pose-graph optimization. Improved performance is demonstrated through the re-detection frequency of the same rooms in a dataset involving repeated traversals of the same space along the same path, as well as the optimization time consumption.
comment: Accepted by RiTA 2024
CuSfM: CUDA-Accelerated Structure-from-Motion
Efficient and accurate camera pose estimation forms the foundational requirement for dense reconstruction in autonomous navigation, robotic perception, and virtual simulation systems. This paper addresses the challenge via cuSfM, a CUDA-accelerated offline Structure-from-Motion system that leverages GPU parallelization to efficiently employ computationally intensive yet highly accurate feature extractors, generating comprehensive and non-redundant data associations for precise camera pose estimation and globally consistent mapping. The system supports pose optimization, mapping, prior-map localization, and extrinsic refinement. It is designed for offline processing, where computational resources can be fully utilized to maximize accuracy. Experimental results demonstrate that cuSfM achieves significantly improved accuracy and processing speed compared to the widely used COLMAP method across various testing scenarios, while maintaining the high precision and global consistency essential for offline SfM applications. The system is released as an open-source Python wrapper implementation, PyCuSfM, available at https://github.com/nvidia-isaac/pyCuSFM, to facilitate research and applications in computer vision and robotics.
A Generalized Sylvester-Fermat-Torricelli problem with application in disaster relief operations by UAVs
Natural and human-made disasters can cause severe devastation and claim thousands of lives worldwide. Therefore, developing efficient methods for disaster response and management is a critical task for relief teams. One of the most essential components of effective response is the rapid collection of information about affected areas, damages, and victims. More data translates into better coordination, faster rescue operations, and ultimately, more lives saved. However, in some disasters, such as earthquakes, the communication infrastructure is often partially or completely destroyed, making it extremely difficult for victims to send distress signals and for rescue teams to locate and assist them in time. Unmanned Aerial Vehicles (UAVs) have emerged as valuable tools in such scenarios. In particular, a fleet of UAVs can be dispatched from a mobile station to the affected area to facilitate data collection and establish temporary communication networks. Nevertheless, real-world deployment of UAVs faces several challenges, with adverse weather conditions--especially wind--being among the most significant. To address this, we develop a novel mathematical framework to determine the optimal location of a mobile UAV station while explicitly accounting for the heterogeneity of the UAVs and the effect of wind. In particular, we generalize the Sylvester problem to introduce the Sylvester-Fermat-Torricelli (SFT) problem, which captures complex factors such as wind influence, UAV heterogeneity, and back-and-forth motion within a unified framework. The proposed framework enhances the practicality of UAV-based disaster response planning by accounting for real-world factors such as wind and UAV heterogeneity. Experimental results demonstrate that it can reduce wasted operational time by up to 84%, making post-disaster missions significantly more efficient and effective.
PolyFly: Polytopic Optimal Planning for Collision-Free Cable-Suspended Aerial Payload Transportation
Aerial transportation robots using suspended cables have emerged as versatile platforms for disaster response and rescue operations. To maximize the capabilities of these systems, robots need to aggressively fly through tightly constrained environments, such as dense forests and structurally unsafe buildings, while minimizing flight time and avoiding obstacles. Existing methods geometrically over-approximate the vehicle and obstacles, leading to conservative maneuvers and increased flight times. We eliminate these restrictions by proposing PolyFly, an optimal global planner which considers a non-conservative representation for aerial transportation by modeling each physical component of the environment, and the robot (quadrotor, cable and payload), as independent polytopes. We further increase the model accuracy by incorporating the attitude of the physical components by constructing orientation-aware polytopes. The resulting optimal control problem is efficiently solved by converting the polytope constraints into smooth differentiable constraints via duality theory. We compare our method against the existing state-of-the-art approach in eight maze-like environments and show that PolyFly produces faster trajectories in each scenario. We also experimentally validate our proposed approach on a real quadrotor with a suspended payload, demonstrating the practical reliability and accuracy of our method.
LVI-Q: Robust LiDAR-Visual-Inertial-Kinematic Odometry for Quadruped Robots Using Tightly-Coupled and Efficient Alternating Optimization
Autonomous navigation for legged robots in complex and dynamic environments relies on robust simultaneous localization and mapping (SLAM) systems to accurately map surroundings and localize the robot, ensuring safe and efficient operation. While prior sensor fusion-based SLAM approaches have integrated various sensor modalities to improve their robustness, these algorithms are still susceptible to estimation drift in challenging environments due to their reliance on unsuitable fusion strategies. Therefore, we propose a robust LiDAR-visual-inertial-kinematic odometry system that integrates information from multiple sensors, such as a camera, LiDAR, inertial measurement unit (IMU), and joint encoders, for visual and LiDAR-based odometry estimation. Our system employs a fusion-based pose estimation approach that runs optimization-based visual-inertial-kinematic odometry (VIKO) and filter-based LiDAR-inertial-kinematic odometry (LIKO) based on measurement availability. In VIKO, we utilize the footpreintegration technique and robust LiDAR-visual depth consistency using superpixel clusters in a sliding window optimization. In LIKO, we incorporate foot kinematics and employ a point-toplane residual in an error-state iterative Kalman filter (ESIKF). Compared with other sensor fusion-based SLAM algorithms, our approach shows robust performance across public and longterm datasets.
comment: 8 Pages, 9 Figures
NEBULA: Do We Evaluate Vision-Language-Action Agents Correctly?
The evaluation of Vision-Language-Action (VLA) agents is hindered by the coarse, end-task success metric that fails to provide precise skill diagnosis or measure robustness to real-world perturbations. This challenge is exacerbated by a fragmented data landscape that impedes reproducible research and the development of generalist models. To address these limitations, we introduce \textbf{NEBULA}, a unified ecosystem for single-arm manipulation that enables diagnostic and reproducible evaluation. NEBULA features a novel dual-axis evaluation protocol that combines fine-grained \textit{capability tests} for precise skill diagnosis with systematic \textit{stress tests} that measure robustness. A standardized API and a large-scale, aggregated dataset are provided to reduce fragmentation and support cross-dataset training and fair comparison. Using NEBULA, we demonstrate that top-performing VLAs struggle with key capabilities such as spatial reasoning and dynamic adaptation, which are consistently obscured by conventional end-task success metrics. By measuring both what an agent can do and when it does so reliably, NEBULA provides a practical foundation for robust, general-purpose embodied agents.
comment: Homepage: https://vulab-ai.github.io/NEBULA-Alpha/
Cosmos-Surg-dVRK: World Foundation Model-based Automated Online Evaluation of Surgical Robot Policy Learning
The rise of surgical robots and vision-language-action models has accelerated the development of autonomous surgical policies and efficient assessment strategies. However, evaluating these policies directly on physical robotic platforms such as the da Vinci Research Kit (dVRK) remains hindered by high costs, time demands, reproducibility challenges, and variability in execution. World foundation models (WFM) for physical AI offer a transformative approach to simulate complex real-world surgical tasks, such as soft tissue deformation, with high fidelity. This work introduces Cosmos-Surg-dVRK, a surgical finetune of the Cosmos WFM, which, together with a trained video classifier, enables fully automated online evaluation and benchmarking of surgical policies. We evaluate Cosmos-Surg-dVRK using two distinct surgical datasets. On tabletop suture pad tasks, the automated pipeline achieves strong correlation between online rollouts in Cosmos-Surg-dVRK and policy outcomes on the real dVRK Si platform, as well as good agreement between human labelers and the V-JEPA 2-derived video classifier. Additionally, preliminary experiments with ex-vivo porcine cholecystectomy tasks in Cosmos-Surg-dVRK demonstrate promising alignment with real-world evaluations, highlighting the platform's potential for more complex surgical procedures.
DeGrip: A Compact Cable-driven Robotic Gripper for Desktop Disassembly
Intelligent robotic disassembly of end-of-life (EOL) products has been a long-standing challenge in robotics. While machine learning techniques have shown promise, the lack of specialized hardware limits their application in real-world scenarios. We introduce DeGrip, a customized gripper designed for the disassembly of EOL computer desktops. DeGrip provides three degrees of freedom (DOF), enabling arbitrary configurations within the disassembly environment when mounted on a robotic manipulator. It employs a cable-driven transmission mechanism that reduces its overall size and enables operation in confined spaces. The wrist is designed to decouple the actuation of wrist and jaw joints. We also developed an EOL desktop disassembly environment in Isaac Sim to evaluate the effectiveness of DeGrip. The tasks were designed to demonstrate its ability to operate in confined spaces and disassemble components in arbitrary configurations. The evaluation results confirm the capability of DeGrip for EOL desktop disassembly.
VAR-SLAM: Visual Adaptive and Robust SLAM for Dynamic Environments
Visual SLAM in dynamic environments remains challenging, as several existing methods rely on semantic filtering that only handles known object classes, or use fixed robust kernels that cannot adapt to unknown moving objects, leading to degraded accuracy when they appear in the scene. We present VAR-SLAM (Visual Adaptive and Robust SLAM), an ORB-SLAM3-based system that combines a lightweight semantic keypoint filter to deal with known moving objects, with Barron's adaptive robust loss to handle unknown ones. The shape parameter of the robust kernel is estimated online from residuals, allowing the system to automatically adjust between Gaussian and heavy-tailed behavior. We evaluate VAR-SLAM on the TUM RGB-D, Bonn RGB-D Dynamic, and OpenLORIS datasets, which include both known and unknown moving objects. Results show improved trajectory accuracy and robustness over state-of-the-art baselines, achieving up to 25% lower ATE RMSE than NGD-SLAM on challenging sequences, while maintaining performance at 27 FPS on average.
comment: Code available at https://github.com/iit-DLSLab/VAR-SLAM
Zero-Shot Coordination in Ad Hoc Teams with Generalized Policy Improvement and Difference Rewards
Real-world multi-agent systems may require ad hoc teaming, where an agent must coordinate with other previously unseen teammates to solve a task in a zero-shot manner. Prior work often either selects a pretrained policy based on an inferred model of the new teammates or pretrains a single policy that is robust to potential teammates. Instead, we propose to leverage all pretrained policies in a zero-shot transfer setting. We formalize this problem as an ad hoc multi-agent Markov decision process and present a solution that uses two key ideas, generalized policy improvement and difference rewards, for efficient and effective knowledge transfer between different teams. We empirically demonstrate that our algorithm, Generalized Policy improvement for Ad hoc Teaming (GPAT), successfully enables zero-shot transfer to new teams in three simulated environments: cooperative foraging, predator-prey, and Overcooked. We also demonstrate our algorithm in a real-world multi-robot setting.
comment: 10 pages, 8 figures
Aria Gen 2 Pilot Dataset
The Aria Gen 2 Pilot Dataset (A2PD) is an egocentric multimodal open dataset captured using the state-of-the-art Aria Gen 2 glasses. To facilitate timely access, A2PD is released incrementally with ongoing dataset enhancements. The initial release features Dia'ane, our primary subject, who records her daily activities alongside friends, each equipped with Aria Gen 2 glasses. It encompasses five primary scenarios: cleaning, cooking, eating, playing, and outdoor walking. In each of the scenarios, we provide comprehensive raw sensor data and output data from various machine perception algorithms. These data illustrate the device's ability to perceive the wearer, the surrounding environment, and interactions between the wearer and the environment, while maintaining robust performance across diverse users and conditions. The A2PD is publicly available at projectaria.com, with open-source tools and usage examples provided in Project Aria Tools.
CLOVER: Context-aware Long-term Object Viewpoint- and Environment- Invariant Representation Learning
Mobile service robots can benefit from object-level understanding of their environments, including the ability to distinguish object instances and re-identify previously seen instances. Object re-identification is challenging across different viewpoints and in scenes with significant appearance variation arising from weather or lighting changes. Existing works on object re-identification either focus on specific classes or require foreground segmentation. Further, these methods, along with object re-identification datasets, have limited consideration of challenges such as outdoor scenes and illumination changes. To address this problem, we introduce CODa Re-ID: an in-the-wild object re-identification dataset containing 1,037,814 observations of 557 objects across 8 classes under diverse lighting conditions and viewpoints. Further, we propose CLOVER, a representation learning method for object observations that can distinguish between static object instances without requiring foreground segmentation. We also introduce MapCLOVER, a method for scalably summarizing CLOVER descriptors for use in object maps and matching new observations to summarized descriptors. Our results show that CLOVER achieves superior performance in static object re-identification under varying lighting conditions and viewpoint changes and can generalize to unseen instances and classes.
comment: 8 pages, 3 figures, 8 tables
From Language to Locomotion: Retargeting-free Humanoid Control via Motion Latent Guidance
Natural language offers a natural interface for humanoid robots, but existing language-guided humanoid locomotion pipelines remain cumbersome and untrustworthy. They typically decode human motion, retarget it to robot morphology, and then track it with a physics-based controller. However, this multi-stage process is prone to cumulative errors, introduces high latency, and yields weak coupling between semantics and control. These limitations call for a more direct pathway from language to action, one that eliminates fragile intermediate stages. Therefore, we present RoboGhost, a retargeting-free framework that directly conditions humanoid policies on language-grounded motion latents. By bypassing explicit motion decoding and retargeting, RoboGhost enables a diffusion-based policy to denoise executable actions directly from noise, preserving semantic intent and supporting fast, reactive control. A hybrid causal transformer-diffusion motion generator further ensures long-horizon consistency while maintaining stability and diversity, yielding rich latent representations for precise humanoid behavior. Extensive experiments demonstrate that RoboGhost substantially reduces deployment latency, improves success rates and tracking precision, and produces smooth, semantically aligned locomotion on real humanoids. Beyond text, the framework naturally extends to other modalities such as images, audio, and music, providing a universal foundation for vision-language-action humanoid systems.
Onboard Mission Replanning for Adaptive Cooperative Multi-Robot Systems
Cooperative autonomous robotic systems have significant potential for executing complex multi-task missions across space, air, ground, and maritime domains. But they commonly operate in remote, dynamic and hazardous environments, requiring rapid in-mission adaptation without reliance on fragile or slow communication links to centralised compute. Fast, on-board replanning algorithms are therefore needed to enhance resilience. Reinforcement Learning shows strong promise for efficiently solving mission planning tasks when formulated as Travelling Salesperson Problems (TSPs), but existing methods: 1) are unsuitable for replanning, where agents do not start at a single location; 2) do not allow cooperation between agents; 3) are unable to model tasks with variable durations; or 4) lack practical considerations for on-board deployment. Here we define the Cooperative Mission Replanning Problem as a novel variant of multiple TSP with adaptations to overcome these issues, and develop a new encoder/decoder-based model using Graph Attention Networks and Attention Models to solve it effectively and efficiently. Using a simple example of cooperative drones, we show our replanner consistently (90% of the time) maintains performance within 10% of the state-of-the-art LKH3 heuristic solver, whilst running 85-370 times faster on a Raspberry Pi. This work paves the way for increased resilience in autonomous multi-agent systems.
comment: 9 pages, 5 figures, 1 table
Development and Adaptation of Robotic Vision in the Real-World: the Challenge of Door Detection
Mobile service robots are increasingly prevalent in human-centric, real-world domains, operating autonomously in unconstrained indoor environments. In such a context, robotic vision plays a central role in enabling service robots to perceive high-level environmental features from visual observations. Despite the data-driven approaches based on deep learning push the boundaries of vision systems, applying these techniques to real-world robotic scenarios presents unique methodological challenges. Traditional models fail to represent the challenging perception constraints typical of service robots and must be adapted for the specific environment where robots finally operate. We propose a method leveraging photorealistic simulations that balances data quality and acquisition costs for synthesizing visual datasets from the robot perspective used to train deep architectures. Then, we show the benefits in qualifying a general detector for the target domain in which the robot is deployed, showing also the trade-off between the effort for obtaining new examples from such a setting and the performance gain. In our extensive experimental campaign, we focus on the door detection task (namely recognizing the presence and the traversability of doorways) that, in dynamic settings, is useful to infer the topology of the map. Our findings are validated in a real-world robot deployment, comparing prominent deep-learning models and demonstrating the effectiveness of our approach in practical settings.
Contact-Aware Safety in Soft Robots Using High-Order Control Barrier and Lyapunov Functions
Robots operating alongside people, particularly in sensitive scenarios such as aiding the elderly with daily tasks or collaborating with workers in manufacturing, must guarantee safety and cultivate user trust. Continuum soft manipulators promise safety through material compliance, but as designs evolve for greater precision, payload capacity, and speed, and increasingly incorporate rigid elements, their injury risk resurfaces. In this letter, we introduce a comprehensive High-Order Control Barrier Function (HOCBF) + High-Order Control Lyapunov Function (HOCLF) framework that enforces strict contact force limits across the entire soft-robot body during environmental interactions. Our approach combines a differentiable Piecewise Cosserat-Segment (PCS) dynamics model with a convex-polygon distance approximation metric, named Differentiable Conservative Separating Axis Theorem (DCSAT), based on the soft robot geometry to enable real-time, whole-body collision detection, resolution, and enforcement of the safety constraints. By embedding HOCBFs into our optimization routine, we guarantee safety, allowing, for instance, safe navigation in operational space under HOCLF-driven motion objectives. Extensive planar simulations demonstrate that our method maintains safety-bounded contacts while achieving precise shape and task-space regulation. This work thus lays a foundation for the deployment of soft robots in human-centric environments with provable safety and performance.
comment: 8 pages
Pseudo-Kinematic Trajectory Control and Planning of Tracked Vehicles
Tracked vehicles distribute their weight continuously over a large surface area (the tracks). This distinctive feature makes them the preferred choice for vehicles required to traverse soft and uneven terrain. From a robotics perspective, however, this flexibility comes at a cost: the complexity of modelling the system and the resulting difficulty in designing theoretically sound navigation solutions. In this paper, we aim to bridge this gap by proposing a framework for the navigation of tracked vehicles, built upon three key pillars. The first pillar comprises two models: a simulation model and a control-oriented model. The simulation model captures the intricate terramechanics dynamics arising from soil-track interaction and is employed to develop faithful digital twins of the system across a wide range of operating conditions. The control-oriented model is pseudo-kinematic and mathematically tractable, enabling the design of efficient and theoretically robust control schemes. The second pillar is a Lyapunov-based feedback trajectory controller that provides certifiable tracking guarantees. The third pillar is a portfolio of motion planning solutions, each offering different complexity-accuracy trade-offs. The various components of the proposed approach are validated through an extensive set of simulation and experimental data.
Real-time Recognition of Human Interactions from a Single RGB-D Camera for Socially-Aware Robot Navigation
{Recognizing human interactions is essential for social robots as it enables them to navigate safely and naturally in shared environments. Conventional robotic systems however often focus on obstacle avoidance, neglecting social cues necessary for seamless human-robot interaction. To address this gap, we propose a framework to recognize human group interactions for socially aware navigation. Our method utilizes color and depth frames from a monocular RGB-D camera to estimate 3D human keypoints and positions. Principal component analysis (PCA) is then used to determine dominant interaction directions. The shoelace formula is finally applied to compute interest points and engagement areas. Extensive experiments have been conducted to evaluate the validity of the proposed method. The results show that our method is capable of recognizing group interactions across different scenarios with varying numbers of individuals. It also achieves high-speed performance, processing each frame in approximately 4 ms on a single-board computer used in robotic systems. The method is implemented as a ROS 2 package making it simple to integrate into existing navigation systems. Source code is available at https://github.com/thanhlong103/social-interaction-detector
Scaling Multi Agent Reinforcement Learning for Underwater Acoustic Tracking via Autonomous Vehicles
Autonomous vehicles (AV) offer a cost-effective solution for scientific missions such as underwater tracking. Recently, reinforcement learning (RL) has emerged as a powerful method for controlling AVs in complex marine environments. However, scaling these techniques to a fleet--essential for multi-target tracking or targets with rapid, unpredictable motion--presents significant computational challenges. Multi-Agent Reinforcement Learning (MARL) is notoriously sample-inefficient, and while high-fidelity simulators like Gazebo's LRAUV provide 100x faster-than-real-time single-robot simulations, they offer no significant speedup for multi-vehicle scenarios, making MARL training impractical. To address these limitations, we propose an iterative distillation method that transfers high-fidelity simulations into a simplified, GPU-accelerated environment while preserving high-level dynamics. This approach achieves up to a 30,000x speedup over Gazebo through parallelization, enabling efficient training via end-to-end GPU acceleration. Additionally, we introduce a novel Transformer-based architecture (TransfMAPPO) that learns multi-agent policies invariant to the number of agents and targets, significantly improving sample efficiency. Following large-scale curriculum learning conducted entirely on GPU, we perform extensive evaluations in Gazebo, demonstrating that our method maintains tracking errors below 5 meters over extended durations, even in the presence of multiple fast-moving targets. This work bridges the gap between large-scale MARL training and high-fidelity deployment, providing a scalable framework for autonomous fleet control in real-world sea missions.
CLASP: General-Purpose Clothes Manipulation with Semantic Keypoints
Clothes manipulation, such as folding or hanging, is a critical capability for home service robots. Despite recent advances, most existing methods remain limited to specific clothes types and tasks, due to the complex, high-dimensional geometry of clothes. This paper presents CLothes mAnipulation with Semantic keyPoints (CLASP), which aims at general-purpose clothes manipulation over diverse clothes types, T-shirts, shorts, skirts, long dresses, ..., as well as different tasks, folding, flattening, hanging, .... The core idea of CLASP is semantic keypoints-e.g., ''left sleeve'' and ''right shoulder''-a sparse spatial-semantic representation, salient for both perception and action. Semantic keypoints of clothes can be reliably extracted from RGB-D images and provide an effective representation for a wide range of clothes manipulation policies. CLASP uses semantic keypoints as an intermediate representation to connect high-level task planning and low-level action execution. At the high level, it exploits vision language models (VLMs) to predict task plans over the semantic keypoints. At the low level, it executes the plans with the help of a set of pre-built manipulation skills conditioned on the keypoints. Extensive simulation experiments show that CLASP outperforms state-of-the-art baseline methods on multiple tasks across diverse clothes types, demonstrating strong performance and generalization. Further experiments with a Franka dual-arm system on four distinct tasks-folding, flattening, hanging, and placing-confirm CLASP's performance on real-life clothes manipulation.
U-ARM : Ultra low-cost general teleoperation interface for robot manipulation
We propose U-Arm, a low-cost and rapidly adaptable leader-follower teleoperation framework designed to interface with most of commercially available robotic arms. Our system supports teleoperation through three structurally distinct 3D-printed leader arms that share consistent control logic, enabling seamless compatibility with diverse commercial robot configurations. Compared with previous open-source leader-follower interfaces, we further optimized both the mechanical design and servo selection, achieving a bill of materials (BOM) cost of only \$50.5 for the 6-DoF leader arm and \$56.8 for the 7-DoF version. To enhance usability, we mitigate the common challenge in controlling redundant degrees of freedom by %engineering methods mechanical and control optimizations. Experimental results demonstrate that U-Arm achieves 39\% higher data collection efficiency and comparable task success rates across multiple manipulation scenarios compared with Joycon, another low-cost teleoperation interface. We have open-sourced all CAD models of three configs and also provided simulation support for validating teleoperation workflows. We also open-sourced real-world manipulation data collected with U-Arm. The project website is https://github.com/MINT-SJTU/LeRobot-Anything-U-Arm.
Genie Envisioner: A Unified World Foundation Platform for Robotic Manipulation
We introduce Genie Envisioner (GE), a unified world foundation platform for robotic manipulation that integrates policy learning, evaluation, and simulation within a single video-generative framework. At its core, GE-Base is a large-scale, instruction-conditioned video diffusion model that captures the spatial, temporal, and semantic dynamics of real-world robotic interactions in a structured latent space. Built upon this foundation, GE-Act maps latent representations to executable action trajectories through a lightweight, flow-matching decoder, enabling precise and generalizable policy inference across diverse embodiments with minimal supervision. To support scalable evaluation and training, GE-Sim serves as an action-conditioned neural simulator, producing high-fidelity rollouts for closed-loop policy development. The platform is further equipped with EWMBench, a standardized benchmark suite measuring visual fidelity, physical consistency, and instruction-action alignment. Together, these components establish Genie Envisioner as a scalable and practical foundation for instruction-driven, general-purpose embodied intelligence. All code, models, and benchmarks will be released publicly.
comment: https://genie-envisioner.github.io/
Learning to Capture Rocks using an Excavator: A Reinforcement Learning Approach with Guiding Reward Formulation
Rock capturing with standard excavator buckets is a challenging task typically requiring the expertise of skilled operators. Unlike soil digging, it involves manipulating large, irregular rocks in unstructured environments where complex contact interactions with granular material make model-based control impractical. Existing autonomous excavation methods focus mainly on continuous media or rely on specialized grippers, limiting their applicability to real-world construction sites. This paper introduces a fully data-driven control framework for rock capturing that eliminates the need for explicit modeling of rock or soil properties. A model-free reinforcement learning agent is trained in the AGX Dynamics simulator using the Proximal Policy Optimization (PPO) algorithm and a guiding reward formulation. The learned policy outputs joint velocity commands directly to the boom, arm, and bucket of a CAT365 excavator model. Robustness is enhanced through extensive domain randomization of rock geometry, density, and mass, as well as the initial configurations of the bucket, rock, and goal position. To the best of our knowledge, this is the first study to develop and evaluate an RL-based controller for the rock capturing task. Experimental results show that the policy generalizes well to unseen rocks and varying soil conditions, achieving high success rates comparable to those of human participants while maintaining machine stability. These findings demonstrate the feasibility of learning-based excavation strategies for discrete object manipulation without requiring specialized hardware or detailed material models.
General-Purpose Robotic Navigation via LVLM-Orchestrated Perception, Reasoning, and Acting
Developing general-purpose navigation policies for unknown environments remains a core challenge in robotics. Most existing systems rely on task-specific neural networks and fixed information flows, limiting their generalizability. Large Vision-Language Models (LVLMs) offer a promising alternative by embedding human-like knowledge for reasoning and planning, but prior LVLM-robot integrations have largely depended on pre-mapped spaces, hard-coded representations, and rigid control logic. We introduce the Agentic Robotic Navigation Architecture (ARNA), a general-purpose framework that equips an LVLM-based agent with a library of perception, reasoning, and navigation tools drawn from modern robotic stacks. At runtime, the agent autonomously defines and executes task-specific workflows that iteratively query modules, reason over multimodal inputs, and select navigation actions. This agentic formulation enables robust navigation and reasoning in previously unmapped environments, offering a new perspective on robotic stack design. Evaluated in Habitat Lab on the HM-EQA benchmark, ARNA outperforms state-of-the-art EQA-specific approaches. Qualitative results on RxR and custom tasks further demonstrate its ability to generalize across a broad range of navigation challenges.
An Intention-driven Lane Change Framework Considering Heterogeneous Dynamic Cooperation in Mixed-traffic Environment
In mixed-traffic environments, where autonomous vehicles (AVs) interact with diverse human-driven vehicles (HVs), unpredictable intentions and heterogeneous behaviors make safe and efficient lane change maneuvers highly challenging. Existing methods often oversimplify these interactions by assuming uniform patterns. We propose an intention-driven lane change framework that integrates driving-style recognition, cooperation-aware decision-making, and coordinated motion planning. A deep learning classifier trained on the NGSIM dataset identifies human driving styles in real time. A cooperation score with intrinsic and interactive components estimates surrounding drivers' intentions and quantifies their willingness to cooperate with the ego vehicle. Decision-making combines behavior cloning with inverse reinforcement learning to determine whether a lane change should be initiated. For trajectory generation, model predictive control is integrated with IRL-based intention inference to produce collision-free and socially compliant maneuvers. Experiments show that the proposed model achieves 94.2\% accuracy and 94.3\% F1-score, outperforming rule-based and learning-based baselines by 4-15\% in lane change recognition. These results highlight the benefit of modeling inter-driver heterogeneity and demonstrate the potential of the framework to advance context-aware and human-like autonomous driving in complex traffic environments.
Spatial Forcing: Implicit Spatial Representation Alignment for Vision-language-action Model
Vision-language-action (VLA) models have recently shown strong potential in enabling robots to follow language instructions and execute precise actions. However, most VLAs are built upon vision-language models pretrained solely on 2D data, which lack accurate spatial awareness and hinder their ability to operate in the 3D physical world. Existing solutions attempt to incorporate explicit 3D sensor inputs such as depth maps or point clouds, but these approaches face challenges due to sensor noise, hardware heterogeneity, and incomplete depth coverage in existing datasets. Alternative methods that estimate 3D cues from 2D images also suffer from the limited performance of depth estimators. We propose Spatial Forcing (SF), a simple yet effective alignment strategy that implicitly forces VLA models to develop spatial comprehension capabilities without relying on explicit 3D inputs or depth estimators. SF aligns intermediate visual embeddings of VLAs with geometric representations produced by pretrained 3D foundation models. By enforcing alignment at intermediate layers, SF guides VLAs to encode richer spatial representations that enhance action precision. Extensive experiments in simulation and real-world environments demonstrate that SF achieves state-of-the-art results, surpassing both 2D- and 3D-based VLAs. SF further accelerates training by up to 3.8x and improves data efficiency across diverse robotic tasks. Project page is at https://spatial-forcing.github.io/
Self-supervised Multi-future Occupancy Forecasting for Autonomous Driving
Environment prediction frameworks are critical for the safe navigation of autonomous vehicles (AVs) in dynamic settings. LiDAR-generated occupancy grid maps (L-OGMs) offer a robust bird's-eye view for the scene representation, enabling self-supervised joint scene predictions while exhibiting resilience to partial observability and perception detection failures. Prior approaches have focused on deterministic L-OGM prediction architectures within the grid cell space. While these methods have seen some success, they frequently produce unrealistic predictions and fail to capture the stochastic nature of the environment. Additionally, they do not effectively integrate additional sensor modalities present in AVs. Our proposed framework, Latent Occupancy Prediction (LOPR), performs stochastic L-OGM prediction in the latent space of a generative architecture and allows for conditioning on RGB cameras, maps, and planned trajectories. We decode predictions using either a single-step decoder, which provides high-quality predictions in real-time, or a diffusion-based batch decoder, which can further refine the decoded frames to address temporal consistency issues and reduce compression losses. Our experiments on the nuScenes and Waymo Open datasets show that all variants of our approach qualitatively and quantitatively outperform prior approaches.
comment: Accepted to Robotics: Science and Systems (RSS) 2025
MOBODY: Model Based Off-Dynamics Offline Reinforcement Learning
We study off-dynamics offline reinforcement learning, where the goal is to learn a policy from offline source and limited target datasets with mismatched dynamics. Existing methods either penalize the reward or discard source transitions occurring in parts of the transition space with high dynamics shift. As a result, they optimize the policy using data from low-shift regions, limiting exploration of high-reward states in the target domain that do not fall within these regions. Consequently, such methods often fail when the dynamics shift is significant or the optimal trajectories lie outside the low-shift regions. To overcome this limitation, we propose MOBODY, a Model-Based Off-Dynamics Offline RL algorithm that optimizes a policy using learned target dynamics transitions to explore the target domain, rather than only being trained with the low dynamics-shift transitions. For the dynamics learning, built on the observation that achieving the same next state requires taking different actions in different domains, MOBODY employs separate action encoders for each domain to encode different actions to the shared latent space while sharing a unified representation of states and a common transition function. We further introduce a target Q-weighted behavior cloning loss in policy optimization to avoid out-of-distribution actions, which push the policy toward actions with high target-domain Q-values, rather than high source domain Q-values or uniformly imitating all actions in the offline dataset. We evaluate MOBODY on a wide range of MuJoCo and Adroit benchmarks, demonstrating that it outperforms state-of-the-art off-dynamics RL baselines as well as policy learning methods based on different dynamics learning baselines, with especially pronounced improvements in challenging scenarios where existing methods struggle.
MLFM: Multi-Layered Feature Maps for Richer Language Understanding in Zero-Shot Semantic Navigation
Recent progress in large vision-language models has driven improvements in language-based semantic navigation, where an embodied agent must reach a target object described in natural language. Yet we still lack a clear, language-focused evaluation framework to test how well agents ground the words in their instructions. We address this gap by proposing LangNav, an open-vocabulary multi-object navigation dataset with natural language goal descriptions (e.g. 'go to the red short candle on the table') and corresponding fine-grained linguistic annotations (e.g., attributes: color=red, size=short; relations: support=on). These labels enable systematic evaluation of language understanding. To evaluate on this setting, we extend multi-object navigation task setting to Language-guided Multi-Object Navigation (LaMoN), where the agent must find a sequence of goals specified using language. Furthermore, we propose Multi-Layered Feature Map (MLFM), a novel method that builds a queryable, multi-layered semantic map from pretrained vision-language features and proves effective for reasoning over fine-grained attributes and spatial relations in goal descriptions. Experiments on LangNav show that MLFM outperforms state-of-the-art zero-shot mapping-based navigation baselines.
SegDAC: Improving Visual Reinforcement Learning by Extracting Dynamic Objectc-Centric Representations from Pretrained Vision Models
Visual reinforcement learning (RL) is challenging due to the need to extract useful representations from high-dimensional inputs while learning effective control from sparse and noisy rewards. Although large perception models exist, integrating them effectively into RL for visual generalization and improved sample efficiency remains difficult. We propose SegDAC, a Segmentation-Driven Actor-Critic method. SegDAC uses Segment Anything (SAM) for object-centric decomposition and YOLO-World to ground the image segmentation process via text inputs. It includes a novel transformer-based architecture that supports a dynamic number of segments at each time step and effectively learns which segments to focus on using online RL, without using human labels. By evaluating SegDAC over a challenging visual generalization benchmark using Maniskill3, which covers diverse manipulation tasks under strong visual perturbations, we demonstrate that SegDAC achieves significantly better visual generalization, doubling prior performance on the hardest setting and matching or surpassing prior methods in sample efficiency across all evaluated tasks.
Interpretable Decision-Making for End-to-End Autonomous Driving ICCV 2025
Trustworthy AI is mandatory for the broad deployment of autonomous vehicles. Although end-to-end approaches derive control commands directly from raw data, interpreting these decisions remains challenging, especially in complex urban scenarios. This is mainly attributed to very deep neural networks with non-linear decision boundaries, making it challenging to grasp the logic behind AI-driven decisions. This paper presents a method to enhance interpretability while optimizing control commands in autonomous driving. To address this, we propose loss functions that promote the interpretability of our model by generating sparse and localized feature maps. The feature activations allow us to explain which image regions contribute to the predicted control command. We conduct comprehensive ablation studies on the feature extraction step and validate our method on the CARLA benchmarks. We also demonstrate that our approach improves interpretability, which correlates with reducing infractions, yielding a safer, high-performance driving model. Notably, our monocular, non-ensemble model surpasses the top-performing approaches from the CARLA Leaderboard by achieving lower infraction scores and the highest route completion rate, all while ensuring interpretability.
comment: Accepted to the ICCV 2025 2nd Workshop on the Challenge Of Out-of-Label Hazards in Autonomous Driving (2COOOL)
ZeST: an LLM-based Zero-Shot Traversability Navigation for Unknown Environments
The advancement of robotics and autonomous navigation systems hinges on the ability to accurately predict terrain traversability. Traditional methods for generating datasets to train these prediction models often involve putting robots into potentially hazardous environments, posing risks to equipment and safety. To solve this problem, we present ZeST, a novel approach leveraging visual reasoning capabilities of Large Language Models (LLMs) to create a traversability map in real-time without exposing robots to danger. Our approach not only performs zero-shot traversability and mitigates the risks associated with real-world data collection but also accelerates the development of advanced navigation systems, offering a cost-effective and scalable solution. To support our findings, we present navigation results, in both controlled indoor and unstructured outdoor environments. As shown in the experiments, our method provides safer navigation when compared to other state-of-the-art methods, constantly reaching the final goal.
LOPR: Latent Occupancy PRediction using Generative Models
Environment prediction frameworks are integral for autonomous vehicles, enabling safe navigation in dynamic environments. LiDAR generated occupancy grid maps (L-OGMs) offer a robust bird's eye-view scene representation that facilitates joint scene predictions without relying on manual labeling unlike commonly used trajectory prediction frameworks. Prior approaches have optimized deterministic L-OGM prediction architectures directly in grid cell space. While these methods have achieved some degree of success in prediction, they occasionally grapple with unrealistic and incorrect predictions. We claim that the quality and realism of the forecasted occupancy grids can be enhanced with the use of generative models. We propose a framework that decouples occupancy prediction into: representation learning and stochastic prediction within the learned latent space. Our approach allows for conditioning the model on other available sensor modalities such as RGB-cameras and high definition maps. We demonstrate that our approach achieves state-of-the-art performance and is readily transferable between different robotic platforms on the real-world NuScenes, Waymo Open, and a custom dataset we collected on an experimental vehicle platform.
comment: We recommend referring to the peer-reviewed and updated version of this approach, available at arXiv:2407.21126
Multiagent Systems
Grassroots Logic Programs: A Secure, Multiagent, Concurrent, Logic Programming Language
Grassroots platforms are distributed applications run by\linebreak cryptographically-identified people on their networked personal devices, where multiple disjoint platform instances emerge independently and coalesce when they interoperate. Their foundation is the grassroots social graph, upon which grassroots social networks, grassroots cryptocurrencies, and grassroots democratic federations can be built. Grassroots platforms have yet to be implemented, the key challenge being faulty and malicious participants: without secure programming support, correct participants cannot reliably identify each other, establish secure communication, or verify each other's code integrity. We present Grassroots Logic Programs (GLP), a secure, multiagent, concurrent, logic programming language for implementing grassroots platforms. GLP extends logic programs with paired single-reader/single-writer (SRSW) logic variables, providing secure communication channels among cryptographically-identified people through encrypted, signed and attested messages, which enable identity and code integrity verification. We present GLP progressively: logic programs, concurrent GLP, multiagent GLP, augmenting it with cryptographic security, and providing smartphone implementation-ready specifications. We prove safety properties including that GLP computations are deductions, SRSW preservation, acyclicity, and monotonicity. We prove multiagent GLP is grassroots and that GLP streams achieve blockchain security properties. We present a grassroots social graph protocol establishing authenticated peer-to-peer connections and demonstrate secure grassroots social networking applications.
AURA: An Agent Autonomy Risk Assessment Framework AAMAS 2026
As autonomous agentic AI systems see increasing adoption across organisations, persistent challenges in alignment, governance, and risk management threaten to impede deployment at scale. We present AURA (Agent aUtonomy Risk Assessment), a unified framework designed to detect, quantify, and mitigate risks arising from agentic AI. Building on recent research and practical deployments, AURA introduces a gamma-based risk scoring methodology that balances risk assessment accuracy with computational efficiency and practical considerations. AURA provides an interactive process to score, evaluate and mitigate the risks of running one or multiple AI Agents, synchronously or asynchronously (autonomously). The framework is engineered for Human-in-the-Loop (HITL) oversight and presents Agent-to-Human (A2H) communication mechanisms, allowing for seamless integration with agentic systems for autonomous self-assessment, rendering it interoperable with established protocols (MCP and A2A) and tools. AURA supports a responsible and transparent adoption of agentic AI and provides robust risk detection and mitigation while balancing computational resources, positioning it as a critical enabler for large-scale, governable agentic AI in enterprise environments.
comment: 10 pages, 2 figures. Submitted for open-access preprint on arXiv. Based on the AAMAS 2026 paper template
Build Your Personalized Research Group: A Multiagent Framework for Continual and Interactive Science Automation
The automation of scientific discovery represents a critical milestone in Artificial Intelligence (AI) research. However, existing agentic systems for science suffer from two fundamental limitations: rigid, pre-programmed workflows that cannot adapt to intermediate findings, and inadequate context management that hinders long-horizon research. We present \texttt{freephdlabor}, an open-source multiagent framework featuring \textit{fully dynamic workflows} determined by real-time agent reasoning and a \coloremph{\textit{modular architecture}} enabling seamless customization -- users can modify, add, or remove agents to address domain-specific requirements. The framework provides comprehensive infrastructure including \textit{automatic context compaction}, \textit{workspace-based communication} to prevent information degradation, \textit{memory persistence} across sessions, and \textit{non-blocking human intervention} mechanisms. These features collectively transform automated research from isolated, single-run attempts into \textit{continual research programs} that build systematically on prior explorations and incorporate human feedback. By providing both the architectural principles and practical implementation for building customizable co-scientist systems, this work aims to facilitate broader adoption of automated research across scientific domains, enabling practitioners to deploy interactive multiagent systems that autonomously conduct end-to-end research -- from ideation through experimentation to publication-ready manuscripts.
comment: 37 pages, 5 figures. Code: https://github.com/ltjed/freephdlabor
Hypergame-based Cognition Modeling and Intention Interpretation for Human-Driven Vehicles in Connected Mixed Traffic
With the practical implementation of connected and autonomous vehicles (CAVs), the traffic system is expected to remain a mix of CAVs and human-driven vehicles (HVs) for the foreseeable future. To enhance safety and traffic efficiency, the trajectory planning strategies of CAVs must account for the influence of HVs, necessitating accurate HV trajectory prediction. Current research often assumes that human drivers have perfect knowledge of all vehicles' objectives, an unrealistic premise. This paper bridges the gap by leveraging hypergame theory to account for cognitive and perception limitations in HVs. We model human bounded rationality without assuming them to be merely passive followers and propose a hierarchical cognition modeling framework that captures cognitive relationships among vehicles. We further analyze the cognitive stability of the system, proving that the strategy profile where all vehicles adopt cognitively equilibrium strategies constitutes a hyper Nash equilibrium when CAVs accurately learn HV parameters. To achieve this, we develop an inverse learning algorithm for distributed intention interpretation via vehicle-to-everything (V2X) communication, which extends the framework to both offline and online scenarios. Additionally, we introduce a distributed trajectory prediction and planning approach for CAVs, leveraging the learned parameters in real time. Simulations in highway lane-changing scenarios demonstrate the proposed method's accuracy in parameter learning, robustness to noisy trajectory observations, and safety in HV trajectory prediction. The results validate the effectiveness of our method in both offline and online implementations.
TranSimHub:A Unified Air-Ground Simulation Platform for Multi-Modal Perception and Decision-Making
Air-ground collaborative intelligence is becoming a key approach for next-generation urban intelligent transportation management, where aerial and ground systems work together on perception, communication, and decision-making. However, the lack of a unified multi-modal simulation environment has limited progress in studying cross-domain perception, coordination under communication constraints, and joint decision optimization. To address this gap, we present TranSimHub, a unified simulation platform for air-ground collaborative intelligence. TranSimHub offers synchronized multi-view rendering across RGB, depth, and semantic segmentation modalities, ensuring consistent perception between aerial and ground viewpoints. It also supports information exchange between the two domains and includes a causal scene editor that enables controllable scenario creation and counterfactual analysis under diverse conditions such as different weather, emergency events, and dynamic obstacles. We release TranSimHub as an open-source platform that supports end-to-end research on perception, fusion, and control across realistic air and ground traffic scenes. Our code is available at https://github.com/Traffic-Alpha/TranSimHub.
comment: 9 pages, 4 figures
Personalized Collaborative Learning with Affinity-Based Variance Reduction
Multi-agent learning faces a fundamental tension: leveraging distributed collaboration without sacrificing the personalization needed for diverse agents. This tension intensifies when aiming for full personalization while adapting to unknown heterogeneity levels -- gaining collaborative speedup when agents are similar, without performance degradation when they are different. Embracing the challenge, we propose personalized collaborative learning (PCL), a novel framework for heterogeneous agents to collaboratively learn personalized solutions with seamless adaptivity. Through carefully designed bias correction and importance correction mechanisms, our method AffPCL robustly handles both environment and objective heterogeneity. We prove that AffPCL reduces sample complexity over independent learning by a factor of $\max\{n^{-1}, \delta\}$, where $n$ is the number of agents and $\delta\in[0,1]$ measures their heterogeneity. This affinity-based acceleration automatically interpolates between the linear speedup of federated learning in homogeneous settings and the baseline of independent learning, without requiring prior knowledge of the system. Our analysis further reveals that an agent may obtain linear speedup even by collaborating with arbitrarily dissimilar agents, unveiling new insights into personalization and collaboration in the high heterogeneity regime.
Heterogeneous Multi-Agent Task-Assignment with Uncertain Execution Times and Preferences
While sequential task assignment for a single agent has been widely studied, such problems in a multi-agent setting, where the agents have heterogeneous task preferences or capabilities, remain less well-characterized. We study a multi-agent task assignment problem where a central planner assigns recurring tasks to multiple members of a team over a finite time horizon. For any given task, the members have heterogeneous capabilities in terms of task completion times, task resource consumption (which can model variables such as energy or attention), and preferences in terms of the rewards they collect upon task completion. We assume that the reward, execution time, and resource consumption for each member to complete any task are stochastic with unknown distributions. The goal of the planner is to maximize the total expected reward that the team receives over the problem horizon while ensuring that the resource consumption required for any assigned task is within the capability of the agent. We propose and analyze a bandit algorithm for this problem. Since the bandit algorithm relies on solving an optimal task assignment problem repeatedly, we analyze the achievable regret in two cases: when we can solve the optimal task assignment exactly and when we can solve it only approximately.
comment: 14 pages
Zero-Shot Coordination in Ad Hoc Teams with Generalized Policy Improvement and Difference Rewards
Real-world multi-agent systems may require ad hoc teaming, where an agent must coordinate with other previously unseen teammates to solve a task in a zero-shot manner. Prior work often either selects a pretrained policy based on an inferred model of the new teammates or pretrains a single policy that is robust to potential teammates. Instead, we propose to leverage all pretrained policies in a zero-shot transfer setting. We formalize this problem as an ad hoc multi-agent Markov decision process and present a solution that uses two key ideas, generalized policy improvement and difference rewards, for efficient and effective knowledge transfer between different teams. We empirically demonstrate that our algorithm, Generalized Policy improvement for Ad hoc Teaming (GPAT), successfully enables zero-shot transfer to new teams in three simulated environments: cooperative foraging, predator-prey, and Overcooked. We also demonstrate our algorithm in a real-world multi-robot setting.
comment: 10 pages, 8 figures
Agentic AI for Ultra-Modern Networks: Multi-Agent Framework for RAN Autonomy and Assurance
The increasing complexity of Beyond 5G and 6G networks necessitates new paradigms for autonomy and assur- ance. Traditional O-RAN control loops rely heavily on RIC- based orchestration, which centralizes intelligence and exposes the system to risks such as policy conflicts, data drift, and unsafe actions under unforeseen conditions. In this work, we argue that the future of autonomous networks lies in a multi-agentic architecture, where specialized agents collaborate to perform data collection, model training, prediction, policy generation, verification, deployment, and assurance. By replacing tightly- coupled centralized RIC-based workflows with distributed agents, the framework achieves autonomy, resilience, explainability, and system-wide safety. To substantiate this vision, we design and evaluate a traffic steering use case under surge and drift conditions. Results across four KPIs: RRC connected users, IP throughput, PRB utilization, and SINR, demonstrate that a naive predictor-driven deployment improves local KPIs but destabilizes neighbors, whereas the agentic system blocks unsafe policies, preserving global network health. This study highlights multi- agent architectures as a credible foundation for trustworthy AI- driven autonomy in next-generation RANs.
VLMLight: Safety-Critical Traffic Signal Control via Vision-Language Meta-Control and Dual-Branch Reasoning Architecture
Traffic signal control (TSC) is a core challenge in urban mobility, where real-time decisions must balance efficiency and safety. Existing methods - ranging from rule-based heuristics to reinforcement learning (RL) - often struggle to generalize to complex, dynamic, and safety-critical scenarios. We introduce VLMLight, a novel TSC framework that integrates vision-language meta-control with dual-branch reasoning. At the core of VLMLight is the first image-based traffic simulator that enables multi-view visual perception at intersections, allowing policies to reason over rich cues such as vehicle type, motion, and spatial density. A large language model (LLM) serves as a safety-prioritized meta-controller, selecting between a fast RL policy for routine traffic and a structured reasoning branch for critical cases. In the latter, multiple LLM agents collaborate to assess traffic phases, prioritize emergency vehicles, and verify rule compliance. Experiments show that VLMLight reduces waiting times for emergency vehicles by up to 65% over RL-only systems, while preserving real-time performance in standard conditions with less than 1% degradation. VLMLight offers a scalable, interpretable, and safety-aware solution for next-generation traffic signal control.
comment: 25 pages, 15 figures
Bayesian Ego-graph inference for Networked Multi-Agent Reinforcement Learning NeurIPS 2025
In networked multi-agent reinforcement learning (Networked-MARL), decentralized agents must act under local observability and constrained communication over fixed physical graphs. Existing methods often assume static neighborhoods, limiting adaptability to dynamic or heterogeneous environments. While centralized frameworks can learn dynamic graphs, their reliance on global state access and centralized infrastructure is impractical in real-world decentralized systems. We propose a stochastic graph-based policy for Networked-MARL, where each agent conditions its decision on a sampled subgraph over its local physical neighborhood. Building on this formulation, we introduce BayesG, a decentralized actor-framework that learns sparse, context-aware interaction structures via Bayesian variational inference. Each agent operates over an ego-graph and samples a latent communication mask to guide message passing and policy computation. The variational distribution is trained end-to-end alongside the policy using an evidence lower bound (ELBO) objective, enabling agents to jointly learn both interaction topology and decision-making strategies. BayesG outperforms strong MARL baselines on large-scale traffic control tasks with up to 167 agents, demonstrating superior scalability, efficiency, and performance.
comment: Accepted at NeurIPS 2025
Topological Structure Learning Should Be A Research Priority for LLM-Based Multi-Agent Systems
Large Language Model-based Multi-Agent Systems (MASs) have emerged as a powerful paradigm for tackling complex tasks through collaborative intelligence. However, the topology of these systems--how agents in MASs should be configured, connected, and coordinated--remains largely unexplored. In this position paper, we call for a paradigm shift toward \emph{topology-aware MASs} that explicitly model and dynamically optimize the structure of inter-agent interactions. We identify three fundamental components--agents, communication links, and overall topology--that collectively determine the system's adaptability, efficiency, robustness, and fairness. To operationalize this vision, we introduce a systematic three-stage framework: 1) agent selection, 2) structure profiling, and 3) topology synthesis. This framework not only provides a principled foundation for designing MASs but also opens new research frontiers across language modeling, reinforcement learning, graph learning, and generative modeling to ultimately unleash their full potential in complex real-world applications. We conclude by outlining key challenges and opportunities in MASs evaluation. We hope our framework and perspectives offer critical new insights in the era of agentic AI.
LayerCraft: Enhancing Text-to-Image Generation with CoT Reasoning and Layered Object Integration
Text-to-image (T2I) generation has made remarkable progress, yet existing systems still lack intuitive control over spatial composition, object consistency, and multi-step editing. We present $\textbf{LayerCraft}$, a modular framework that uses large language models (LLMs) as autonomous agents to orchestrate structured, layered image generation and editing. LayerCraft supports two key capabilities: (1) $\textit{structured generation}$ from simple prompts via chain-of-thought (CoT) reasoning, enabling it to decompose scenes, reason about object placement, and guide composition in a controllable, interpretable manner; and (2) $\textit{layered object integration}$, allowing users to insert and customize objects -- such as characters or props -- across diverse images or scenes while preserving identity, context, and style. The system comprises a coordinator agent, the $\textbf{ChainArchitect}$ for CoT-driven layout planning, and the $\textbf{Object Integration Network (OIN)}$ for seamless image editing using off-the-shelf T2I models without retraining. Through applications like batch collage editing and narrative scene generation, LayerCraft empowers non-experts to iteratively design, customize, and refine visual content with minimal manual effort. Code will be released at https://github.com/PeterYYZhang/LayerCraft.
comment: 26 pages
Robust Federated Inference
Federated inference, in the form of one-shot federated learning, edge ensembles, or federated ensembles, has emerged as an attractive solution to combine predictions from multiple models. This paradigm enables each model to remain local and proprietary while a central server queries them and aggregates predictions. Yet, the robustness of federated inference has been largely neglected, leaving them vulnerable to even simple attacks. To address this critical gap, we formalize the problem of robust federated inference and provide the first robustness analysis of this class of methods. Our analysis of averaging-based aggregators shows that the error of the aggregator is small either when the dissimilarity between honest responses is small or the margin between the two most probable classes is large. Moving beyond linear averaging, we show that problem of robust federated inference with non-linear aggregators can be cast as an adversarial machine learning problem. We then introduce an advanced technique using the DeepSet aggregation model, proposing a novel composition of adversarial training and test-time robust aggregation to robustify non-linear aggregators. Our composition yields significant improvements, surpassing existing robust aggregation methods by 4.7 - 22.2% in accuracy points across diverse benchmarks.
Audit the Whisper: Detecting Steganographic Collusion in Multi-Agent LLMs
Multi-agent deployments of large language models (LLMs) are increasingly embedded in market, allocation, and governance workflows, yet covert coordination among agents can silently erode trust and social welfare. Existing audits are dominated by heuristics that lack theoretical guarantees, struggle to transfer across tasks, and seldom ship with the infrastructure needed for independent replication. We introduce Audit the Whisper, a conference-grade research artifact that spans theory, benchmark design, detection, and reproducibility. Our contributions are: (i) a channel-capacity analysis showing how interventions such as paraphrase, rate limiting, and role permutation impose quantifiable capacity penalties-operationalised via paired-run Kullback--Leibler diagnostics-that tighten mutual-information thresholds with finite-sample guarantees and full proofs; (ii) ColludeBench-v0, covering pricing, first-price auctions, peer review, and hosted Gemini/Groq APIs with configurable covert schemes, deterministic manifests, and reward instrumentation; and (iii) a calibrated auditing pipeline that fuses cross-run mutual information, permutation invariance, watermark variance, and fairness-aware acceptance bias, each tuned to a $10^{-3}$ false-positive budget and validated by 10k honest runs plus an e-value martingale. Across ColludeBench and external suites including Secret Collusion, CASE, Perfect Collusion Benchmark, and SentinelAgent, the union meta-test attains state-of-the-art power at fixed FPR while ablations surface price-of-auditing trade-offs and fairness-driven colluders invisible to MI alone. We release regeneration scripts, anonymized manifests, and documentation so that external auditors can reproduce every figure, satisfy double-blind requirements, and extend the framework with minimal effort.
comment: 13 pages, 0 figures
Robotics
Ponimator: Unfolding Interactive Pose for Versatile Human-human Interaction Animation ICCV 2025
Close-proximity human-human interactive poses convey rich contextual information about interaction dynamics. Given such poses, humans can intuitively infer the context and anticipate possible past and future dynamics, drawing on strong priors of human behavior. Inspired by this observation, we propose Ponimator, a simple framework anchored on proximal interactive poses for versatile interaction animation. Our training data consists of close-contact two-person poses and their surrounding temporal context from motion-capture interaction datasets. Leveraging interactive pose priors, Ponimator employs two conditional diffusion models: (1) a pose animator that uses the temporal prior to generate dynamic motion sequences from interactive poses, and (2) a pose generator that applies the spatial prior to synthesize interactive poses from a single pose, text, or both when interactive poses are unavailable. Collectively, Ponimator supports diverse tasks, including image-based interaction animation, reaction animation, and text-to-interaction synthesis, facilitating the transfer of interaction knowledge from high-quality mocap data to open-world scenarios. Empirical experiments across diverse datasets and applications demonstrate the universality of the pose prior and the effectiveness and robustness of our framework.
comment: Accepted to ICCV 2025. Project page: https://stevenlsw.github.io/ponimator/
RDD: Retrieval-Based Demonstration Decomposer for Planner Alignment in Long-Horizon Tasks NeurIPS 2025
To tackle long-horizon tasks, recent hierarchical vision-language-action (VLAs) frameworks employ vision-language model (VLM)-based planners to decompose complex manipulation tasks into simpler sub-tasks that low-level visuomotor policies can easily handle. Typically, the VLM planner is finetuned to learn to decompose a target task. This finetuning requires target task demonstrations segmented into sub-tasks by either human annotation or heuristic rules. However, the heuristic subtasks can deviate significantly from the training data of the visuomotor policy, which degrades task performance. To address these issues, we propose a Retrieval-based Demonstration Decomposer (RDD) that automatically decomposes demonstrations into sub-tasks by aligning the visual features of the decomposed sub-task intervals with those from the training data of the low-level visuomotor policies. Our method outperforms the state-of-the-art sub-task decomposer on both simulation and real-world tasks, demonstrating robustness across diverse settings. Code and more results are available at rdd-neurips.github.io.
comment: 39th Conference on Neural Information Processing Systems (NeurIPS 2025); Project Website: rdd-neurips.github.io
CBF-RL: Safety Filtering Reinforcement Learning in Training with Control Barrier Functions
Reinforcement learning (RL), while powerful and expressive, can often prioritize performance at the expense of safety. Yet safety violations can lead to catastrophic outcomes in real-world deployments. Control Barrier Functions (CBFs) offer a principled method to enforce dynamic safety -- traditionally deployed \emph{online} via safety filters. While the result is safe behavior, the fact that the RL policy does not have knowledge of the CBF can lead to conservative behaviors. This paper proposes CBF-RL, a framework for generating safe behaviors with RL by enforcing CBFs \emph{in training}. CBF-RL has two key attributes: (1) minimally modifying a nominal RL policy to encode safety constraints via a CBF term, (2) and safety filtering of the policy rollouts in training. Theoretically, we prove that continuous-time safety filters can be deployed via closed-form expressions on discrete-time roll-outs. Practically, we demonstrate that CBF-RL internalizes the safety constraints in the learned policy -- both enforcing safer actions and biasing towards safer rewards -- enabling safe deployment without the need for an online safety filter. We validate our framework through ablation studies on navigation tasks and on the Unitree G1 humanoid robot, where CBF-RL enables safer exploration, faster convergence, and robust performance under uncertainty, enabling the humanoid robot to avoid obstacles and climb stairs safely in real-world settings without a runtime safety filter.
comment: 8 pages
From Language to Locomotion: Retargeting-free Humanoid Control via Motion Latent Guidance
Natural language offers a natural interface for humanoid robots, but existing language-guided humanoid locomotion pipelines remain cumbersome and unreliable. They typically decode human motion, retarget it to robot morphology, and then track it with a physics-based controller. However, this multi-stage process is prone to cumulative errors, introduces high latency, and yields weak coupling between semantics and control. These limitations call for a more direct pathway from language to action, one that eliminates fragile intermediate stages. Therefore, we present RoboGhost, a retargeting-free framework that directly conditions humanoid policies on language-grounded motion latents. By bypassing explicit motion decoding and retargeting, RoboGhost enables a diffusion-based policy to denoise executable actions directly from noise, preserving semantic intent and supporting fast, reactive control. A hybrid causal transformer-diffusion motion generator further ensures long-horizon consistency while maintaining stability and diversity, yielding rich latent representations for precise humanoid behavior. Extensive experiments demonstrate that RoboGhost substantially reduces deployment latency, improves success rates and tracking accuracy, and produces smooth, semantically aligned locomotion on real humanoids. Beyond text, the framework naturally extends to other modalities such as images, audio, and music, providing a general foundation for vision-language-action humanoid systems.
Architecture Is All You Need: Diversity-Enabled Sweet Spots for Robust Humanoid Locomotion
Robust humanoid locomotion in unstructured environments requires architectures that balance fast low-level stabilization with slower perceptual decision-making. We show that a simple layered control architecture (LCA), a proprioceptive stabilizer running at high rate, coupled with a compact low-rate perceptual policy, enables substantially more robust performance than monolithic end-to-end designs, even when using minimal perception encoders. Through a two-stage training curriculum (blind stabilizer pretraining followed by perceptual fine-tuning), we demonstrate that layered policies consistently outperform one-stage alternatives in both simulation and hardware. On a Unitree G1 humanoid, our approach succeeds across stair and ledge tasks where one-stage perceptual policies fail. These results highlight that architectural separation of timescales, rather than network scale or complexity, is the key enabler for robust perception-conditioned locomotion.
comment: 8 pages
EdgeNavMamba: Mamba Optimized Object Detection for Energy Efficient Edge Devices
Deployment of efficient and accurate Deep Learning models has long been a challenge in autonomous navigation, particularly for real-time applications on resource-constrained edge devices. Edge devices are limited in computing power and memory, making model efficiency and compression essential. In this work, we propose EdgeNavMamba, a reinforcement learning-based framework for goal-directed navigation using an efficient Mamba object detection model. To train and evaluate the detector, we introduce a custom shape detection dataset collected in diverse indoor settings, reflecting visual cues common in real-world navigation. The object detector serves as a pre-processing module, extracting bounding boxes (BBOX) from visual input, which are then passed to an RL policy to control goal-oriented navigation. Experimental results show that the student model achieved a reduction of 67% in size, and up to 73% in energy per inference on edge devices of NVIDIA Jetson Orin Nano and Raspberry Pi 5, while keeping the same performance as the teacher model. EdgeNavMamba also maintains high detection accuracy in MiniWorld and IsaacLab simulators while reducing parameters by 31% compared to the baseline. In the MiniWorld simulator, the navigation policy achieves over 90% success across environments of varying complexity.
comment: The 11th IEEE International Conference on Edge Computing and Scalable Cloud (IEEE EdgeCom 2025)
VT-Refine: Learning Bimanual Assembly with Visuo-Tactile Feedback via Simulation Fine-Tunin
Humans excel at bimanual assembly tasks by adapting to rich tactile feedback -- a capability that remains difficult to replicate in robots through behavioral cloning alone, due to the suboptimality and limited diversity of human demonstrations. In this work, we present VT-Refine, a visuo-tactile policy learning framework that combines real-world demonstrations, high-fidelity tactile simulation, and reinforcement learning to tackle precise, contact-rich bimanual assembly. We begin by training a diffusion policy on a small set of demonstrations using synchronized visual and tactile inputs. This policy is then transferred to a simulated digital twin equipped with simulated tactile sensors and further refined via large-scale reinforcement learning to enhance robustness and generalization. To enable accurate sim-to-real transfer, we leverage high-resolution piezoresistive tactile sensors that provide normal force signals and can be realistically modeled in parallel using GPU-accelerated simulation. Experimental results show that VT-Refine improves assembly performance in both simulation and the real world by increasing data diversity and enabling more effective policy fine-tuning. Our project page is available at https://binghao-huang.github.io/vt_refine/.
comment: Accepted by 9th Conference on Robot Learning (CoRL 2025); Website: https://binghao-huang.github.io/vt_refine/
Design of Paper Robot Building Kits
Building robots is an engaging activity that provides opportunities for hands-on learning. However, traditional robot-building kits are usually costly with limited functionality due to material and technology constraints. To improve the accessibility and flexibility of such kits, we take paper as the building material and extensively explore the versatility of paper-based interactions. Based on an analysis of current robot-building kits and paper-based interaction research, we propose a design space for devising paper robots. We also analyzed our building kit designs using this design space, where these kits demonstrate the potential of paper as a cost-effective material for robot building. As a starting point, our design space and building kit examples provide a guideline that inspires and informs future research and development of novel paper robot-building kits.
VLA^2: Empowering Vision-Language-Action Models with an Agentic Framework for Unseen Concept Manipulation
Current vision-language-action (VLA) models, pre-trained on large-scale robotic data, exhibit strong multi-task capabilities and generalize well to variations in visual and language instructions for manipulation. However, their success rate drops significantly when faced with object concepts outside the training data, such as unseen object descriptions and textures in the dataset. To address this, we propose a novel agentic framework, VLA^2, which leverages OpenVLA as the execution backbone and effectively leverages external modules such as web retrieval and object detection to provide visual and textual knowledge about target objects to the VLA. This approach mitigates generalization failure when handling out-of-distribution objects. Based on the LIBERO simulation environment, we introduced novel objects and object descriptions to construct a new evaluation benchmark with three difficulty levels to test the effectiveness of our method. Our framework successfully outperformed the current state-of-the-art models on our designed hard-level generalization benchmark. Compared to the standalone OpenVLA baseline, VLA^2 achieves a 44.2% improvement in the success rate in the hard-level benchmark and an average improvement of 20.2% in all customized environments without any performance degradation on in-domain tasks. Project website: https://vla-2.github.io.
STITCHER: Constrained Trajectory Planning in Known Environments with Real-Time Motion Primitive Search
Autonomous high-speed navigation through large, complex environments requires real-time generation of agile trajectories that are dynamically feasible, collision-free, and satisfy state or actuator constraints. Modern trajectory planning techniques primarily use numerical optimization, as they enable the systematic computation of high-quality, expressive trajectories that satisfy various constraints. However, stringent requirements on computation time and the risk of numerical instability can limit the use of optimization-based planners in safety-critical scenarios. This work presents an optimization-free planning framework called STITCHER that stitches short trajectory segments together with graph search to compute long-range, expressive, and near-optimal trajectories in real-time. STITCHER outperforms modern optimization-based planners through our innovative planning architecture and several algorithmic developments that make real-time planning possible. Extensive simulation testing is performed to analyze the algorithmic components that make up STITCHER, along with a thorough comparison with two state-of-the-art optimization planners. Simulation tests show that safe trajectories can be created within a few milliseconds for paths that span the entirety of two 50 m x 50 m environments. Hardware tests with a custom quadrotor verify that STITCHER can produce trackable paths in real-time while respecting nonconvex constraints, such as limits on tilt angle and motor forces, which are otherwise hard to include in optimization-based planners.
SADCHER: Scheduling using Attention-based Dynamic Coalitions of Heterogeneous Robots in Real-Time
We present Sadcher, a real-time task assignment framework for heterogeneous multi-robot teams that incorporates dynamic coalition formation and task precedence constraints. Sadcher is trained through Imitation Learning and combines graph attention and transformers to predict assignment rewards between robots and tasks. Based on the predicted rewards, a relaxed bipartite matching step generates high-quality schedules with feasibility guarantees. We explicitly model robot and task positions, task durations, and robots' remaining processing times, enabling advanced temporal and spatial reasoning and generalization to environments with different spatiotemporal distributions compared to training. Trained on optimally solved small-scale instances, our method can scale to larger task sets and team sizes. Sadcher outperforms other learning-based and heuristic baselines on randomized, unseen problems for small and medium-sized teams with computation times suitable for real-time operation. We also explore sampling-based variants and evaluate scalability across robot and task counts. In addition, we release our dataset of 250,000 optimal schedules: https://autonomousrobots.nl/paper_websites/sadcher_MRTA/
comment: 7 pages, 5 figures. 2025 IEEE Int. Symposium on Multi-Robot and Multi-Agent Systems (MRS 2025). Website and Code: https://autonomousrobots.nl/paper_websites/sadcher_MRTA/
Multi Agent Switching Mode Controller for Sound Source localization
Source seeking is an important topic in robotic research, especially considering sound-based sensors since they allow the agents to locate a target even in critical conditions where it is not possible to establish a direct line of sight. In this work, we design a multi- agent switching mode control strategy for acoustic-based target localization. Two scenarios are considered: single source localization, in which the agents are driven maintaining a rigid formation towards the target, and multi-source scenario, in which each agent searches for the targets independently from the others.
QDepth-VLA: Quantized Depth Prediction as Auxiliary Supervision for Vision-Language-Action Models
Spatial perception and reasoning are crucial for Vision-Language-Action (VLA) models to accomplish fine-grained manipulation tasks. However, existing approaches often lack the ability to understand and reason over the essential 3D structures necessary for precise control. To address this limitation, we propose QDepth-VLA, a general framework that augments VLA models with an auxiliary depth prediction task. A dedicated depth expert is designed to predict quantized latent tokens of depth maps obtained from a VQ-VAE encoder, enabling the model to learn depth-aware representations that capture critical geometric cues. Experimental results on the simulation benchmarks and real-world tasks demonstrate that QDepth-VLA yields strong spatial reasoning and competitive performance on manipulation tasks.
RL-100: Performant Robotic Manipulation with Real-World Reinforcement Learning
Real-world robotic manipulation in homes and factories demands reliability, efficiency, and robustness that approach or surpass skilled human operators. We present RL-100, a real-world reinforcement learning training framework built on diffusion visuomotor policies trained bu supervised learning. RL-100 introduces a three-stage pipeline. First, imitation learning leverages human priors. Second, iterative offline reinforcement learning uses an Offline Policy Evaluation procedure, abbreviated OPE, to gate PPO-style updates that are applied in the denoising process for conservative and reliable improvement. Third, online reinforcement learning eliminates residual failure modes. An additional lightweight consistency distillation head compresses the multi-step sampling process in diffusion into a single-step policy, enabling high-frequency control with an order-of-magnitude reduction in latency while preserving task performance. The framework is task-, embodiment-, and representation-agnostic and supports both 3D point clouds and 2D RGB inputs, a variety of robot platforms, and both single-step and action-chunk policies. We evaluate RL-100 on seven real-robot tasks spanning dynamic rigid-body control, such as Push-T and Agile Bowling, fluids and granular pouring, deformable cloth folding, precise dexterous unscrewing, and multi-stage orange juicing. RL-100 attains 100\% success across evaluated trials for a total of 900 out of 900 episodes, including up to 250 out of 250 consecutive trials on one task. The method achieves near-human teleoperation or better time efficiency and demonstrates multi-hour robustness with uninterrupted operation lasting up to two hours.
comment: https://lei-kun.github.io/RL-100/
RoboGPT-R1: Enhancing Robot Planning with Reinforcement Learning
Improving the reasoning capabilities of embodied agents is crucial for robots to complete complex human instructions in long-view manipulation tasks successfully. Despite the success of large language models and vision language models based on Supervised Fine-Tuning (SFT) in planning tasks, they continue facing challenges in performing long-horizon manipulation tasks in complex real-world environments, owing to their restricted common sense and reasoning capabilities. Considering that aligning general-purpose vision language models to robotic planning tasks via supervised fine-tuning suffers from poor generalization and insufficient physical understanding, we propose RoboGPT-R1, a two-stage fine-tuning framework for embodied planning. In this framework, supervised training acquires foundational knowledge through expert sequences, followed by RL to address the model's shortcomings in visual-spatial understanding and reasoning. To achieve physical understanding and action sequence consistency in multi-step reasoning tasks, we design a rule-based reward function that simultaneously considers long-horizon performance and action constraint in the environment. The reasoning model, trained on Qwen2.5-VL-3B, significantly outperforms the larger-scale model, GPT-4o-mini, by 21.33% and surpasses other work trained on Qwen2.5-VL-7B by 20.33% on the EmbodiedBench benchmark.
Neural Implicit Flow Fields for Spatio-Temporal Motion Mapping
Safe and efficient robot operation in complex human environments can benefit from good models of site-specific motion patterns. Maps of Dynamics (MoDs) provide such models by encoding statistical motion patterns in a map, but existing representations use discrete spatial sampling and typically require costly offline construction. We propose a continuous spatio-temporal MoD representation based on implicit neural functions that directly map coordinates to the parameters of a Semi-Wrapped Gaussian Mixture Model. This removes the need for discretization and imputation for unevenly sampled regions, enabling smooth generalization across both space and time. Evaluated on a large public dataset with long-term real-world people tracking data, our method achieves better accuracy of motion representation and smoother velocity distributions in sparse regions while still being computationally efficient, compared to available baselines. The proposed approach demonstrates a powerful and efficient way of modeling complex human motion patterns.
SkyDreamer: Interpretable End-to-End Vision-Based Drone Racing with Model-Based Reinforcement Learning
Autonomous drone racing (ADR) systems have recently achieved champion-level performance, yet remain highly specific to drone racing. While end-to-end vision-based methods promise broader applicability, no system to date simultaneously achieves full sim-to-real transfer, onboard execution, and champion-level performance. In this work, we present SkyDreamer, to the best of our knowledge, the first end-to-end vision-based ADR policy that maps directly from pixel-level representations to motor commands. SkyDreamer builds on informed Dreamer, a model-based reinforcement learning approach where the world model decodes to privileged information only available during training. By extending this concept to end-to-end vision-based ADR, the world model effectively functions as an implicit state and parameter estimator, greatly improving interpretability. SkyDreamer runs fully onboard without external aid, resolves visual ambiguities by tracking progress using the state decoded from the world model's hidden state, and requires no extrinsic camera calibration, enabling rapid deployment across different drones without retraining. Real-world experiments show that SkyDreamer achieves robust, high-speed flight, executing tight maneuvers such as an inverted loop, a split-S and a ladder, reaching speeds of up to 21 m/s and accelerations of up to 6 g. It further demonstrates a non-trivial visual sim-to-real transfer by operating on poor-quality segmentation masks, and exhibits robustness to battery depletion by accurately estimating the maximum attainable motor RPM and adjusting its flight path in real-time. These results highlight SkyDreamer's adaptability to important aspects of the reality gap, bringing robustness while still achieving extremely high-speed, agile flight.
Open TeleDex: A Hardware-Agnostic Teleoperation System for Imitation Learning based Dexterous Manipulation
Accurate and high-fidelity demonstration data acquisition is a critical bottleneck for deploying robot Imitation Learning (IL) systems, particularly when dealing with heterogeneous robotic platforms. Existing teleoperation systems often fail to guarantee high-precision data collection across diverse types of teleoperation devices. To address this, we developed Open TeleDex, a unified teleoperation framework engineered for demonstration data collection. Open TeleDex specifically tackles the TripleAny challenge, seamlessly supporting any robotic arm, any dexterous hand, and any external input device. Furthermore, we propose a novel hand pose retargeting algorithm that significantly boosts the interoperability of Open TeleDex, enabling robust and accurate compatibility with an even wider spectrum of heterogeneous master and slave equipment. Open TeleDex establishes a foundational, high-quality, and publicly available platform for accelerating both academic research and industry development in complex robotic manipulation and IL.
comment: 17 pages
Leveraging Neural Descriptor Fields for Learning Contact-Aware Dynamic Recovery
Real-world dexterous manipulation often encounters unexpected errors and disturbances, which can lead to catastrophic failures, such as dropping the manipulated object. To address this challenge, we focus on the problem of catching a falling object while it remains within grasping range and, importantly, resetting the system to a configuration favorable for resuming the primary manipulation task. We propose Contact-Aware Dynamic Recovery (CADRE), a reinforcement learning framework that incorporates a Neural Descriptor Field (NDF)-inspired module to extract implicit contact features. Compared to methods that rely solely on object pose or point cloud input, NDFs can directly reason about finger-object correspondence and adapt to different object geometries. Our experiments show that incorporating contact features improves training efficiency, enhances convergence performance for RL training, and ultimately leads to more successful recoveries. Additionally, we demonstrate that CADRE can generalize zero-shot to unseen objects with different geometries.
When Planners Meet Reality: How Learned, Reactive Traffic Agents Shift nuPlan Benchmarks
Planner evaluation in closed-loop simulation often uses rule-based traffic agents, whose simplistic and passive behavior can hide planner deficiencies and bias rankings. Widely used IDM agents simply follow a lead vehicle and cannot react to vehicles in adjacent lanes, hindering tests of complex interaction capabilities. We address this issue by integrating the state-of-the-art learned traffic agent model SMART into nuPlan. Thus, we are the first to evaluate planners under more realistic conditions and quantify how conclusions shift when narrowing the sim-to-real gap. Our analysis covers 14 recent planners and established baselines and shows that IDM-based simulation overestimates planning performance: nearly all scores deteriorate. In contrast, many planners interact better than previously assumed and even improve in multi-lane, interaction-heavy scenarios like lane changes or turns. Methods trained in closed-loop demonstrate the best and most stable driving performance. However, when reaching their limits in augmented edge-case scenarios, all learned planners degrade abruptly, whereas rule-based planners maintain reasonable basic behavior. Based on our results, we suggest SMART-reactive simulation as a new standard closed-loop benchmark in nuPlan and release the SMART agents as a drop-in alternative to IDM at https://github.com/shgd95/InteractiveClosedLoop.
Requirement Identification for Traffic Simulations in Driving Simulators
This paper addresses the challenge of ensuring realistic traffic conditions by proposing a methodology that systematically identifies traffic simulation requirements. Using a structured approach based on sub-goals in each study phase, specific technical needs are derived for microscopic levels, agent models, and visual representation. The methodology aims to maintain a high degree of fidelity, enhancing both the validity of experimental outcomes and participant engagement. By providing a clear link between study objectives and traffic simulation design, this approach supports robust automotive development and testing.
comment: 2 Pages, 1 figure
Spatially anchored Tactile Awareness for Robust Dexterous Manipulation
Dexterous manipulation requires precise geometric reasoning, yet existing visuo-tactile learning methods struggle with sub-millimeter precision tasks that are routine for traditional model-based approaches. We identify a key limitation: while tactile sensors provide rich contact information, current learning frameworks fail to effectively leverage both the perceptual richness of tactile signals and their spatial relationship with hand kinematics. We believe an ideal tactile representation should explicitly ground contact measurements in a stable reference frame while preserving detailed sensory information, enabling policies to not only detect contact occurrence but also precisely infer object geometry in the hand's coordinate system. We introduce SaTA (Spatially-anchored Tactile Awareness for dexterous manipulation), an end-to-end policy framework that explicitly anchors tactile features to the hand's kinematic frame through forward kinematics, enabling accurate geometric reasoning without requiring object models or explicit pose estimation. Our key insight is that spatially grounded tactile representations allow policies to not only detect contact occurrence but also precisely infer object geometry in the hand's coordinate system. We validate SaTA on challenging dexterous manipulation tasks, including bimanual USB-C mating in free space, a task demanding sub-millimeter alignment precision, as well as light bulb installation requiring precise thread engagement and rotational control, and card sliding that demands delicate force modulation and angular precision. These tasks represent significant challenges for learning-based methods due to their stringent precision requirements. Across multiple benchmarks, SaTA significantly outperforms strong visuo-tactile baselines, improving success rates by up to 30 percentage while reducing task completion times by 27 percentage.
comment: 8 pages
Generative Models From and For Sampling-Based MPC: A Bootstrapped Approach For Adaptive Contact-Rich Manipulation
We present a generative predictive control (GPC) framework that amortizes sampling-based Model Predictive Control (SPC) by bootstrapping it with conditional flow-matching models trained on SPC control sequences collected in simulation. Unlike prior work relying on iterative refinement or gradient-based solvers, we show that meaningful proposal distributions can be learned directly from noisy SPC data, enabling more efficient and informed sampling during online planning. We further demonstrate, for the first time, the application of this approach to real-world contact-rich loco-manipulation with a quadruped robot. Extensive experiments in simulation and on hardware show that our method improves sample efficiency, reduces planning horizon requirements, and generalizes robustly across task variations.
comment: 9 pages, 5 figures
GOPLA: Generalizable Object Placement Learning via Synthetic Augmentation of Human Arrangement
Robots are expected to serve as intelligent assistants, helping humans with everyday household organization. A central challenge in this setting is the task of object placement, which requires reasoning about both semantic preferences (e.g., common-sense object relations) and geometric feasibility (e.g., collision avoidance). We present GOPLA, a hierarchical framework that learns generalizable object placement from augmented human demonstrations. A multi-modal large language model translates human instructions and visual inputs into structured plans that specify pairwise object relationships. These plans are then converted into 3D affordance maps with geometric common sense by a spatial mapper, while a diffusion-based planner generates placement poses guided by test-time costs, considering multi-plan distributions and collision avoidance. To overcome data scarcity, we introduce a scalable pipeline that expands human placement demonstrations into diverse synthetic training data. Extensive experiments show that our approach improves placement success rates by 30.04 percentage points over the runner-up, evaluated on positioning accuracy and physical plausibility, demonstrating strong generalization across a wide range of real-world robotic placement scenarios.
Accelerated Multi-Modal Motion Planning Using Context-Conditioned Diffusion Models
Classical methods in robot motion planning, such as sampling-based and optimization-based methods, often struggle with scalability towards higher-dimensional state spaces and complex environments. Diffusion models, known for their capability to learn complex, high-dimensional and multi-modal data distributions, provide a promising alternative when applied to motion planning problems and have already shown interesting results. However, most of the current approaches train their model for a single environment, limiting their generalization to environments not seen during training. The techniques that do train a model for multiple environments rely on a specific camera to provide the model with the necessary environmental information and therefore always require that sensor. To effectively adapt to diverse scenarios without the need for retraining, this research proposes Context-Aware Motion Planning Diffusion (CAMPD). CAMPD leverages a classifier-free denoising probabilistic diffusion model, conditioned on sensor-agnostic contextual information. An attention mechanism, integrated in the well-known U-Net architecture, conditions the model on an arbitrary number of contextual parameters. CAMPD is evaluated on a 7-DoF robot manipulator and benchmarked against state-of-the-art approaches on real-world tasks, showing its ability to generalize to unseen environments and generate high-quality, multi-modal trajectories, at a fraction of the time required by existing methods.
comment: This paper has been submitted and has not yet been peer reviewed or accepted for publication
Proprioceptive Image: An Image Representation of Proprioceptive Data from Quadruped Robots for Contact Estimation Learning
This paper presents a novel approach for representing proprioceptive time-series data from quadruped robots as structured two-dimensional images, enabling the use of convolutional neural networks for learning locomotion-related tasks. The proposed method encodes temporal dynamics from multiple proprioceptive signals, such as joint positions, IMU readings, and foot velocities, while preserving the robot's morphological structure in the spatial arrangement of the image. This transformation captures inter-signal correlations and gait-dependent patterns, providing a richer feature space than direct time-series processing. We apply this concept in the problem of contact estimation, a key capability for stable and adaptive locomotion on diverse terrains. Experimental evaluations on both real-world datasets and simulated environments show that our image-based representation consistently enhances prediction accuracy and generalization over conventional sequence-based models, underscoring the potential of cross-modal encoding strategies for robotic state learning. Our method achieves superior performance on the contact dataset, improving contact state accuracy from 87.7% to 94.5% over the recently proposed MI-HGNN method, using a 15 times shorter window size.
A Generalized Placeability Metric for Model-Free Unified Pick-and-Place Reasoning
To reliably pick and place unknown objects under real-world sensing noise remains a challenging task, as existing methods rely on strong object priors (e.g., CAD models), or planar-support assumptions, limiting generalization and unified reasoning between grasping and placing. In this work, we introduce a generalized placeability metric that evaluates placement poses directly from noisy point clouds, without any shape priors. The metric jointly scores stability, graspability, and clearance. From raw geometry, we extract the support surfaces of the object to generate diverse candidates for multi-orientation placement and sample contacts that satisfy collision and stability constraints. By conditioning grasp scores on each candidate placement, our proposed method enables model-free unified pick-and-place reasoning and selects grasp-place pairs that lead to stable, collision-free placements. On unseen real objects and non-planar object supports, our metric delivers CAD-comparable accuracy in predicting stability loss and generally produces more physically plausible placements than learning-based predictors.
QuASH: Using Natural-Language Heuristics to Query Visual-Language Robotic Maps ICRA 2026
Embeddings from Visual-Language Models are increasingly utilized to represent semantics in robotic maps, offering an open-vocabulary scene understanding that surpasses traditional, limited labels. Embeddings enable on-demand querying by comparing embedded user text prompts to map embeddings via a similarity metric. The key challenge in performing the task indicated in a query is that the robot must determine the parts of the environment relevant to the query. This paper proposes a solution to this challenge. We leverage natural-language synonyms and antonyms associated with the query within the embedding space, applying heuristics to estimate the language space relevant to the query, and use that to train a classifier to partition the environment into matches and non-matches. We evaluate our method through extensive experiments, querying both maps and standard image benchmarks. The results demonstrate increased queryability of maps and images. Our querying technique is agnostic to the representation and encoder used, and requires limited training.
comment: Submitted to ICRA 2026
Stability Criteria and Motor Performance in Delayed Haptic Dyadic Interactions Mediated by Robots
This paper establishes analytical stability criteria for robot-mediated human-human (dyadic) interaction systems, focusing on haptic communication under network-induced time delays. Through frequency-domain analysis supported by numerical simulations, we identify both delay-independent and delay-dependent stability criteria. The delay-independent criterion guarantees stability irrespective of the delay, whereas the delay-dependent criterion is characterised by a maximum tolerable delay before instability occurs. The criteria demonstrate dependence on controller and robot dynamic parameters, where increasing stiffness reduces the maximum tolerable delay in a non-linear manner, thereby heightening system vulnerability. The proposed criteria can be generalised to a wide range of robot-mediated interactions and serve as design guidelines for stable remote dyadic systems. Experiments with robots performing human-like movements further illustrate the correlation between stability and motor performance. The findings of this paper suggest the prerequisites for effective delay-compensation strategies.
Restoring Noisy Demonstration for Imitation Learning With Diffusion Models
Imitation learning (IL) aims to learn a policy from expert demonstrations and has been applied to various applications. By learning from the expert policy, IL methods do not require environmental interactions or reward signals. However, most existing imitation learning algorithms assume perfect expert demonstrations, but expert demonstrations often contain imperfections caused by errors from human experts or sensor/control system inaccuracies. To address the above problems, this work proposes a filter-and-restore framework to best leverage expert demonstrations with inherent noise. Our proposed method first filters clean samples from the demonstrations and then learns conditional diffusion models to recover the noisy ones. We evaluate our proposed framework and existing methods in various domains, including robot arm manipulation, dexterous manipulation, and locomotion. The experiment results show that our proposed framework consistently outperforms existing methods across all the tasks. Ablation studies further validate the effectiveness of each component and demonstrate the framework's robustness to different noise types and levels. These results confirm the practical applicability of our framework to noisy offline demonstration data.
comment: Published in IEEE Transactions on Neural Networks and Learning Systems (TNNLS)
Towards Adaptable Humanoid Control via Adaptive Motion Tracking
Humanoid robots are envisioned to adapt demonstrated motions to diverse real-world conditions while accurately preserving motion patterns. Existing motion prior approaches enable well adaptability with a few motions but often sacrifice imitation accuracy, whereas motion-tracking methods achieve accurate imitation yet require many training motions and a test-time target motion to adapt. To combine their strengths, we introduce AdaMimic, a novel motion tracking algorithm that enables adaptable humanoid control from a single reference motion. To reduce data dependence while ensuring adaptability, our method first creates an augmented dataset by sparsifying the single reference motion into keyframes and applying light editing with minimal physical assumptions. A policy is then initialized by tracking these sparse keyframes to generate dense intermediate motions, and adapters are subsequently trained to adjust tracking speed and refine low-level actions based on the adjustment, enabling flexible time warping that further improves imitation accuracy and adaptability. We validate these significant improvements in our approach in both simulation and the real-world Unitree G1 humanoid robot in multiple tasks across a wide range of adaptation conditions. Videos and code are available at https://taohuang13.github.io/adamimic.github.io/.
comment: 9 pages
RoboANKLE: Design, Development, and Functional Evaluation of a Robotic Ankle with a Motorized Compliant Unit
This study presents a powered transtibial prosthesis with complete push-off assistance, RoboANKLE. The design aims to fulfill specific requirements, such as a sufficient range of motion (RoM) while providing the necessary torque for achieving natural ankle motion in daily activities. Addressing the challenges faced in designing active transtibial prostheses, such as maintaining energetic autonomy and minimizing weight, is vital for the study. With this aim, we try to imitate the human ankle by providing extensive push-off assistance to achieve a natural-like torque profile. Thus, Energy Store and Extended Release mechanism (ESER) is employed with a novel Extra Energy Storage (EES) mechanism. Kinematic and kinetic analyses are carried out to determine the design parameters and assess the design performance. Subsequently, a Computer-Aided Design (CAD) model is built and used in comprehensive dynamic and structural analyses. These analyses are used for the design performance evaluation and determine the forces and torques applied to the prosthesis, which aids in optimizing the design for minimal weight via structural analysis and topology optimization. The design of the prototype is then finalized and manufactured for experimental evaluation to validate the design and functionality. The prototype is realized with a mass of 1.92 kg and dimensions of 261x107x420 mm. The Functional evaluations of the RoboANKLE revealed that it is capable of achieving the natural maximum dorsi-flexion angle with 95% accuracy. Also, Thanks to the implemented mechanisms, the results show that RoboANKLE can generate 57% higher than the required torque for natural walking. The result of the power generation capacity of the RoboANKLE is 10% more than the natural power during the gait cycle.
SUM-AgriVLN: Spatial Understanding Memory for Agricultural Vision-and-Language Navigation
Agricultural robots are emerging as powerful assistants across a wide range of agricultural tasks, nevertheless, still heavily rely on manual operation or fixed rail systems for movement. The AgriVLN method and the A2A benchmark pioneeringly extend Vision-and-Language Navigation (VLN) to the agricultural domain, enabling robots to navigate to the target positions following the natural language instructions. In practical agricultural scenarios, navigation instructions often repeatedly occur, yet AgriVLN treat each instruction as an independent episode, overlooking the potential of past experiences to provide spatial context for subsequent ones. To bridge this gap, we propose the method of Spatial Understanding Memory for Agricultural Vision-and-Language Navigation (SUM-AgriVLN), in which the SUM module employs spatial understanding and save spatial memory through 3D reconstruction and representation. When evaluated on the A2A benchmark, our SUM-AgriVLN effectively improves Success Rate from 0.47 to 0.54 with slight sacrifice on Navigation Error from 2.91m to 2.93m, demonstrating the state-of-the-art performance in the agricultural domain. Code: https://github.com/AlexTraveling/SUM-AgriVLN.
Leveraging Cycle-Consistent Anchor Points for Self-Supervised RGB-D Registration ICRA 2024
With the rise in consumer depth cameras, a wealth of unlabeled RGB-D data has become available. This prompts the question of how to utilize this data for geometric reasoning of scenes. While many RGB-D registration meth- ods rely on geometric and feature-based similarity, we take a different approach. We use cycle-consistent keypoints as salient points to enforce spatial coherence constraints during matching, improving correspondence accuracy. Additionally, we introduce a novel pose block that combines a GRU recurrent unit with transformation synchronization, blending historical and multi-view data. Our approach surpasses previous self- supervised registration methods on ScanNet and 3DMatch, even outperforming some older supervised methods. We also integrate our components into existing methods, showing their effectiveness.
comment: 8 pages, accepted at ICRA 2024 (International Conference on Robotics and Automation)
Risk-Aware Reinforcement Learning with Bandit-Based Adaptation for Quadrupedal Locomotion
In this work, we study risk-aware reinforcement learning for quadrupedal locomotion. Our approach trains a family of risk-conditioned policies using a Conditional Value-at-Risk (CVaR) constrained policy optimization technique that provides improved stability and sample efficiency. At deployment, we adaptively select the best performing policy from the family of policies using a multi-armed bandit framework that uses only observed episodic returns, without any privileged environment information, and adapts to unknown conditions on the fly. Hence, we train quadrupedal locomotion policies at various levels of robustness using CVaR and adaptively select the desired level of robustness online to ensure performance in unknown environments. We evaluate our method in simulation across eight unseen settings (by changing dynamics, contacts, sensing noise, and terrain) and on a Unitree Go2 robot in previously unseen terrains. Our risk-aware policy attains nearly twice the mean and tail performance in unseen environments compared to other baselines and our bandit-based adaptation selects the best-performing risk-aware policy in unknown terrain within two minutes of operation.
Expertise need not monopolize: Action-Specialized Mixture of Experts for Vision-Language-Action Learning
Vision-Language-Action (VLA) models are experiencing rapid development and demonstrating promising capabilities in robotic manipulation tasks. However, scaling up VLA models presents several critical challenges: (1) Training new VLA models from scratch demands substantial computational resources and extensive datasets. Given the current scarcity of robot data, it becomes particularly valuable to fully leverage well-pretrained VLA model weights during the scaling process. (2) Real-time control requires carefully balancing model capacity with computational efficiency. To address these challenges, We propose AdaMoE, a Mixture-of-Experts (MoE) architecture that inherits pretrained weights from dense VLA models, and scales up the action expert by substituting the feedforward layers into sparsely activated MoE layers. AdaMoE employs a decoupling technique that decouples expert selection from expert weighting through an independent scale adapter working alongside the traditional router. This enables experts to be selected based on task relevance while contributing with independently controlled weights, allowing collaborative expert utilization rather than winner-takes-all dynamics. Our approach demonstrates that expertise need not monopolize. Instead, through collaborative expert utilization, we can achieve superior performance while maintaining computational efficiency. AdaMoE consistently outperforms the baseline model across key benchmarks, delivering performance gains of 1.8% on LIBERO and 9.3% on RoboTwin. Most importantly, a substantial 21.5% improvement in real-world experiments validates its practical effectiveness for robotic manipulation tasks.
Learning Human-Humanoid Coordination for Collaborative Object Carrying
Human-humanoid collaboration shows significant promise for applications in healthcare, domestic assistance, and manufacturing. While compliant robot-human collaboration has been extensively developed for robotic arms, enabling compliant human-humanoid collaboration remains largely unexplored due to humanoids' complex whole-body dynamics. In this paper, we propose a proprioception-only reinforcement learning approach, COLA, that combines leader and follower behaviors within a single policy. The model is trained in a closed-loop environment with dynamic object interactions to predict object motion patterns and human intentions implicitly, enabling compliant collaboration to maintain load balance through coordinated trajectory planning. We evaluate our approach through comprehensive simulator and real-world experiments on collaborative carrying tasks, demonstrating the effectiveness, generalization, and robustness of our model across various terrains and objects. Simulation experiments demonstrate that our model reduces human effort by 24.7%. compared to baseline approaches while maintaining object stability. Real-world experiments validate robust collaborative carrying across different object types (boxes, desks, stretchers, etc.) and movement patterns (straight-line, turning, slope climbing). Human user studies with 23 participants confirm an average improvement of 27.4% compared to baseline models. Our method enables compliant human-humanoid collaborative carrying without requiring external sensors or complex interaction models, offering a practical solution for real-world deployment.
Prescribed Performance Control of Deformable Object Manipulation in Spatial Latent Space
Manipulating three-dimensional (3D) deformable objects presents significant challenges for robotic systems due to their infinite-dimensional state space and complex deformable dynamics. This paper proposes a novel model-free approach for shape control with constraints imposed on key points. Unlike existing methods that rely on feature dimensionality reduction, the proposed controller leverages the coordinates of key points as the feature vector, which are extracted from the deformable object's point cloud using deep learning methods. This approach not only reduces the dimensionality of the feature space but also retains the spatial information of the object. By extracting key points, the manipulation of deformable objects is simplified into a visual servoing problem, where the shape dynamics are described using a deformation Jacobian matrix. To enhance control accuracy, a prescribed performance control method is developed by integrating barrier Lyapunov functions (BLF) to enforce constraints on the key points. The stability of the closed-loop system is rigorously analyzed and verified using the Lyapunov method. Experimental results further demonstrate the effectiveness and robustness of the proposed method.
Lagrange-Poincaré-Kepler Equations of Disturbed Space-Manipulator Systems in Orbit
This article presents an extension of the Lagrange-Poincare Equations (LPE) to model the dynamics of spacecraft-manipulator systems operating within a non-inertial orbital reference frame. Building upon prior formulations of LPE for vehicle-manipulator systems, the proposed framework, termed the Lagrange-Poincare-Kepler Equations (LPKE), incorporates the coupling between spacecraft attitude dynamics, orbital motion, and manipulator kinematics. The formalism combines the Euler-Poincare equations for the base spacecraft, Keplerian orbital dynamics for the reference frame, and reduced Euler-Lagrange equations for the manipulator's shape space, using an exponential joint parametrization. Leveraging the Lagrange-d'Alembert principle on principal bundles, we derive novel closed-form structural matrices that explicitly capture the effects of orbital disturbances and their dynamic coupling with the manipulator system. The LPKE framework also systematically includes externally applied, symmetry-breaking wrenches, allowing for immediate integration into hardware-in-the-loop simulations and model-based control architectures for autonomous robotic operations in the orbital environment. To illustrate the effectiveness of the proposed model and its numerical superiority, we present a simulation study analyzing orbital effects on a 7-degree-of-freedom manipulator mounted on a spacecraft.
RM-RL: Role-Model Reinforcement Learning for Precise Robot Manipulation
Precise robot manipulation is critical for fine-grained applications such as chemical and biological experiments, where even small errors (e.g., reagent spillage) can invalidate an entire task. Existing approaches often rely on pre-collected expert demonstrations and train policies via imitation learning (IL) or offline reinforcement learning (RL). However, obtaining high-quality demonstrations for precision tasks is difficult and time-consuming, while offline RL commonly suffers from distribution shifts and low data efficiency. We introduce a Role-Model Reinforcement Learning (RM-RL) framework that unifies online and offline training in real-world environments. The key idea is a role-model strategy that automatically generates labels for online training data using approximately optimal actions, eliminating the need for human demonstrations. RM-RL reformulates policy learning as supervised training, reducing instability from distribution mismatch and improving efficiency. A hybrid training scheme further leverages online role-model data for offline reuse, enhancing data efficiency through repeated sampling. Extensive experiments show that RM-RL converges faster and more stably than existing RL methods, yielding significant gains in real-world manipulation: 53% improvement in translation accuracy and 20% in rotation accuracy. Finally, we demonstrate the successful execution of a challenging task, precisely placing a cell plate onto a shelf, highlighting the framework's effectiveness where prior methods fail.
Autonomous Reactive Masonry Construction using Collaborative Heterogeneous Aerial Robots with Experimental Demonstration
This article presents a fully autonomous aerial masonry construction framework using heterogeneous unmanned aerial vehicles (UAVs), supported by experimental validation. Two specialized UAVs were developed for the task: (i) a brick-carrier UAV equipped with a ball-joint actuation mechanism for precise brick manipulation, and (ii) an adhesion UAV integrating a servo-controlled valve and extruder nozzle for accurate adhesion application. The proposed framework employs a reactive mission planning unit that combines a dependency graph of the construction layout with a conflict graph to manage simultaneous task execution, while hierarchical state machines ensure robust operation and safe transitions during task execution. Dynamic task allocation allows real-time adaptation to environmental feedback, while minimum-jerk trajectory generation ensures smooth and precise UAV motion during brick pickup and placement. Additionally, the brick-carrier UAV employs an onboard vision system that estimates brick poses in real time using ArUco markers and a least-squares optimization filter, enabling accurate alignment during construction. To the best of the authors' knowledge, this work represents the first experimental demonstration of fully autonomous aerial masonry construction using heterogeneous UAVs, where one UAV precisely places the bricks while another autonomously applies adhesion material between them. The experimental results supported by the video showcase the effectiveness of the proposed framework and demonstrate its potential to serve as a foundation for future developments in autonomous aerial robotic construction.
UrbanVerse: Scaling Urban Simulation by Watching City-Tour Videos
Urban embodied AI agents, ranging from delivery robots to quadrupeds, are increasingly populating our cities, navigating chaotic streets to provide last-mile connectivity. Training such agents requires diverse, high-fidelity urban environments to scale, yet existing human-crafted or procedurally generated simulation scenes either lack scalability or fail to capture real-world complexity. We introduce UrbanVerse, a data-driven real-to-sim system that converts crowd-sourced city-tour videos into physics-aware, interactive simulation scenes. UrbanVerse consists of: (i) UrbanVerse-100K, a repository of 100k+ annotated urban 3D assets with semantic and physical attributes, and (ii) UrbanVerse-Gen, an automatic pipeline that extracts scene layouts from video and instantiates metric-scale 3D simulations using retrieved assets. Running in IsaacSim, UrbanVerse offers 160 high-quality constructed scenes from 24 countries, along with a curated benchmark of 10 artist-designed test scenes. Experiments show that UrbanVerse scenes preserve real-world semantics and layouts, achieving human-evaluated realism comparable to manually crafted scenes. In urban navigation, policies trained in UrbanVerse exhibit scaling power laws and strong generalization, improving success by +6.3% in simulation and +30.1% in zero-shot sim-to-real transfer comparing to prior methods, accomplishing a 300 m real-world mission with only two interventions.
comment: Technical report. Project page: https://urbanverseproject.github.io/
MimicKit: A Reinforcement Learning Framework for Motion Imitation and Control
MimicKit is an open-source framework for training motion controllers using motion imitation and reinforcement learning. The codebase provides implementations of commonly-used motion-imitation techniques and RL algorithms. This framework is intended to support research and applications in computer graphics and robotics by providing a unified training framework, along with standardized environment, agent, and data structures. The codebase is designed to be modular and easily configurable, enabling convenient modification and extension to new characters and tasks. The open-source codebase is available at: https://github.com/xbpeng/MimicKit.
Hoecken-D Hand: A Novel Robotic Hand for Linear Parallel Pinching and Self-Adaptive Grasping
This paper presents the Hoecken-D Hand, an underactuated robotic gripper that combines a modified Hoecken linkage with a differential spring mechanism to achieve both linear parallel pinching and a mid-stroke transition to adaptive envelope. The original Hoecken linkage is reconfigured by replacing one member with differential links, preserving straight-line guidance while enabling contact-triggered reconfiguration without additional actuators. A double-parallelogram arrangement maintains fingertip parallelism during conventional pinching, whereas the differential mechanism allows one finger to wrap inward upon encountering an obstacle, improving stability on irregular or thin objects. The mechanism can be driven by a single linear actuator, minimizing complexity and cost; in our prototype, each finger is driven by its own linear actuator for simplicity. We perform kinematic modeling and force analysis to characterize grasp performance, including simulated grasping forces and spring-opening behavior under varying geometric parameters. The design was prototyped using PLA-based 3D printing, achieving a linear pinching span of approximately 200 mm. Preliminary tests demonstrate reliable grasping in both modes across a wide range of object geometries, highlighting the Hoecken-D Hand as a compact, adaptable, and cost-effective solution for manipulation in unstructured environments.
comment: Accepted by IEEE International Conference on Robotics and Biomimetics (ROBIO) 2025, Chengdu, China. This version includes updated contact information
SimULi: Real-Time LiDAR and Camera Simulation with Unscented Transforms
Rigorous testing of autonomous robots, such as self-driving vehicles, is essential to ensure their safety in real-world deployments. This requires building high-fidelity simulators to test scenarios beyond those that can be safely or exhaustively collected in the real-world. Existing neural rendering methods based on NeRF and 3DGS hold promise but suffer from low rendering speeds or can only render pinhole camera models, hindering their suitability to applications that commonly require high-distortion lenses and LiDAR data. Multi-sensor simulation poses additional challenges as existing methods handle cross-sensor inconsistencies by favoring the quality of one modality at the expense of others. To overcome these limitations, we propose SimULi, the first method capable of rendering arbitrary camera models and LiDAR data in real-time. Our method extends 3DGUT, which natively supports complex camera models, with LiDAR support, via an automated tiling strategy for arbitrary spinning LiDAR models and ray-based culling. To address cross-sensor inconsistencies, we design a factorized 3D Gaussian representation and anchoring strategy that reduces mean camera and depth error by up to 40% compared to existing methods. SimULi renders 10-20x faster than ray tracing approaches and 1.5-10x faster than prior rasterization-based work (and handles a wider range of camera models). When evaluated on two widely benchmarked autonomous driving datasets, SimULi matches or exceeds the fidelity of existing state-of-the-art methods across numerous camera and LiDAR metrics.
comment: Project page: https://research.nvidia.com/labs/sil/projects/simuli
SIG-Chat: Spatial Intent-Guided Conversational Gesture Generation Involving How, When and Where
The accompanying actions and gestures in dialogue are often closely linked to interactions with the environment, such as looking toward the interlocutor or using gestures to point to the described target at appropriate moments. Speech and semantics guide the production of gestures by determining their timing (WHEN) and style (HOW), while the spatial locations of interactive objects dictate their directional execution (WHERE). Existing approaches either rely solely on descriptive language to generate motions or utilize audio to produce non-interactive gestures, thereby lacking the characterization of interactive timing and spatial intent. This significantly limits the applicability of conversational gesture generation, whether in robotics or in the fields of game and animation production. To address this gap, we present a full-stack solution. We first established a unique data collection method to simultaneously capture high-precision human motion and spatial intent. We then developed a generation model driven by audio, language, and spatial data, alongside dedicated metrics for evaluating interaction timing and spatial accuracy. Finally, we deployed the solution on a humanoid robot, enabling rich, context-aware physical interactions.
Act to See, See to Act: Diffusion-Driven Perception-Action Interplay for Adaptive Policies NeurIPS 2025
Existing imitation learning methods decouple perception and action, which overlooks the causal reciprocity between sensory representations and action execution that humans naturally leverage for adaptive behaviors. To bridge this gap, we introduce Action-Guided Diffusion Policy (DP-AG), a unified representation learning that explicitly models a dynamic interplay between perception and action through probabilistic latent dynamics. DP-AG encodes latent observations into a Gaussian posterior via variational inference and evolves them using an action-guided SDE, where the Vector-Jacobian Product (VJP) of the diffusion policy's noise predictions serves as a structured stochastic force driving latent updates. To promote bidirectional learning between perception and action, we introduce a cycle-consistent contrastive loss that organizes the gradient flow of the noise predictor into a coherent perception-action loop, enforcing mutually consistent transitions in both latent updates and action refinements. Theoretically, we derive a variational lower bound for the action-guided SDE, and prove that the contrastive objective enhances continuity in both latent and action trajectories. Empirically, DP-AG significantly outperforms state-of-the-art methods across simulation benchmarks and real-world UR5 manipulation tasks. As a result, our DP-AG offers a promising step toward bridging biological adaptability and artificial policy learning.
comment: 39th Conference on Neural Information Processing Systems (NeurIPS 2025)
STITCHER: Real-Time Trajectory Planning with Motion Primitive Search
Autonomous high-speed navigation through large, complex environments requires real-time generation of agile trajectories that are dynamically feasible, collision-free, and satisfy constraints. Most modern trajectory planning techniques rely on numerical optimization because high-quality, expressive trajectories that satisfy constraints can be systematically computed. However, strict requirements on computation time and the risk of numerical instability can limit the use of optimization-based planners in safety-critical situations. This work presents an optimization-free planning framework called STITCHER that leverages graph search to generate long-range trajectories by stitching short trajectory segments together in real time. STITCHER is shown to outperform modern optimization-based planners through its innovative planning architecture and several algorithmic developments that make real-time planning possible. Simulation results show safe trajectories through complex environments can be generated in milliseconds that cover tens of meters.
SGAligner++: Cross-Modal Language-Aided 3D Scene Graph Alignment
Aligning 3D scene graphs is a crucial initial step for several applications in robot navigation and embodied perception. Current methods in 3D scene graph alignment often rely on single-modality point cloud data and struggle with incomplete or noisy input. We introduce SGAligner++, a cross-modal, language-aided framework for 3D scene graph alignment. Our method addresses the challenge of aligning partially overlapping scene observations across heterogeneous modalities by learning a unified joint embedding space, enabling accurate alignment even under low-overlap conditions and sensor noise. By employing lightweight unimodal encoders and attention-based fusion, SGAligner++ enhances scene understanding for tasks such as visual localization, 3D reconstruction, and navigation, while ensuring scalability and minimal computational overhead. Extensive evaluations on real-world datasets demonstrate that SGAligner++ outperforms state-of-the-art methods by up to 40% on noisy real-world reconstructions, while enabling cross-modal generalization.
comment: Project Page: https://singhbino3d.github.io/sgpp/
Moto: Latent Motion Token as the Bridging Language for Learning Robot Manipulation from Videos ICCV 2025
Recent developments in Large Language Models pre-trained on extensive corpora have shown significant success in various natural language processing tasks with minimal fine-tuning. This success offers new promise for robotics, which has long been constrained by the high cost of action-labeled data. We ask: given the abundant video data containing interaction-related knowledge available as a rich "corpus", can a similar generative pre-training approach be effectively applied to enhance robot learning? The key challenge is to identify an effective representation for autoregressive pre-training that benefits robot manipulation tasks. Inspired by the way humans learn new skills through observing dynamic environments, we propose that effective robotic learning should emphasize motion-related knowledge, which is closely tied to low-level actions and is hardware-agnostic, facilitating the transfer of learned motions to actual robot actions. To this end, we introduce Moto, which converts video content into latent Motion Token sequences by a Latent Motion Tokenizer, learning a bridging "language" of motion from videos in an unsupervised manner. We pre-train Moto-GPT through motion token autoregression, enabling it to capture diverse visual motion knowledge. After pre-training, Moto-GPT demonstrates the promising ability to produce semantically interpretable motion tokens, predict plausible motion trajectories, and assess trajectory rationality through output likelihood. To transfer learned motion priors to real robot actions, we implement a co-fine-tuning strategy that seamlessly bridges latent motion token prediction and real robot control. Extensive experiments show that the fine-tuned Moto-GPT exhibits superior robustness and efficiency on robot manipulation benchmarks, underscoring its effectiveness in transferring knowledge from video data to downstream visual manipulation tasks.
comment: ICCV 2025. Project page: https://chenyi99.github.io/moto/
FEWT: Improving Humanoid Robot Perception with Frequency-Enhanced Wavelet-based Transformers
The embodied intelligence bridges the physical world and information space. As its typical physical embodiment, humanoid robots have shown great promise through robot learning algorithms in recent years. In this study, a hardware platform, including humanoid robot and exoskeleton-style teleoperation cabin, was developed to realize intuitive remote manipulation and efficient collection of anthropomorphic action data. To improve the perception representation of humanoid robot, an imitation learning framework, termed Frequency-Enhanced Wavelet-based Transformer (FEWT), was proposed, which consists of two primary modules: Frequency-Enhanced Efficient Multi-Scale Attention (FE-EMA) and Time-Series Discrete Wavelet Transform (TS-DWT). By combining multi-scale wavelet decomposition with the residual network, FE-EMA can dynamically fuse features from both cross-spatial and frequency-domain. This fusion is able to capture feature information across various scales effectively, thereby enhancing model robustness. Experimental performance demonstrates that FEWT improves the success rate of the state-of-the-art algorithm (Action Chunking with Transformers, ACT baseline) by up to 30% in simulation and by 6-12% in real-world.
FTIN: Frequency-Time Integration Network for Inertial Odometry
Inertial odometry (IO) leverages inertial measurement unit (IMU) signals for cost-effective localization. However, high IMU sampling rates introduce substantial redundancy that impedes IO's ability to attend to salient components, thereby creating an information bottleneck. To address this challenge, we propose a cross-domain IO framework that fuses information from the frequency and time domains. Specifically, we exploit the global context and energy-compaction properties of frequency-domain representations to capture holistic motion patterns and alleviate the bottleneck. To the best of our knowledge, this is among the first attempts to incorporate frequency-domain feature processing into IO. Experimental results on multiple public datasets demonstrate the effectiveness of the proposed frequency--time-domain fusion strategy.
CKANIO: Learnable Chebyshev Polynomials for Inertial Odometry
Inertial odometry (IO) relies exclusively on signals from an inertial measurement unit (IMU) for localization and offers a promising avenue for consumer grade positioning. However, accurate modeling of the nonlinear motion patterns present in IMU signals remains the principal limitation on IO accuracy. To address this challenge, we propose CKANIO, an IO framework that integrates Chebyshev based Kolmogorov-Arnold Networks (Chebyshev KAN). Specifically, we design a novel residual architecture that leverages the nonlinear approximation capabilities of Chebyshev polynomials within the KAN framework to more effectively model the complex motion characteristics inherent in IMU signals. To the best of our knowledge, this work represents the first application of an interpretable KAN model to IO. Experimental results on five publicly available datasets demonstrate the effectiveness of CKANIO.
NAMO-LLM: Efficient Navigation Among Movable Obstacles with Large Language Model Guidance
Several planners have been proposed to compute robot paths that reach desired goal regions while avoiding obstacles. However, these methods fail when all pathways to the goal are blocked. In such cases, the robot must reason about how to reconfigure the environment to access task-relevant regions - a problem known as Navigation Among Movable Objects (NAMO). While various solutions to this problem have been developed, they often struggle to scale to highly cluttered environments. To address this, we propose NAMO-LLM, a sampling-based planner that searches over robot and obstacle configurations to compute feasible plans specifying which obstacles to move, where, and in what order. Its key novelty is a non-uniform sampling strategy guided by Large Language Models (LLMs) biasing the tree construction toward directions more likely to yield a solution. We show that NAMO-LLM is probabilistically complete and demonstrate through experiments that it efficiently scales to cluttered environments, outperforming related works in both runtime and plan quality.
comment: 9 pages, 6 figures
WoW: Towards a World omniscient World model Through Embodied Interaction
Humans develop an understanding of intuitive physics through active interaction with the world. This approach is in stark contrast to current video models, such as Sora, which rely on passive observation and therefore struggle with grasping physical causality. This observation leads to our central hypothesis: authentic physical intuition of the world model must be grounded in extensive, causally rich interactions with the real world. To test this hypothesis, we present WoW, a 14-billion-parameter generative world model trained on 2 million robot interaction trajectories. Our findings reveal that the model's understanding of physics is a probabilistic distribution of plausible outcomes, leading to stochastic instabilities and physical hallucinations. Furthermore, we demonstrate that this emergent capability can be actively constrained toward physical realism by SOPHIA, where vision-language model agents evaluate the DiT-generated output and guide its refinement by iteratively evolving the language instructions. In addition, a co-trained Inverse Dynamics Model translates these refined plans into executable robotic actions, thus closing the imagination-to-action loop. We establish WoWBench, a new benchmark focused on physical consistency and causal reasoning in video, where WoW achieves state-of-the-art performance in both human and autonomous evaluation, demonstrating strong ability in physical causality, collision dynamics, and object permanence. Our work provides systematic evidence that large-scale, real-world interaction is a cornerstone for developing physical intuition in AI. Models, data, and benchmarks will be open-sourced.
Insect-Scale Tailless Robot with Flapping Wings: A Simple Structure and Drive for Yaw Control
Insect-scale micro-aerial vehicles, especially, lightweight, flapping-wing robots, are becoming increasingly important for safe motion sensing in spatially constrained environments such as living spaces. However, yaw control using flapping wings is fundamentally more difficult than using rotating wings. In this study, an insect-scale, tailless robot with four paired tilted flapping wings (weighing 1.52 g) to enable yaw control was fabricated. It benefits from the simplicity of a directly driven wing actuator with no transmission and a lift control signal; however, it still has an offset in the lift force. Therefore, an adaptive controller was designed to alleviate the offset. Numerical experiments confirm that the proposed controller outperforms the linear quadratic integral controller. Finally, in a tethered and controlled demonstration flight, the yaw drift was suppressed by the wing-tilting arrangement and the proposed controller. The simple structure drive system demonstrates the potential for future controlled flights of battery-powered, tailless, flapping-wing robots weighing less than 10 grams.
comment: Submitted to Control Engineering Practice (Elsevier)
Real-Time Adaptive Motion Planning via Point Cloud-Guided, Energy-Based Diffusion and Potential Fields
Motivated by the problem of pursuit-evasion, we present a motion planning framework that combines energy-based diffusion models with artificial potential fields for robust real time trajectory generation in complex environments. Our approach processes obstacle information directly from point clouds, enabling efficient planning without requiring complete geometric representations. The framework employs classifier-free guidance training and integrates local potential fields during sampling to enhance obstacle avoidance. In dynamic scenarios, the system generates initial trajectories using the diffusion model and continuously refines them through potential field-based adaptation, demonstrating effective performance in pursuit-evasion scenarios with partial pursuer observability.
comment: Accepted to IEEE RA-L 2025
TACS-Graphs: Traversability-Aware Consistent Scene Graphs for Ground Robot Localization and Mapping IROS 2025
Scene graphs have emerged as a powerful tool for robots, providing a structured representation of spatial and semantic relationships for advanced task planning. Despite their potential, conventional 3D indoor scene graphs face critical limitations, particularly under- and over-segmentation of room layers in structurally complex environments. Under-segmentation misclassifies non-traversable areas as part of a room, often in open spaces, while over-segmentation fragments a single room into overlapping segments in complex environments. These issues stem from naive voxel-based map representations that rely solely on geometric proximity, disregarding the structural constraints of traversable spaces and resulting in inconsistent room layers within scene graphs. To the best of our knowledge, this work is the first to tackle segmentation inconsistency as a challenge and address it with Traversability-Aware Consistent Scene Graphs (TACS-Graphs), a novel framework that integrates ground robot traversability with room segmentation. By leveraging traversability as a key factor in defining room boundaries, the proposed method achieves a more semantically meaningful and topologically coherent segmentation, effectively mitigating the inaccuracies of voxel-based scene graph approaches in complex environments. Furthermore, the enhanced segmentation consistency improves loop closure detection efficiency in the proposed Consistent Scene Graph-leveraging Loop Closure Detection (CoSG-LCD) leading to higher pose estimation accuracy. Experimental results confirm that the proposed approach outperforms state-of-the-art methods in terms of scene graph consistency and pose graph optimization performance.
comment: Accepted by IROS 2025
EffiTune: Diagnosing and Mitigating Training Inefficiency for Parameter Tuner in Robot Navigation System IROS 2025
Robot navigation systems are critical for various real-world applications such as delivery services, hospital logistics, and warehouse management. Although classical navigation methods provide interpretability, they rely heavily on expert manual tuning, limiting their adaptability. Conversely, purely learning-based methods offer adaptability but often lead to instability and erratic robot behaviors. Recently introduced parameter tuners aim to balance these approaches by integrating data-driven adaptability into classical navigation frameworks. However, the parameter tuning process currently suffers from training inefficiencies and redundant sampling, with critical regions in environment often underrepresented in training data. In this paper, we propose EffiTune, a novel framework designed to diagnose and mitigate training inefficiency for parameter tuners in robot navigation systems. EffiTune first performs robot-behavior-guided diagnostics to pinpoint critical bottlenecks and underrepresented regions. It then employs a targeted up-sampling strategy to enrich the training dataset with critical samples, significantly reducing redundancy and enhancing training efficiency. Our comprehensive evaluation demonstrates that EffiTune achieves more than a 13.5% improvement in navigation performance, enhanced robustness in out-of-distribution scenarios, and a 4x improvement in training efficiency within the same computational budget.
comment: Accepted to IROS 2025
APEX: Empowering LLMs with Physics-Based Task Planning for Real-time Insight
Large Language Models (LLMs) demonstrate strong reasoning and task planning capabilities but remain fundamentally limited in physical interaction modeling. Existing approaches integrate perception via Vision-Language Models (VLMs) or adaptive decision-making through Reinforcement Learning (RL), but they fail to capture dynamic object interactions or require task-specific training, limiting their real-world applicability. We introduce APEX (Anticipatory Physics-Enhanced Execution), a framework that equips LLMs with physics-driven foresight for real-time task planning. APEX constructs structured graphs to identify and model the most relevant dynamic interactions in the environment, providing LLMs with explicit physical state updates. Simultaneously, APEX provides low-latency forward simulations of physically feasible actions, allowing LLMs to select optimal strategies based on predictive outcomes rather than static observations. We evaluate APEX on three benchmarks designed to assess perception, prediction, and decision-making: (1) Physics Reasoning Benchmark, testing causal inference and object motion prediction; (2) Tetris, evaluating whether physics-informed prediction enhances decision-making performance in long-horizon planning tasks; (3) Dynamic Obstacle Avoidance, assessing the immediate integration of perception and action feasibility analysis. APEX significantly outperforms standard LLMs and VLM-based models, demonstrating the necessity of explicit physics reasoning for bridging the gap between language-based intelligence and real-world task execution. The source code and experiment setup are publicly available at https://github.com/hwj20/APEX_EXP .
Never too Prim to Swim: An LLM-Enhanced RL-based Adaptive S-Surface Controller for AUVs under Extreme Sea Conditions IROS 2025
The adaptivity and maneuvering capabilities of Autonomous Underwater Vehicles (AUVs) have drawn significant attention in oceanic research, due to the unpredictable disturbances and strong coupling among the AUV's degrees of freedom. In this paper, we developed large language model (LLM)-enhanced reinforcement learning (RL)-based adaptive S-surface controller for AUVs. Specifically, LLMs are introduced for the joint optimization of controller parameters and reward functions in RL training. Using multi-modal and structured explicit task feedback, LLMs enable joint adjustments, balance multiple objectives, and enhance task-oriented performance and adaptability. In the proposed controller, the RL policy focuses on upper-level tasks, outputting task-oriented high-level commands that the S-surface controller then converts into control signals, ensuring cancellation of nonlinear effects and unpredictable external disturbances in extreme sea conditions. Under extreme sea conditions involving complex terrain, waves, and currents, the proposed controller demonstrates superior performance and adaptability in high-level tasks such as underwater target tracking and data collection, outperforming traditional PID and SMC controllers.
comment: Accepted by IEEE/RSJ IROS 2025
Inferring Foresightedness in Dynamic Noncooperative Games
Dynamic game theory is an increasingly popular tool for modeling multi-agent, e.g. human-robot, interactions. Game-theoretic models presume that each agent wishes to minimize a private cost function that depends on others' actions. These games typically evolve over a fixed time horizon, specifying how far into the future each agent plans. In practical settings, however, decision-makers may vary in foresightedness, or how much they care about their current cost in relation to their past and future costs. We conjecture that quantifying and estimating each agent's foresightedness from online data will enable safer and more efficient interactions with other agents. To this end, we frame this inference problem as an inverse dynamic game. We consider a specific objective function parametrization that smoothly interpolates myopic and farsighted planning. Games of this form are readily transformed into parametric mixed complementarity problems; we exploit the directional differentiability of solutions to these problems with respect to their hidden parameters to solve for agents' foresightedness. We conduct three experiments: one with synthetically generated delivery robot motion, one with real-world data involving people walking, biking, and driving vehicles, and one using high-fidelity simulators. The results of these experiments demonstrate that explicitly inferring agents' foresightedness enables game-theoretic models to make 33% more accurate models for agents' behavior.
MotionScript: Natural Language Descriptions for Expressive 3D Human Motions
We introduce MotionScript, a novel framework for generating highly detailed, natural language descriptions of 3D human motions. Unlike existing motion datasets that rely on broad action labels or generic captions, MotionScript provides fine-grained, structured descriptions that capture the full complexity of human movement including expressive actions (e.g., emotions, stylistic walking) and interactions beyond standard motion capture datasets. MotionScript serves as both a descriptive tool and a training resource for text-to-motion models, enabling the synthesis of highly realistic and diverse human motions from text. By augmenting motion datasets with MotionScript captions, we demonstrate significant improvements in out-of-distribution motion generation, allowing large language models (LLMs) to generate motions that extend beyond existing data. Additionally, MotionScript opens new applications in animation, virtual human simulation, and robotics, providing an interpretable bridge between intuitive descriptions and motion synthesis. To the best of our knowledge, this is the first attempt to systematically translate 3D motion into structured natural language without requiring training data.
comment: Project webpage: https://pjyazdian.github.io/MotionScript
Towards smart and adaptive agents for active sensing on edge devices
TinyML has made deploying deep learning models on low-power edge devices feasible, creating new opportunities for real-time perception in constrained environments. However, the adaptability of such deep learning methods remains limited to data drift adaptation, lacking broader capabilities that account for the environment's underlying dynamics and inherent uncertainty. Deep learning's scaling laws, which counterbalance this limitation by massively up-scaling data and model size, cannot be applied when deploying on the Edge, where deep learning limitations are further amplified as models are scaled down for deployment on resource-constrained devices. This paper presents an innovative agentic system capable of performing on-device perception and planning, enabling active sensing on the edge. By incorporating active inference into our solution, our approach extends beyond deep learning capabilities, allowing the system to plan in dynamic environments while operating in real-time with a compact memory footprint of as little as 300 MB. We showcase our proposed system by creating and deploying a saccade agent connected to an IoT camera with pan and tilt capabilities on an NVIDIA Jetson embedded device. The saccade agent controls the camera's field of view following optimal policies derived from the active inference principles, simulating human-like saccadic motion for surveillance and robotics applications.
COMPASS: Cross-embodiment Mobility Policy via Residual RL and Skill Synthesis
As robots are increasingly deployed in diverse application domains, enabling robust mobility across different embodiments has become a critical challenge. Classical mobility stacks, though effective on specific platforms, require extensive per-robot tuning and do not scale easily to new embodiments. Learning-based approaches, such as imitation learning (IL), offer alternatives, but face significant limitations on the need for high-quality demonstrations for each embodiment. To address these challenges, we introduce COMPASS, a unified framework that enables scalable cross-embodiment mobility using expert demonstrations from only a single embodiment. We first pre-train a mobility policy on a single robot using IL, combining a world model with a policy model. We then apply residual reinforcement learning (RL) to efficiently adapt this policy to diverse embodiments through corrective refinements. Finally, we distill specialist policies into a single generalist policy conditioned on an embodiment embedding vector. This design significantly reduces the burden of collecting data while enabling robust generalization across a wide range of robot designs. Our experiments demonstrate that COMPASS scales effectively across diverse robot platforms while maintaining adaptability to various environment configurations, achieving a generalist policy with a success rate approximately 5X higher than the pre-trained IL policy on unseen embodiments, and further demonstrates zero-shot sim-to-real transfer.
TAS: A Transit-Aware Strategy for Embodied Navigation with Non-Stationary Targets
Embodied navigation methods commonly operate in static environments with stationary targets. In this work, we present a new algorithm for navigation in dynamic scenarios with non-stationary targets. Our novel Transit-Aware Strategy (TAS) enriches embodied navigation policies with object path information. TAS improves performance in non-stationary environments by rewarding agents for synchronizing their routes with target routes. To evaluate TAS, we further introduce Dynamic Object Maps (DOMs), a dynamic variant of node-attributed topological graphs with structured object transitions. DOMs are inspired by human habits to simulate realistic object routes on a graph. Our experiments show that on average, TAS improves agent Success Rate (SR) by 21.1 in non-stationary environments, while also generalizing better from static environments by 44.5% when measured by Relative Change in Success (RCS). We qualitatively investigate TAS-agent performance on DOMs and draw various inferences to help better model generalist navigation policies. To the best of our knowledge, ours is the first work that quantifies the adaptability of embodied navigation methods in non-stationary environments. Code and data for our benchmark will be made publicly available.
comment: 15 pages
Multiagent Systems
SADCHER: Scheduling using Attention-based Dynamic Coalitions of Heterogeneous Robots in Real-Time
We present Sadcher, a real-time task assignment framework for heterogeneous multi-robot teams that incorporates dynamic coalition formation and task precedence constraints. Sadcher is trained through Imitation Learning and combines graph attention and transformers to predict assignment rewards between robots and tasks. Based on the predicted rewards, a relaxed bipartite matching step generates high-quality schedules with feasibility guarantees. We explicitly model robot and task positions, task durations, and robots' remaining processing times, enabling advanced temporal and spatial reasoning and generalization to environments with different spatiotemporal distributions compared to training. Trained on optimally solved small-scale instances, our method can scale to larger task sets and team sizes. Sadcher outperforms other learning-based and heuristic baselines on randomized, unseen problems for small and medium-sized teams with computation times suitable for real-time operation. We also explore sampling-based variants and evaluate scalability across robot and task counts. In addition, we release our dataset of 250,000 optimal schedules: https://autonomousrobots.nl/paper_websites/sadcher_MRTA/
comment: 7 pages, 5 figures. 2025 IEEE Int. Symposium on Multi-Robot and Multi-Agent Systems (MRS 2025). Website and Code: https://autonomousrobots.nl/paper_websites/sadcher_MRTA/
Multi Agent Switching Mode Controller for Sound Source localization
Source seeking is an important topic in robotic research, especially considering sound-based sensors since they allow the agents to locate a target even in critical conditions where it is not possible to establish a direct line of sight. In this work, we design a multi- agent switching mode control strategy for acoustic-based target localization. Two scenarios are considered: single source localization, in which the agents are driven maintaining a rigid formation towards the target, and multi-source scenario, in which each agent searches for the targets independently from the others.
When Planners Meet Reality: How Learned, Reactive Traffic Agents Shift nuPlan Benchmarks
Planner evaluation in closed-loop simulation often uses rule-based traffic agents, whose simplistic and passive behavior can hide planner deficiencies and bias rankings. Widely used IDM agents simply follow a lead vehicle and cannot react to vehicles in adjacent lanes, hindering tests of complex interaction capabilities. We address this issue by integrating the state-of-the-art learned traffic agent model SMART into nuPlan. Thus, we are the first to evaluate planners under more realistic conditions and quantify how conclusions shift when narrowing the sim-to-real gap. Our analysis covers 14 recent planners and established baselines and shows that IDM-based simulation overestimates planning performance: nearly all scores deteriorate. In contrast, many planners interact better than previously assumed and even improve in multi-lane, interaction-heavy scenarios like lane changes or turns. Methods trained in closed-loop demonstrate the best and most stable driving performance. However, when reaching their limits in augmented edge-case scenarios, all learned planners degrade abruptly, whereas rule-based planners maintain reasonable basic behavior. Based on our results, we suggest SMART-reactive simulation as a new standard closed-loop benchmark in nuPlan and release the SMART agents as a drop-in alternative to IDM at https://github.com/shgd95/InteractiveClosedLoop.
The Role of Social Learning and Collective Norm Formation in Fostering Cooperation in LLM Multi-Agent Systems
A growing body of multi-agent studies with Large Language Models (LLMs) explores how norms and cooperation emerge in mixed-motive scenarios, where pursuing individual gain can undermine the collective good. While prior work has explored these dynamics in both richly contextualized simulations and simplified game-theoretic environments, most LLM systems featuring common-pool resource (CPR) games provide agents with explicit reward functions directly tied to their actions. In contrast, human cooperation often emerges without full visibility into payoffs and population, relying instead on heuristics, communication, and punishment. We introduce a CPR simulation framework that removes explicit reward signals and embeds cultural-evolutionary mechanisms: social learning (adopting strategies and beliefs from successful peers) and norm-based punishment, grounded in Ostrom's principles of resource governance. Agents also individually learn from the consequences of harvesting, monitoring, and punishing via environmental feedback, enabling norms to emerge endogenously. We establish the validity of our simulation by reproducing key findings from existing studies on human behavior. Building on this, we examine norm evolution across a $2\times2$ grid of environmental and social initialisations (resource-rich vs. resource-scarce; altruistic vs. selfish) and benchmark how agentic societies comprised of different LLMs perform under these conditions. Our results reveal systematic model differences in sustaining cooperation and norm formation, positioning the framework as a rigorous testbed for studying emergent norms in mixed-motive LLM societies. Such analysis can inform the design of AI systems deployed in social and organizational contexts, where alignment with cooperative norms is critical for stability, fairness, and effective governance of AI-mediated environments.
Disaster Management in the Era of Agentic AI Systems: A Vision for Collective Human-Machine Intelligence for Augmented Resilience
The escalating frequency and severity of disasters routinely overwhelm traditional response capabilities, exposing critical vulnerability in disaster management. Current practices are hindered by fragmented data streams, siloed technologies, resource constraints, and the erosion of institutional memory, which collectively impede timely and effective decision making. This study introduces Disaster Copilot, a vision for a multi-agent artificial intelligence system designed to overcome these systemic challenges by unifying specialized AI tools within a collaborative framework. The proposed architecture utilizes a central orchestrator to coordinate diverse sub-agents, each specializing in critical domains such as predictive risk analytics, situational awareness, and impact assessment. By integrating multi-modal data, the system delivers a holistic, real-time operational picture and serve as the essential AI backbone required to advance Disaster Digital Twins from passive models to active, intelligent environments. Furthermore, it ensures functionality in resource-limited environments through on-device orchestration and incorporates mechanisms to capture institutional knowledge, mitigating the impact of staff turnover. We detail the system architecture and propose a three-phased roadmap emphasizing the parallel growth of technology, organizational capacity, and human-AI teaming. Disaster Copilot offers a transformative vision, fostering collective human-machine intelligence to build more adaptive, data-driven and resilient communities.
ABMax: A JAX-based Agent-based Modeling Framework
Agent-based modeling (ABM) is a principal approach for studying complex systems. By decomposing a system into simpler, interacting agents, agent-based modeling (ABM) allows researchers to observe the emergence of complex phenomena. High-performance array computing libraries like JAX can help scale such computational models to a large number of agents by using automatic vectorization and just-in-time (JIT) compilation. One of the caveats of using JAX to achieve such scaling is that the shapes of arrays used in the computational model should remain immutable throughout the simulation. In the context of agent-based modeling (ABM), this can pose constraints on certain agent manipulation operations that require flexible data structures. A subset of which is represented by the ability to update a dynamically selected number of agents by applying distinct changes to them during a simulation. To this effect, we introduce ABMax, an ABM framework based on JAX that implements multiple just-in-time (JIT) compilable algorithms to provide this functionality. On the canonical predation model benchmark, ABMax achieves runtime performance comparable to state-of-the-art implementations. Further, we show that this functionality can also be vectorized, making it possible to run many similar agent-based models in parallel. We also present two examples in the form of a traffic-flow model and a financial market model to show the use case of ABMax
comment: 8 pages, 7 figures, 4 tables, 2 algorithms
Measuring and Mitigating Identity Bias in Multi-Agent Debate via Anonymization
Multi-agent debate (MAD) aims to improve large language model (LLM) reasoning by letting multiple agents exchange answers and then aggregate their opinions. Yet recent studies reveal that agents are not neutral: they are prone to identity-driven sycophancy and self-bias, uncritically adopting a peer's view or stubbornly adhering to their own prior output, undermining the reliability of debate. In this work, we present the first principled framework that joins sycophancy and self-bias to mitigate and quantify identity bias in MAD. First, we formalize the debate dynamics as an identity-weighted Bayesian update process. Second, we propose response anonymization: by removing identity markers from prompts, agents cannot distinguish "self" from "peer", which forces equal weights on agent identity, thereby reducing bias. Third, we define the Identity Bias Coefficient (IBC), a principled metric that measures how often an agent follows a peer versus itself. Empirical studies across multiple models, datasets and debate rounds confirm that identity bias is widespread, with sycophancy far more common than self-bias. Our findings highlight the need to "mask" identity to ensure that MAD systems reason based on content rather than source identity. Code is released in https://github.com/deeplearning-wisc/MAD-identity-bias.
RareAgent: Self-Evolving Reasoning for Drug Repurposing in Rare Diseases
Computational drug repurposing for rare diseases is especially challenging when no prior associations exist between drugs and target diseases. Therefore, knowledge graph completion and message-passing GNNs have little reliable signal to learn and propagate, resulting in poor performance. We present RareAgent, a self-evolving multi-agent system that reframes this task from passive pattern recognition to active evidence-seeking reasoning. RareAgent organizes task-specific adversarial debates in which agents dynamically construct evidence graphs from diverse perspectives to support, refute, or entail hypotheses. The reasoning strategies are analyzed post hoc in a self-evolutionary loop, producing textual feedback that refines agent policies, while successful reasoning paths are distilled into transferable heuristics to accelerate future investigations. Comprehensive evaluations reveal that RareAgent improves the indication AUPRC by 18.1% over reasoning baselines and provides a transparent reasoning chain consistent with clinical evidence.
Ax-Prover: A Deep Reasoning Agentic Framework for Theorem Proving in Mathematics and Quantum Physics
We present Ax-Prover, a multi-agent system for automated theorem proving in Lean that can solve problems across diverse scientific domains and operate either autonomously or collaboratively with human experts. To achieve this, Ax-Prover approaches scientific problem solving through formal proof generation, a process that demands both creative reasoning and strict syntactic rigor. Ax-Prover meets this challenge by equipping Large Language Models (LLMs), which provide knowledge and reasoning, with Lean tools via the Model Context Protocol (MCP), which ensure formal correctness. To evaluate its performance as an autonomous prover, we benchmark our approach against frontier LLMs and specialized prover models on two public math benchmarks and on two Lean benchmarks we introduce in the fields of abstract algebra and quantum theory. On public datasets, Ax-Prover is competitive with state-of-the-art provers, while it largely outperforms them on the new benchmarks. This shows that, unlike specialized systems that struggle to generalize, our tool-based agentic theorem prover approach offers a generalizable methodology for formal verification across diverse scientific domains. Furthermore, we demonstrate Ax-Prover's assistant capabilities in a practical use case, showing how it enabled an expert mathematician to formalize the proof of a complex cryptography theorem.
Internet of Agents: Fundamentals, Applications, and Challenges
With the rapid proliferation of large language models and vision-language models, AI agents have evolved from isolated, task-specific systems into autonomous, interactive entities capable of perceiving, reasoning, and acting without human intervention. As these agents proliferate across virtual and physical environments, from virtual assistants to embodied robots, the need for a unified, agent-centric infrastructure becomes paramount. In this survey, we introduce the Internet of Agents (IoA) as a foundational framework that enables seamless interconnection, dynamic discovery, and collaborative orchestration among heterogeneous agents at scale. We begin by presenting a general IoA architecture, highlighting its hierarchical organization, distinguishing features relative to the traditional Internet, and emerging applications. Next, we analyze the key operational enablers of IoA, including capability notification and discovery, adaptive communication protocols, dynamic task matching, consensus and conflict-resolution mechanisms, and incentive models. Finally, we identify open research directions toward building resilient and trustworthy IoA ecosystems.
comment: 25 pages,10 figures, 10 tables. Accepted by IEEE TCCN in Oct. 2025
RADAR: A Risk-Aware Dynamic Multi-Agent Framework for LLM Safety Evaluation via Role-Specialized Collaboration
Existing safety evaluation methods for large language models (LLMs) suffer from inherent limitations, including evaluator bias and detection failures arising from model homogeneity, which collectively undermine the robustness of risk evaluation processes. This paper seeks to re-examine the risk evaluation paradigm by introducing a theoretical framework that reconstructs the underlying risk concept space. Specifically, we decompose the latent risk concept space into three mutually exclusive subspaces: the explicit risk subspace (encompassing direct violations of safety guidelines), the implicit risk subspace (capturing potential malicious content that requires contextual reasoning for identification), and the non-risk subspace. Furthermore, we propose RADAR, a multi-agent collaborative evaluation framework that leverages multi-round debate mechanisms through four specialized complementary roles and employs dynamic update mechanisms to achieve self-evolution of risk concept distributions. This approach enables comprehensive coverage of both explicit and implicit risks while mitigating evaluator bias. To validate the effectiveness of our framework, we construct an evaluation dataset comprising 800 challenging cases. Extensive experiments on our challenging testset and public benchmarks demonstrate that RADAR significantly outperforms baseline evaluation methods across multiple dimensions, including accuracy, stability, and self-evaluation risk sensitivity. Notably, RADAR achieves a 28.87% improvement in risk identification accuracy compared to the strongest baseline evaluation method.
Inferring Foresightedness in Dynamic Noncooperative Games
Dynamic game theory is an increasingly popular tool for modeling multi-agent, e.g. human-robot, interactions. Game-theoretic models presume that each agent wishes to minimize a private cost function that depends on others' actions. These games typically evolve over a fixed time horizon, specifying how far into the future each agent plans. In practical settings, however, decision-makers may vary in foresightedness, or how much they care about their current cost in relation to their past and future costs. We conjecture that quantifying and estimating each agent's foresightedness from online data will enable safer and more efficient interactions with other agents. To this end, we frame this inference problem as an inverse dynamic game. We consider a specific objective function parametrization that smoothly interpolates myopic and farsighted planning. Games of this form are readily transformed into parametric mixed complementarity problems; we exploit the directional differentiability of solutions to these problems with respect to their hidden parameters to solve for agents' foresightedness. We conduct three experiments: one with synthetically generated delivery robot motion, one with real-world data involving people walking, biking, and driving vehicles, and one using high-fidelity simulators. The results of these experiments demonstrate that explicitly inferring agents' foresightedness enables game-theoretic models to make 33% more accurate models for agents' behavior.
Where Did It All Go Wrong? A Hierarchical Look into Multi-Agent Error Attribution
Error attribution in Large Language Model (LLM) multi-agent systems presents a significant challenge in debugging and improving collaborative AI systems. Current approaches to pinpointing agent and step level failures in interaction traces - whether using all-at-once evaluation, step-by-step analysis, or binary search - fall short when analyzing complex patterns, struggling with both accuracy and consistency. We present ECHO (Error attribution through Contextual Hierarchy and Objective consensus analysis), a novel algorithm that combines hierarchical context representation, objective analysis-based evaluation, and consensus voting to improve error attribution accuracy. Our approach leverages a positional-based leveling of contextual understanding while maintaining objective evaluation criteria, ultimately reaching conclusions through a consensus mechanism. Experimental results demonstrate that ECHO outperforms existing methods across various multi-agent interaction scenarios, showing particular strength in cases involving subtle reasoning errors and complex interdependencies. Our findings suggest that leveraging these concepts of structured, hierarchical context representation combined with consensus-based objective decision-making, provides a more robust framework for error attribution in multi-agent systems.
Cooperative Bargaining Games Without Utilities: Mediated Solutions from Direction Oracles
Cooperative bargaining games are widely used to model resource allocation and conflict resolution. Traditional solutions assume the mediator can access agents utility function values and gradients. However, there is an increasing number of settings, such as human AI interactions, where utility values may be inaccessible or incomparable due to unknown, nonaffine transformations. To model such settings, we consider that the mediator has access only to agents most preferred directions, i.e., normalized utility gradients in the decision space. To this end, we propose a cooperative bargaining algorithm where a mediator has access to only the direction oracle of each agent. We prove that unlike popular approaches such as the Nash and Kalai Smorodinsky bargaining solutions, our approach is invariant to monotonic nonaffine transformations, and that under strong convexity and smoothness assumptions, this approach enjoys global asymptotic convergence to Pareto stationary solutions. Moreover, we show that the bargaining solutions found by our algorithm also satisfy the axioms of symmetry and (under slightly stronger conditions) independence of irrelevant alternatives, which are popular in the literature. Finally, we conduct experiments in two domains, multi agent formation assignment and mediated stock portfolio allocation, which validate these theoretic results. All code for our experiments can be found at https://github.com/suryakmurthy/dibs_bargaining.
Systems and Control (CS)
RDD: Retrieval-Based Demonstration Decomposer for Planner Alignment in Long-Horizon Tasks NeurIPS 2025
To tackle long-horizon tasks, recent hierarchical vision-language-action (VLAs) frameworks employ vision-language model (VLM)-based planners to decompose complex manipulation tasks into simpler sub-tasks that low-level visuomotor policies can easily handle. Typically, the VLM planner is finetuned to learn to decompose a target task. This finetuning requires target task demonstrations segmented into sub-tasks by either human annotation or heuristic rules. However, the heuristic subtasks can deviate significantly from the training data of the visuomotor policy, which degrades task performance. To address these issues, we propose a Retrieval-based Demonstration Decomposer (RDD) that automatically decomposes demonstrations into sub-tasks by aligning the visual features of the decomposed sub-task intervals with those from the training data of the low-level visuomotor policies. Our method outperforms the state-of-the-art sub-task decomposer on both simulation and real-world tasks, demonstrating robustness across diverse settings. Code and more results are available at rdd-neurips.github.io.
comment: 39th Conference on Neural Information Processing Systems (NeurIPS 2025); Project Website: rdd-neurips.github.io
CBF-RL: Safety Filtering Reinforcement Learning in Training with Control Barrier Functions
Reinforcement learning (RL), while powerful and expressive, can often prioritize performance at the expense of safety. Yet safety violations can lead to catastrophic outcomes in real-world deployments. Control Barrier Functions (CBFs) offer a principled method to enforce dynamic safety -- traditionally deployed \emph{online} via safety filters. While the result is safe behavior, the fact that the RL policy does not have knowledge of the CBF can lead to conservative behaviors. This paper proposes CBF-RL, a framework for generating safe behaviors with RL by enforcing CBFs \emph{in training}. CBF-RL has two key attributes: (1) minimally modifying a nominal RL policy to encode safety constraints via a CBF term, (2) and safety filtering of the policy rollouts in training. Theoretically, we prove that continuous-time safety filters can be deployed via closed-form expressions on discrete-time roll-outs. Practically, we demonstrate that CBF-RL internalizes the safety constraints in the learned policy -- both enforcing safer actions and biasing towards safer rewards -- enabling safe deployment without the need for an online safety filter. We validate our framework through ablation studies on navigation tasks and on the Unitree G1 humanoid robot, where CBF-RL enables safer exploration, faster convergence, and robust performance under uncertainty, enabling the humanoid robot to avoid obstacles and climb stairs safely in real-world settings without a runtime safety filter.
comment: 8 pages
Architecture Is All You Need: Diversity-Enabled Sweet Spots for Robust Humanoid Locomotion
Robust humanoid locomotion in unstructured environments requires architectures that balance fast low-level stabilization with slower perceptual decision-making. We show that a simple layered control architecture (LCA), a proprioceptive stabilizer running at high rate, coupled with a compact low-rate perceptual policy, enables substantially more robust performance than monolithic end-to-end designs, even when using minimal perception encoders. Through a two-stage training curriculum (blind stabilizer pretraining followed by perceptual fine-tuning), we demonstrate that layered policies consistently outperform one-stage alternatives in both simulation and hardware. On a Unitree G1 humanoid, our approach succeeds across stair and ledge tasks where one-stage perceptual policies fail. These results highlight that architectural separation of timescales, rather than network scale or complexity, is the key enabler for robust perception-conditioned locomotion.
comment: 8 pages
Further Results on Safety-Critical Stabilization of Force-Controlled Nonholonomic Mobile Robots
In this paper, we address the stabilization problem for force-controlled nonholonomic mobile robots under safety-critical constraints. We propose a continuous, time-invariant control law based on the gamma m-quadratic programming (gamma m-QP) framework, which unifies control Lyapunov functions (CLFs) and control barrier functions (CBFs) to enforce both stability and safety in the closed-loop system. For the first time, we construct a global, time-invariant, strict Lyapunov function for the closed-loop nonholonomic mobile robot system with a nominal stabilization controller in polar coordinates; this strict Lyapunov function then serves as the CLF in the QP design. Next, by exploiting the inherent cascaded structure of the vehicle dynamics, we develop a CBF for the mobile robot via an integrator backstepping procedure. Our main results guarantee both asymptotic stability and safety for the closed-loop system. Both the simulation and experimental results are presented to illustrate the effectiveness and performance of our approach.
Through-the-Earth Magnetic Induction Communication and Networking: A Comprehensive Survey
Magnetic induction (MI) communication (MIC) has emerged as a promising candidate for underground communication networks due to its excellent penetration capabilities. Integration with Space-Air-Ground-Underground (SAGUI) networks in next-generation mobile communication systems requires a well-defined network architecture. A recent discovery in MIC research, MI fast fading, remains in its early stages and presents unique challenges. This paper provides a comprehensive survey on through-the-earth (TTE) MIC, covering MI applications, channel modeling, point-to-point MIC design, relay techniques, network frameworks, and emerging technologies. We compare various MIC applications to highlight TTE-specific challenges and review the principles of channel modeling, addressing both MI slow fading and MI fast fading, along with its potential impact on existing MIC theories. We conduct a fine-grained decomposition of MI channel power gain into four distinct physical parameters, and propose a novel geometric model to analyze MI fast fading. We also summarize MI relay techniques, examine crosstalk effects in relay and high-density networks, and explore key research tasks within the OSI framework for a holistic MI network protocol in SAGUI. To bridge the gaps identified, we propose a MIC framework that supports TCP/IP and Linux, enabling full implementation of existing and emerging MIC solutions. This framework empowers researchers to leverage Linux resources and deep learning platforms for accelerated development of MIC in SAGUI networks. Remaining research challenges, open issues, and promising novel techniques are further identified to advance MIC research.
comment: This work has been accepted by the IEEE Communications Surveys & Tutorials (COMST) for publication.The final published version will be available on IEEE Xplore
Dynamic-Key-Aware Co-Simulation Framework for Next Generation of SCADA Systems Encrypted by Quantum-Key-Distribution Techniques
To address growing cybersecurity challenges in modern power dispatch systems, this paper proposes a multi-layer modeling and optimization framework for SCADA systems enhanced with quantum key distribution (QKD). While most existing applications of QKD in the power sector focus on building secure point-to-point communication tunnels, they rarely consider the system-level coupling between key dynamics and control scheduling. In contrast, our approach integrates quantum key generation, consumption, inventory prediction, and control latency into a unified model, enabling key-aware reconfiguration of SCADA control chains based on task security demands and real-time resource constraints. To resolve conflicts in key resource allocation between transmission system operators (TSOs) and distribution system operators (DSOs), we formulate a bi-level Stackelberg game and transform it into a mathematical program with complementarity constraints (MPCC). We further develop an efficient Level Decomposition-Complementarity Pruning (LD-CP) algorithm to solve the problem. To support reproducible evaluation, we build an end-to-end co-simulation platform that integrates physical-layer disruptions via OpenQKD-Sim, Q3P/IEC-104 protocol stack binding, and real-time control-chain monitoring through Grafana. Experimental results on the IEEE 39- and 118-bus systems show that our method increases task success rate by 25%, reduces peak frequency deviation by 70%, and improves key utilization to 83%. This work lays the foundation for future quantum-secure control systems in power grid operations.
Improved Voltage Regulation with Optimal Design of Decentralized Volt-VAr Control
Integration of distributed energy resources has created a need for autonomous, dynamic voltage regulation. Decentralized Volt-VAr Control (VVC) of grid-connected inverters presents a unique opportunity for voltage management but, if designed poorly, can lead to unstable behavior when in feedback with the grid. We model the grid-VVC closed-loop dynamics with a linearized power flow approach, leveraging historical data, which shows improvement over the commonly used LinDistFlow model. This model is used to design VVC slopes by minimizing steady-state voltage deviation from the nominal value, subject to a non-convex spectral radius stability constraint, which has not been previously implemented within this context. We compare this constraint to existing convex restrictions and demonstrate, through simulations on a realistic feeder, that using the spectral radius results in more effective voltage regulation.
A Human-Vector Susceptible--Infected--Susceptible Model for Analyzing and Controlling the Spread of Vector-Borne Diseases
We propose an epidemic model for the spread of vector-borne diseases. The model, which is built extending the classical susceptible-infected-susceptible model, accounts for two populations -- humans and vectors -- and for cross-contagion between the two species, whereby humans become infected upon interaction with carrier vectors, and vectors become carriers after interaction with infected humans. We formulate the model as a system of ordinary differential equations and leverage monotone systems theory to rigorously characterize the epidemic dynamics. Specifically, we characterize the global asymptotic behavior of the disease, determining conditions for quick eradication of the disease (i.e., for which all trajectories converge to a disease-free equilibrium), or convergence to a (unique) endemic equilibrium. Then, we incorporate two control actions: namely, vector control and incentives to adopt protection measures. Using the derived mathematical tools, we assess the impact of these two control actions and determine the optimal control policy.
comment: To appear in the Proceedings of the 2025 European Control Conference (ECC)
High-Resolution PTDF-Based Planning of Storage and Transmission Under High Renewables
Transmission Expansion Planning (TEP) optimizes power grid upgrades and investments to ensure reliable, efficient, and cost-effective electricity delivery while addressing grid constraints. To support growing demand and renewable energy integration, energy storage is emerging as a pivotal asset that provides temporal flexibility and alleviates congestion. This paper develops a multiperiod, two-stage PTDF formulation that co-optimizes transmission upgrades and storage siting/sizing. To ensure scalability, a trust-region, multicut Benders scheme warm-started from per-representative-day optima is proposed. Applied to a 2,000-bus synthetic Texas system under high-renewable projections, the method attains final optimality gaps below 1% and yields a plan with storage at about 180 nodes (32% of peak renewable capacity). These results demonstrate that the proposed PTDF-based methodology efficiently handles large distributed storage fleets, demonstrating scalability at high spatial resolution
A Deep State-Space Model Compression Method using Upper Bound on Output Error
We study deep state-space models (Deep SSMs) that contain linear-quadratic-output (LQO) systems as internal blocks and present a compression method with a provable output error guarantee. We first derive an upper bound on the output error between two Deep SSMs and show that the bound can be expressed via the $h^2$-error norms between the layerwise LQO systems, thereby providing a theoretical justification for existing model order reduction (MOR)-based compression. Building on this bound, we formulate an optimization problem in terms of the $h^2$-error norm and develop a gradient-based MOR method. On the IMDb task from the Long Range Arena benchmark, we demonstrate that our compression method achieves strong performance. Moreover, unlike prior approaches, we reduce roughly 80% of trainable parameters without retraining, with only a 4-5% performance drop.
Stability Criteria and Motor Performance in Delayed Haptic Dyadic Interactions Mediated by Robots
This paper establishes analytical stability criteria for robot-mediated human-human (dyadic) interaction systems, focusing on haptic communication under network-induced time delays. Through frequency-domain analysis supported by numerical simulations, we identify both delay-independent and delay-dependent stability criteria. The delay-independent criterion guarantees stability irrespective of the delay, whereas the delay-dependent criterion is characterised by a maximum tolerable delay before instability occurs. The criteria demonstrate dependence on controller and robot dynamic parameters, where increasing stiffness reduces the maximum tolerable delay in a non-linear manner, thereby heightening system vulnerability. The proposed criteria can be generalised to a wide range of robot-mediated interactions and serve as design guidelines for stable remote dyadic systems. Experiments with robots performing human-like movements further illustrate the correlation between stability and motor performance. The findings of this paper suggest the prerequisites for effective delay-compensation strategies.
RoboANKLE: Design, Development, and Functional Evaluation of a Robotic Ankle with a Motorized Compliant Unit
This study presents a powered transtibial prosthesis with complete push-off assistance, RoboANKLE. The design aims to fulfill specific requirements, such as a sufficient range of motion (RoM) while providing the necessary torque for achieving natural ankle motion in daily activities. Addressing the challenges faced in designing active transtibial prostheses, such as maintaining energetic autonomy and minimizing weight, is vital for the study. With this aim, we try to imitate the human ankle by providing extensive push-off assistance to achieve a natural-like torque profile. Thus, Energy Store and Extended Release mechanism (ESER) is employed with a novel Extra Energy Storage (EES) mechanism. Kinematic and kinetic analyses are carried out to determine the design parameters and assess the design performance. Subsequently, a Computer-Aided Design (CAD) model is built and used in comprehensive dynamic and structural analyses. These analyses are used for the design performance evaluation and determine the forces and torques applied to the prosthesis, which aids in optimizing the design for minimal weight via structural analysis and topology optimization. The design of the prototype is then finalized and manufactured for experimental evaluation to validate the design and functionality. The prototype is realized with a mass of 1.92 kg and dimensions of 261x107x420 mm. The Functional evaluations of the RoboANKLE revealed that it is capable of achieving the natural maximum dorsi-flexion angle with 95% accuracy. Also, Thanks to the implemented mechanisms, the results show that RoboANKLE can generate 57% higher than the required torque for natural walking. The result of the power generation capacity of the RoboANKLE is 10% more than the natural power during the gait cycle.
Prescribed Performance Control of Deformable Object Manipulation in Spatial Latent Space
Manipulating three-dimensional (3D) deformable objects presents significant challenges for robotic systems due to their infinite-dimensional state space and complex deformable dynamics. This paper proposes a novel model-free approach for shape control with constraints imposed on key points. Unlike existing methods that rely on feature dimensionality reduction, the proposed controller leverages the coordinates of key points as the feature vector, which are extracted from the deformable object's point cloud using deep learning methods. This approach not only reduces the dimensionality of the feature space but also retains the spatial information of the object. By extracting key points, the manipulation of deformable objects is simplified into a visual servoing problem, where the shape dynamics are described using a deformation Jacobian matrix. To enhance control accuracy, a prescribed performance control method is developed by integrating barrier Lyapunov functions (BLF) to enforce constraints on the key points. The stability of the closed-loop system is rigorously analyzed and verified using the Lyapunov method. Experimental results further demonstrate the effectiveness and robustness of the proposed method.
A Comparative Study of Oscillatory Perturbations in Car-Following Models
As connected and autonomous vehicles become more widespread, platooning has emerged as a key strategy to improve road capacity, reduce fuel consumption, and enhance traffic flow. However, the benefits of platoons strongly depend on their ability to maintain stability. Instability can lead to unsafe spacing and increased energy usage. In this work, we study platoon instability and analyze the root cause of its occurrence, as well as its impacts on the following vehicle. To achieve this, we propose a comparative study between different car-following models such as the Intelligent Driver Model (IDM), the Optimal Velocity Model (OVM), the General Motors Model (GMM), and the Cooperative Adaptive Cruise Control (CACC). In our approach, we introduce a disruption in the model by varying the velocity of the leading vehicle to visualize the behavior of the following vehicles. To evaluate the dynamic response of each model, we introduce controlled perturbations in the velocity of the leading vehicle, specifically, sinusoidal oscillations and discrete velocity changes. The resulting vehicle trajectories and variations in inter-vehicle spacing are analyzed to assess the robustness of each model to disturbance propagation. The findings offer insight into model sensitivity, stability characteristics, and implications for designing resilient platooning control strategies.
Two Roads to Koopman Operator Theory for Control: Infinite Input Sequences and Operator Families
The Koopman operator, originally defined for dynamical systems without input, has inspired many applications in control. Yet, the theoretical foundations underpinning this progress in control remain underdeveloped. This paper investigates the theoretical structure and connections between two extensions of Koopman theory to control: (i) Koopman operator via infinite input sequences and (ii) the Koopman control family. Although these frameworks encode system information in fundamentally different ways, we show that under certain conditions on the function spaces they operate on, they are equivalent. The equivalence is both in terms of the actions of the Koopman-based formulations in each framework as well as the function values on the system trajectories. Our analysis provides constructive tools to translate between the frameworks, offering a unified perspective for Koopman methods in control.
Tail-Optimized Caching for LLM Inference
Prompt caching is critical for reducing latency and cost in LLM inference: OpenAI and Anthropic report up to 50-90% cost savings through prompt reuse. Despite its widespread success, little is known about what constitutes an optimal prompt caching policy, particularly when optimizing tail latency, a metric of central importance to practitioners. The widely used Least Recently Used (LRU) policy can perform arbitrarily poor on this metric, as it is oblivious to the heterogeneity of conversation lengths. To address this gap, we propose Tail-Optimized LRU, a simple two-line modification that reallocates KV cache capacity to prioritize high-latency conversations by evicting cache entries that are unlikely to affect future turns. Though the implementation is simple, we prove its optimality under a natural stochastic model of conversation dynamics, providing the first theoretical justification for LRU in this setting, a result that may be of independent interest to the caching community. Experimentally, on real conversation data WildChat, Tail-Optimized LRU achieves up to 27.5% reduction in P90 tail Time to First Token latency and 23.9% in P95 tail latency compared to LRU, along with up to 38.9% decrease in SLO violations of 200ms. We believe this provides a practical and theoretically grounded option for practitioners seeking to optimize tail latency in real-world LLM deployments.
Sparsity-exploiting Gaussian Process for Robust Transient Learning of Power System Dynamics
Advances in leveraging Gaussian processes (GP) have enabled learning and inferring dynamic grid behavior from scarce PMU measurements. However, real measurements can be corrupted by various random and targeted threats, leading to inaccurate and meaningless results. This paper develops robust transient learning to overcome this challenge by exploiting the sparse corruption patterns in the data flow. Specifically, we integrate sparse optimization with method of moments (MoM) to make learning robust to a sparse distribution of data corruptions; then, we optimize sparse weights to identify corrupted meter locations. To improve inference speed on large-scale systems, we further adopt K-medoid clustering of locations to develop dimension reduction (DR) and aggregate representation (AR) heuristics. Experimental results demonstrate robustness against random large errors, targeted false data injections, and local PMU clock drifts. On a 1354-bus system, inference turns out to be 18x faster using DR and 400x faster when further combined with AR heuristics.
comment: This manuscript has been submitted to PESGM2026
Exploring a New Design Paradigm for Omnidirectional MAVs for Minimal Actuation and Internal Force Elimination: Theoretical Framework and Control
This paper presents a novel concept for achieving omnidirectionality in a multirotor aerial vehicle (MAV) that uses only 6 inputs and ensures no internal forces at the equilibria. The concept integrates a single actively-tilting propeller along with 3 pendulum-like links, each carrying a propeller, connected by passive universal joints to the main body. We show that this design ensures omnidirectionality while minimizing the internal forces and without resorting to overactuation (i.e., more than 6 inputs). A detailed dynamic model of the multi-link MAV is first developed. Afterwards, the analysis identifies the equilibrium configurations and illustrates that a forced equilibrium exists for every pose of the MAV's main platform. In order to render this equilibrium asymptotically stable for the closed-loop system, a geometric nonlinear controller is constructed using dynamic feedback linearization and backstepping techniques with the main platform configuration error being the left-trivialized error on SE(3). The stability of the closed-loop system is then investigated by employing standard Lyapunov arguments on the zero dynamics. We conclude by providing numerical simulations validating the proposed approach. They demonstrate the MAV capability to perform decoupled attitude and translational motions under non-zero initial conditions, parametric uncertainty, and actuators noise.
Q-EnergyDEX: A Zero-Trust Distributed Energy Trading Framework Driven by Quantum Key Distribution and Blockchain
The rapid decentralization and digitalization of local electricity markets have introduced new cyber-physical vulnerabilities, including key leakage, data tampering, and identity spoofing. Existing blockchain-based solutions provide transparency and traceability but still depend on classical cryptographic primitives that are vulnerable to quantum attacks. To address these challenges, this paper proposes Q-EnergyDEX, a zero-trust distributed energy trading framework driven by quantum key distribution and blockchain. The framework integrates physical-layer quantum randomness with market-level operations, providing an end-to-end quantum-secured infrastructure. A cloud-based Quantum Key Management Service continuously generates verifiable entropy and regulates key generation through a rate-adaptive algorithm to sustain high-quality randomness. A symmetric authentication protocol (Q-SAH) establishes secure and low-latency sessions, while the quantum-aided consensus mechanism (PoR-Lite) achieves probabilistic ledger finality within a few seconds. Furthermore, a Stackelberg-constrained bilateral auction couples market clearing with entropy availability, ensuring both economic efficiency and cryptographic security. Simulation results show that Q-EnergyDEX maintains robust key stability and near-optimal social welfare, demonstrating its feasibility for large-scale decentralized energy markets.
A predictive modular approach to constraint satisfaction under uncertainty - with application to glycosylation in continuous monoclonal antibody biosimilar production
The paper proposes a modular-based approach to constraint handling in process optimization and control. This is partly motivated by the recent interest in learning-based methods, e.g., within bioproduction, for which constraint handling under uncertainty is a challenge. The proposed constraint handler, called predictive filter, is combined with an adaptive constraint margin and a constraint violation cost monitor to minimize the cost of violating soft constraints due to model uncertainty and disturbances. The module can be combined with any controller and is based on minimally modifying the controller output, in a least squares sense, such that constraints are satisfied within the considered horizon. The proposed method is computationally efficient and suitable for real-time applications. The effectiveness of the method is illustrated through a realistic simulation case study of glycosylation constraint satisfaction in continuous monoclonal antibody biosimilar production using Chinese hamster ovary cells, for which the metabolic network model consists of 23 extracellular metabolites and 126 reactions.
Revolution-Spaced Output-Feedback Model Predictive Control for Station Keeping on Near-Rectilinear Halo Orbits
We develop a model predictive control (MPC) policy for station keeping on a Near-Rectilinear Halo Orbit (NRHO). The proposed policy achieves full-state tracking of a reference NRHO via a multiple-maneuver control horizon, each spaced one revolution apart to abide by typical mission operation requirements. We prove that the proposed policy is recursively feasible, and perform numerical evaluation in an output-feedback setting by incorporating a navigation filter and realistic operational uncertainties, where the proposed MPC is compared against the state-of-the-art station-keeping algorithm adopted for the Gateway. Our approach successfully maintains the spacecraft in the vicinity of the reference NRHO at a similar cumulative cost as existing station-keeping methods without encountering phase deviation issues, a common drawback of existing methods with one maneuver per revolution.
comment: 8 pages, 6 figures
Offline and Online Use of Interval and Set-Based Approaches for Control and State Estimation: A Selection of Methodological Approaches and Their Application
Control and state estimation procedures need to be robust against imprecisely known parameters, uncertainty in initial conditions, and external disturbances. Interval methods and other set-based techniques form the basis for the implementation of powerful approaches that can be used to identify parameters of dynamic system models in the presence of the aforementioned types of uncertainty. Moreover, they are applicable to a verified feasibility and stability analysis of controllers and state estimators. In addition to these approaches which are typically used offline for analysis of system models designed with classical floating point procedures, interval and set-based methods have also been developed in recent years, which allow to directly solve the associated design tasks and to implement reliable techniques that are applicable online, i.e., during system operation. The latter approaches include set-based model predictive control, online parameter adaptation techniques for nonlinear variable-structure and backstepping controllers, interval observers, and fault diagnosis techniques. This paper provides an overview of the methodological background and reviews numerous practical applications for which interval and other set-valued approaches have been employed successfully.
No-Regret Learning in Stackelberg Games with an Application to Electric Ride-Hailing
We consider the problem of efficiently learning to play single-leader multi-follower Stackelberg games when the leader lacks knowledge of the lower-level game. Such games arise in hierarchical decision-making problems involving self-interested agents. For example, in electric ride-hailing markets, a central authority aims to learn optimal charging prices to shape fleet distributions and charging patterns of ride-hailing companies. Existing works typically apply gradient-based methods to find the leader's optimal strategy. Such methods are impractical as they require that the followers share private utility information with the leader. Instead, we treat the lower-level game as a black box, assuming only that the followers' interactions approximate a Nash equilibrium while the leader observes the realized cost of the resulting approximation. Under kernel-based regularity assumptions on the leader's cost function, we develop a no-regret algorithm that converges to an $\epsilon$-Stackelberg equilibrium in $O(\sqrt{T})$ rounds. Finally, we validate our approach through a numerical case study on optimal pricing in electric ride-hailing markets.
comment: 8 pages, 2 figures, 1 table
Hierarchical Fuel-Cell Airpath Control: an Efficiency-Aware MIMO Control Approach Combined with a Novel Constraint-Enforcing Reference Governor
This paper presents a hierarchical multivariable control and constraint management approach for an air supply system for a proton exchange membrane fuel cell (PEMFC) system. The control objectives are to track desired compressor mass airflow and cathode inlet pressure, maintain a minimum oxygen excess ratio (OER), and run the system at maximum net efficiency. A multi-input multi-output (MIMO) internal model controller (IMC) is designed and simulated to track flow and pressure set-points, which showed high performance despite strongly coupled plant dynamics. A new set-point map is generated to compute the most efficient cathode inlet pressure from the stack current load. To enforce OER constraints, a novel reference governor (RG) with the ability to govern multiple references (the cascade RG) and the ability to speed up as well as slow down a reference signal (the cross-section RG) is developed and tested. Compared with a single-input single-output (SISO) air-flow control approach, the proposed MIMO control approach shows up to 7.36 percent lower hydrogen fuel consumption. Compared to a traditional load governor, the novel cascaded cross-section RG (CC-RG) shows up to 3.68 percent less mean absolute percent error (MAPE) on net power tracking and greatly improved worst-case OER on realistic drive-cycle simulations. Control development and validations were conducted on two fuel cell system (FCS) models, a nonlinear open-source model and a proprietary Ford high-fidelity model
comment: Accepted for publication in IEEE Transactions on Control Systems Technology. This version incorporates all peer-review revisions
Offline Reinforcement Learning via Inverse Optimization
Inspired by the recent successes of Inverse Optimization (IO) across various application domains, we propose a novel offline Reinforcement Learning (ORL) algorithm for continuous state and action spaces, leveraging the convex loss function called ``sub-optimality loss" from the IO literature. To mitigate the distribution shift commonly observed in ORL problems, we further employ a robust and non-causal Model Predictive Control (MPC) expert steering a nominal model of the dynamics using in-hindsight information stemming from the model mismatch. Unlike the existing literature, our robust MPC expert enjoys an exact and tractable convex reformulation. In the second part of this study, we show that the IO hypothesis class, trained by the proposed convex loss function, enjoys ample expressiveness and achieves competitive performance comparing with the state-of-the-art (SOTA) methods in the low-data regime of the MuJoCo benchmark while utilizing three orders of magnitude fewer parameters, thereby requiring significantly fewer computational resources. To facilitate the reproducibility of our results, we provide an open-source package implementing the proposed algorithms and the experiments.
comment: preprint
Composite learning backstepping control with guaranteed exponential stability and robustness
Adaptive backstepping control provides a feasible solution to achieve asymptotic tracking for mismatched uncertain nonlinear systems. However, the closed-loop stability depends on high-gain feedback generated by nonlinear damping terms, and closed-loop exponential stability with parameter convergence involves a stringent condition named persistent excitation (PE). This paper proposes a composite learning backstepping control (CLBC) strategy based on modular backstepping and high-order tuners to compensate for the transient process of parameter estimation and achieve closed-loop exponential stability without the nonlinear damping terms and the PE condition. A novel composite learning mechanism is designed to maximize the staged exciting strength for parameter estimation, such that parameter convergence can be achieved under a condition of interval excitation (IE) or even partial IE that is strictly weaker than PE. An extra prediction error is employed in the adaptive law to ensure the transient performance without nonlinear damping terms. The exponential stability of the closed-loop system is proved rigorously under the partial IE or IE condition. Simulations have demonstrated the effectiveness and superiority of the proposed method in both parameter estimation and control compared to state-of-the-art methods.
comment: This work has been submitted to the IEEE for possible publication
Strategy Templates for Almost-Sure and Positive Winning of Stochastic Parity Games towards Permissive and Resilient Control
Stochastic games are fundamental in various applications, including the control of cyber-physical systems (CPS), where both controller and environment are modeled as players. Traditional algorithms typically aim to determine a single winning strategy to develop a controller. However, in CPS control and other domains, permissive controllers are essential, as they enable the system to adapt when additional constraints arise and remain resilient to runtime changes. This work generalizes the concept of (permissive winning) strategy templates, originally introduced by Anand et al. at TACAS and CAV 2023 for deterministic games, to incorporate stochastic games. These templates capture an infinite number of winning strategies, allowing for efficient strategy adaptation to system changes. We focus on two winning criteria (almost-sure and positive winning) and five winning objectives (safety, reachability, B\"uchi, co-B\"uchi, and parity). Our contributions include algorithms for constructing templates for each winning criterion and objective and a novel approach for extracting a winning strategy from a given template. Discussions on comparisons between templates and between strategy extraction methods are provided.
comment: For the conference version published at ICTAC 2024 see: arXiv:2409.08607v1
A Weighted Predict-and-Optimize Framework for Power System Operation Considering Varying Impacts of Uncertainty
Prediction deviations of different uncertainties have varying impacts on downstream decision-making. Improving the prediction accuracy of critical uncertainties with significant impacts on decision-making quality yields better optimization results. Motivated by this observation, this paper proposes a novel weighted predict-and-optimize (WPO) framework for decision-making under multiple uncertainties. Specifically, we incorporate an uncertainty-aware weighting mechanism into the predictive model to capture the relative impact of each uncertainty on specific optimization tasks, and introduce a problem-driven prediction loss (PDPL) to quantify the suboptimality of the weighted predictions relative to perfect predictions in downstream optimization. By optimizing the uncertainty weights to minimize the PDPL, the proposed WPO framework enables adaptive assessment of uncertainty impacts and joint learning of prediction and optimization. Furthermore, to facilitate weight optimization, we develop a surrogate model that establishes a direct mapping between the uncertainty weights and the PDPL, where enhanced graph convolutional networks and multi-task learning are adopted for efficient surrogate model construction and training. Numerical experiments on the modified IEEE 33-bus and 123-bus systems demonstrate that the proposed WPO framework outperforms the traditional predict-then-optimize paradigm, reducing the PDPL by an average of 55% within acceptable computational time.
comment: This is a paper submitted to IEEE TRANSACTIONS ON Power Systems
Differentiable-by-design Nonlinear Optimization for Model Predictive Control
Nonlinear optimization-based control policies, such as those those arising in nonlinear Model Predictive Control, have seen remarkable success in recent years. These policies require solving computationally demanding nonlinear optimization programs online at each time-step. The resulting solution map, viewed as a function of the measured state of the system and design parameters, may not be differentiable, which poses significant challenges if the control policy is embedded in a gradient-based policy optimization scheme. We propose a principled way to regularize the nonlinear optimization problem, obtaining a surrogate derivative even if when the original problem is not differentiable. The surrogate problem is differentiable by design and its solution map coincides with the solution of the unregularized problem. We demonstrate the effectiveness of our approach in a free-final-time optimal control problem and a receding-horizon nonlinear MPC example.
Time-causal and time-recursive wavelets
When to apply wavelet analysis to real-time temporal signals, where the future cannot be accessed, it is essential to base all the steps in the signal processing pipeline on computational mechanisms that are truly time-causal. This paper describes how a time-causal wavelet analysis can be performed based on concepts developed in the area of temporal scale-space theory, originating from a complete classification of temporal smoothing kernels that guarantee non-creation of new structures from finer to coarser temporal scale levels. By necessity, convolution with truncated exponential kernels in cascade constitutes the only permissable class of kernels, as well as their temporal derivatives as a natural complement to fulfil the admissibility conditions of wavelet representations. For a particular way of choosing the time constants in the resulting infinite convolution of truncated exponential kernels, to ensure temporal scale covariance and thus self-similarity over temporal scales, we describe how mother wavelets can be chosen as temporal derivatives of the resulting time-causal limit kernel. By developing connections between wavelet theory and scale-space theory, we characterize and quantify how the continuous scaling properties transfer to the discrete implementation, demonstrating how the proposed time-causal wavelet representation can reflect the duration of locally dominant temporal structures in the input signals. We propose that this notion of time-causal wavelet analysis could be a valuable tool for signal processing tasks, where streams of signals are to be processed in real time, specifically for signals that may contain local variations over a rich span of temporal scales, or more generally for analysing physical or biophysical temporal phenomena, where a fully time-causal analysis is called for to be physically realistic.
comment: 23 pages, 8 figures
Inferring Foresightedness in Dynamic Noncooperative Games
Dynamic game theory is an increasingly popular tool for modeling multi-agent, e.g. human-robot, interactions. Game-theoretic models presume that each agent wishes to minimize a private cost function that depends on others' actions. These games typically evolve over a fixed time horizon, specifying how far into the future each agent plans. In practical settings, however, decision-makers may vary in foresightedness, or how much they care about their current cost in relation to their past and future costs. We conjecture that quantifying and estimating each agent's foresightedness from online data will enable safer and more efficient interactions with other agents. To this end, we frame this inference problem as an inverse dynamic game. We consider a specific objective function parametrization that smoothly interpolates myopic and farsighted planning. Games of this form are readily transformed into parametric mixed complementarity problems; we exploit the directional differentiability of solutions to these problems with respect to their hidden parameters to solve for agents' foresightedness. We conduct three experiments: one with synthetically generated delivery robot motion, one with real-world data involving people walking, biking, and driving vehicles, and one using high-fidelity simulators. The results of these experiments demonstrate that explicitly inferring agents' foresightedness enables game-theoretic models to make 33% more accurate models for agents' behavior.
PowerChain: A Verifiable Agentic AI System for Automating Distribution Grid Analyses
Rapid electrification and decarbonization are increasing the complexity of distribution grid (DG) operation and planning, necessitating advanced computational analyses to ensure reliability and resilience. These analyses depend on disparate workflows comprising complex models, function calls, and data pipelines that require substantial expert knowledge and remain difficult to automate. Workforce and budget constraints further limit utilities' ability to apply such analyses at scale. To address this gap, we build an agentic system PowerChain, which is capable of autonomously performing complex grid analyses. Existing agentic AI systems are typically developed in a bottom-up manner with customized context for predefined analysis tasks; therefore, they do not generalize to tasks that the agent has never seen. In comparison, to generalize to unseen DG analysis tasks, PowerChain dynamically generates structured context by leveraging supervisory signals from self-contained power systems tools (e.g., GridLAB-D) and an optimized set of expert-annotated and verified reasoning trajectories. For complex DG tasks defined in natural language, empirical results on real utility data demonstrate that PowerChain achieves up to a 144/% improvement in performance over baselines.
Physiology-informed layered sensing for intelligent human-exoskeleton interaction
Wearable exoskeletons hold transformative promise for restoring mobility across diverse users with muscular weakness or other impairments. However, their translation beyond laboratory environments remains limited by sensing systems that capture movement but not underlying physiology. Here, we present a soft, lightweight smart leg sleeve that achieves anatomically aligned, layered multimodal sensing by integrating textile-based surface electromyography (sEMG) electrodes, ultrasensitive textile strain sensors, and inertial measurement units (IMUs). Each sensing modality targets a distinct physiological layer: IMUs track joint kinematics at the skeletal level, sEMG monitors muscle activation at the muscular level, and strain sensors detect skin deformation at the cutaneous level. Together, these sensors provide real-time perception to support three core objectives: controlling personalized assistance, optimizing user effort, and safeguarding against injury risks. The system is skin-conformal, mechanically compliant, and seamlessly integrated with a custom exoskeleton ($<20$~g total sensor and electronics weight). We demonstrate: (1) accurate ankle joint moment estimation (RMSE = 0.13~Nm/kg), (2) real-time classification of metabolic trends (accuracy = 97.1\%), and (3) injury risk detection within 100~ms (recall = 0.96), all validated on unseen users using a leave-one-subject-out protocol. This work establishes a physiology-aligned sensing architecture that reframes exoskeleton perception from motion tracking to real-time physiological decoding, offering a pathway towards intelligent, adaptive, and personalized wearable robotics.
comment: 21 pages, 5 figures, 43 references
End-to-End Learning Framework for Solving Non-Markovian Optimal Control
Integer-order calculus often falls short in capturing the long-range dependencies and memory effects found in many real-world processes. Fractional calculus addresses these gaps via fractional-order integrals and derivatives, but fractional-order dynamical systems pose substantial challenges in system identification and optimal control due to the lack of standard control methodologies. In this paper, we theoretically derive the optimal control via linear quadratic regulator (LQR) for fractional-order linear time-invariant (FOLTI) systems and develop an end-to-end deep learning framework based on this theoretical foundation. Our approach establishes a rigorous mathematical model, derives analytical solutions, and incorporates deep learning to achieve data-driven optimal control of FOLTI systems. Our key contributions include: (i) proposing an innovative system identification method control strategy for FOLTI systems, (ii) developing the first end-to-end data-driven learning framework, Fractional-Order Learning for Optimal Control (FOLOC), that learns control policies from observed trajectories, and (iii) deriving a theoretical analysis of sample complexity to quantify the number of samples required for accurate optimal control in complex real-world problems. Experimental results indicate that our method accurately approximates fractional-order system behaviors without relying on Gaussian noise assumptions, pointing to promising avenues for advanced optimal control.
Autonomous Cyber Resilience via a Co-Evolutionary Arms Race within a Fortified Digital Twin Sandbox SP
The convergence of Information Technology and Operational Technology has exposed Industrial Control Systems to adaptive, intelligent adversaries that render static defenses obsolete. This paper introduces the Adversarial Resilience Co-evolution (ARC) framework, addressing the "Trinity of Trust" comprising model fidelity, data integrity, and analytical resilience. ARC establishes a co-evolutionary arms race within a Fortified Secure Digital Twin (F-SCDT), where a Deep Reinforcement Learning "Red Agent" autonomously discovers attack paths while an ensemble-based "Blue Agent" is continuously hardened against these threats. Experimental validation on the Tennessee Eastman Process (TEP) and Secure Water Treatment (SWaT) testbeds demonstrates superior performance in detecting novel attacks, with F1-scores improving from 0.65 to 0.89 and detection latency reduced from over 1200 seconds to 210 seconds. A comprehensive ablation study reveals that the co-evolutionary process itself contributes a 27% performance improvement. By integrating Explainable AI and proposing a Federated ARC architecture, this work presents a necessary paradigm shift toward dynamic, self-improving security for critical infrastructure.
comment: 6 pages, 2 figures, 4 equations, 1 algorithm, 3 tables, to be published in ISPACS 2025, unabridged version exists as arXiv:2506.20102v1
Systems and Control (EESS)
RDD: Retrieval-Based Demonstration Decomposer for Planner Alignment in Long-Horizon Tasks NeurIPS 2025
To tackle long-horizon tasks, recent hierarchical vision-language-action (VLAs) frameworks employ vision-language model (VLM)-based planners to decompose complex manipulation tasks into simpler sub-tasks that low-level visuomotor policies can easily handle. Typically, the VLM planner is finetuned to learn to decompose a target task. This finetuning requires target task demonstrations segmented into sub-tasks by either human annotation or heuristic rules. However, the heuristic subtasks can deviate significantly from the training data of the visuomotor policy, which degrades task performance. To address these issues, we propose a Retrieval-based Demonstration Decomposer (RDD) that automatically decomposes demonstrations into sub-tasks by aligning the visual features of the decomposed sub-task intervals with those from the training data of the low-level visuomotor policies. Our method outperforms the state-of-the-art sub-task decomposer on both simulation and real-world tasks, demonstrating robustness across diverse settings. Code and more results are available at rdd-neurips.github.io.
comment: 39th Conference on Neural Information Processing Systems (NeurIPS 2025); Project Website: rdd-neurips.github.io
CBF-RL: Safety Filtering Reinforcement Learning in Training with Control Barrier Functions
Reinforcement learning (RL), while powerful and expressive, can often prioritize performance at the expense of safety. Yet safety violations can lead to catastrophic outcomes in real-world deployments. Control Barrier Functions (CBFs) offer a principled method to enforce dynamic safety -- traditionally deployed \emph{online} via safety filters. While the result is safe behavior, the fact that the RL policy does not have knowledge of the CBF can lead to conservative behaviors. This paper proposes CBF-RL, a framework for generating safe behaviors with RL by enforcing CBFs \emph{in training}. CBF-RL has two key attributes: (1) minimally modifying a nominal RL policy to encode safety constraints via a CBF term, (2) and safety filtering of the policy rollouts in training. Theoretically, we prove that continuous-time safety filters can be deployed via closed-form expressions on discrete-time roll-outs. Practically, we demonstrate that CBF-RL internalizes the safety constraints in the learned policy -- both enforcing safer actions and biasing towards safer rewards -- enabling safe deployment without the need for an online safety filter. We validate our framework through ablation studies on navigation tasks and on the Unitree G1 humanoid robot, where CBF-RL enables safer exploration, faster convergence, and robust performance under uncertainty, enabling the humanoid robot to avoid obstacles and climb stairs safely in real-world settings without a runtime safety filter.
comment: 8 pages
Architecture Is All You Need: Diversity-Enabled Sweet Spots for Robust Humanoid Locomotion
Robust humanoid locomotion in unstructured environments requires architectures that balance fast low-level stabilization with slower perceptual decision-making. We show that a simple layered control architecture (LCA), a proprioceptive stabilizer running at high rate, coupled with a compact low-rate perceptual policy, enables substantially more robust performance than monolithic end-to-end designs, even when using minimal perception encoders. Through a two-stage training curriculum (blind stabilizer pretraining followed by perceptual fine-tuning), we demonstrate that layered policies consistently outperform one-stage alternatives in both simulation and hardware. On a Unitree G1 humanoid, our approach succeeds across stair and ledge tasks where one-stage perceptual policies fail. These results highlight that architectural separation of timescales, rather than network scale or complexity, is the key enabler for robust perception-conditioned locomotion.
comment: 8 pages
Further Results on Safety-Critical Stabilization of Force-Controlled Nonholonomic Mobile Robots
In this paper, we address the stabilization problem for force-controlled nonholonomic mobile robots under safety-critical constraints. We propose a continuous, time-invariant control law based on the gamma m-quadratic programming (gamma m-QP) framework, which unifies control Lyapunov functions (CLFs) and control barrier functions (CBFs) to enforce both stability and safety in the closed-loop system. For the first time, we construct a global, time-invariant, strict Lyapunov function for the closed-loop nonholonomic mobile robot system with a nominal stabilization controller in polar coordinates; this strict Lyapunov function then serves as the CLF in the QP design. Next, by exploiting the inherent cascaded structure of the vehicle dynamics, we develop a CBF for the mobile robot via an integrator backstepping procedure. Our main results guarantee both asymptotic stability and safety for the closed-loop system. Both the simulation and experimental results are presented to illustrate the effectiveness and performance of our approach.
Through-the-Earth Magnetic Induction Communication and Networking: A Comprehensive Survey
Magnetic induction (MI) communication (MIC) has emerged as a promising candidate for underground communication networks due to its excellent penetration capabilities. Integration with Space-Air-Ground-Underground (SAGUI) networks in next-generation mobile communication systems requires a well-defined network architecture. A recent discovery in MIC research, MI fast fading, remains in its early stages and presents unique challenges. This paper provides a comprehensive survey on through-the-earth (TTE) MIC, covering MI applications, channel modeling, point-to-point MIC design, relay techniques, network frameworks, and emerging technologies. We compare various MIC applications to highlight TTE-specific challenges and review the principles of channel modeling, addressing both MI slow fading and MI fast fading, along with its potential impact on existing MIC theories. We conduct a fine-grained decomposition of MI channel power gain into four distinct physical parameters, and propose a novel geometric model to analyze MI fast fading. We also summarize MI relay techniques, examine crosstalk effects in relay and high-density networks, and explore key research tasks within the OSI framework for a holistic MI network protocol in SAGUI. To bridge the gaps identified, we propose a MIC framework that supports TCP/IP and Linux, enabling full implementation of existing and emerging MIC solutions. This framework empowers researchers to leverage Linux resources and deep learning platforms for accelerated development of MIC in SAGUI networks. Remaining research challenges, open issues, and promising novel techniques are further identified to advance MIC research.
comment: This work has been accepted by the IEEE Communications Surveys & Tutorials (COMST) for publication.The final published version will be available on IEEE Xplore
Dynamic-Key-Aware Co-Simulation Framework for Next Generation of SCADA Systems Encrypted by Quantum-Key-Distribution Techniques
To address growing cybersecurity challenges in modern power dispatch systems, this paper proposes a multi-layer modeling and optimization framework for SCADA systems enhanced with quantum key distribution (QKD). While most existing applications of QKD in the power sector focus on building secure point-to-point communication tunnels, they rarely consider the system-level coupling between key dynamics and control scheduling. In contrast, our approach integrates quantum key generation, consumption, inventory prediction, and control latency into a unified model, enabling key-aware reconfiguration of SCADA control chains based on task security demands and real-time resource constraints. To resolve conflicts in key resource allocation between transmission system operators (TSOs) and distribution system operators (DSOs), we formulate a bi-level Stackelberg game and transform it into a mathematical program with complementarity constraints (MPCC). We further develop an efficient Level Decomposition-Complementarity Pruning (LD-CP) algorithm to solve the problem. To support reproducible evaluation, we build an end-to-end co-simulation platform that integrates physical-layer disruptions via OpenQKD-Sim, Q3P/IEC-104 protocol stack binding, and real-time control-chain monitoring through Grafana. Experimental results on the IEEE 39- and 118-bus systems show that our method increases task success rate by 25%, reduces peak frequency deviation by 70%, and improves key utilization to 83%. This work lays the foundation for future quantum-secure control systems in power grid operations.
Improved Voltage Regulation with Optimal Design of Decentralized Volt-VAr Control
Integration of distributed energy resources has created a need for autonomous, dynamic voltage regulation. Decentralized Volt-VAr Control (VVC) of grid-connected inverters presents a unique opportunity for voltage management but, if designed poorly, can lead to unstable behavior when in feedback with the grid. We model the grid-VVC closed-loop dynamics with a linearized power flow approach, leveraging historical data, which shows improvement over the commonly used LinDistFlow model. This model is used to design VVC slopes by minimizing steady-state voltage deviation from the nominal value, subject to a non-convex spectral radius stability constraint, which has not been previously implemented within this context. We compare this constraint to existing convex restrictions and demonstrate, through simulations on a realistic feeder, that using the spectral radius results in more effective voltage regulation.
A Human-Vector Susceptible--Infected--Susceptible Model for Analyzing and Controlling the Spread of Vector-Borne Diseases
We propose an epidemic model for the spread of vector-borne diseases. The model, which is built extending the classical susceptible-infected-susceptible model, accounts for two populations -- humans and vectors -- and for cross-contagion between the two species, whereby humans become infected upon interaction with carrier vectors, and vectors become carriers after interaction with infected humans. We formulate the model as a system of ordinary differential equations and leverage monotone systems theory to rigorously characterize the epidemic dynamics. Specifically, we characterize the global asymptotic behavior of the disease, determining conditions for quick eradication of the disease (i.e., for which all trajectories converge to a disease-free equilibrium), or convergence to a (unique) endemic equilibrium. Then, we incorporate two control actions: namely, vector control and incentives to adopt protection measures. Using the derived mathematical tools, we assess the impact of these two control actions and determine the optimal control policy.
comment: To appear in the Proceedings of the 2025 European Control Conference (ECC)
High-Resolution PTDF-Based Planning of Storage and Transmission Under High Renewables
Transmission Expansion Planning (TEP) optimizes power grid upgrades and investments to ensure reliable, efficient, and cost-effective electricity delivery while addressing grid constraints. To support growing demand and renewable energy integration, energy storage is emerging as a pivotal asset that provides temporal flexibility and alleviates congestion. This paper develops a multiperiod, two-stage PTDF formulation that co-optimizes transmission upgrades and storage siting/sizing. To ensure scalability, a trust-region, multicut Benders scheme warm-started from per-representative-day optima is proposed. Applied to a 2,000-bus synthetic Texas system under high-renewable projections, the method attains final optimality gaps below 1% and yields a plan with storage at about 180 nodes (32% of peak renewable capacity). These results demonstrate that the proposed PTDF-based methodology efficiently handles large distributed storage fleets, demonstrating scalability at high spatial resolution
A Deep State-Space Model Compression Method using Upper Bound on Output Error
We study deep state-space models (Deep SSMs) that contain linear-quadratic-output (LQO) systems as internal blocks and present a compression method with a provable output error guarantee. We first derive an upper bound on the output error between two Deep SSMs and show that the bound can be expressed via the $h^2$-error norms between the layerwise LQO systems, thereby providing a theoretical justification for existing model order reduction (MOR)-based compression. Building on this bound, we formulate an optimization problem in terms of the $h^2$-error norm and develop a gradient-based MOR method. On the IMDb task from the Long Range Arena benchmark, we demonstrate that our compression method achieves strong performance. Moreover, unlike prior approaches, we reduce roughly 80% of trainable parameters without retraining, with only a 4-5% performance drop.
Stability Criteria and Motor Performance in Delayed Haptic Dyadic Interactions Mediated by Robots
This paper establishes analytical stability criteria for robot-mediated human-human (dyadic) interaction systems, focusing on haptic communication under network-induced time delays. Through frequency-domain analysis supported by numerical simulations, we identify both delay-independent and delay-dependent stability criteria. The delay-independent criterion guarantees stability irrespective of the delay, whereas the delay-dependent criterion is characterised by a maximum tolerable delay before instability occurs. The criteria demonstrate dependence on controller and robot dynamic parameters, where increasing stiffness reduces the maximum tolerable delay in a non-linear manner, thereby heightening system vulnerability. The proposed criteria can be generalised to a wide range of robot-mediated interactions and serve as design guidelines for stable remote dyadic systems. Experiments with robots performing human-like movements further illustrate the correlation between stability and motor performance. The findings of this paper suggest the prerequisites for effective delay-compensation strategies.
RoboANKLE: Design, Development, and Functional Evaluation of a Robotic Ankle with a Motorized Compliant Unit
This study presents a powered transtibial prosthesis with complete push-off assistance, RoboANKLE. The design aims to fulfill specific requirements, such as a sufficient range of motion (RoM) while providing the necessary torque for achieving natural ankle motion in daily activities. Addressing the challenges faced in designing active transtibial prostheses, such as maintaining energetic autonomy and minimizing weight, is vital for the study. With this aim, we try to imitate the human ankle by providing extensive push-off assistance to achieve a natural-like torque profile. Thus, Energy Store and Extended Release mechanism (ESER) is employed with a novel Extra Energy Storage (EES) mechanism. Kinematic and kinetic analyses are carried out to determine the design parameters and assess the design performance. Subsequently, a Computer-Aided Design (CAD) model is built and used in comprehensive dynamic and structural analyses. These analyses are used for the design performance evaluation and determine the forces and torques applied to the prosthesis, which aids in optimizing the design for minimal weight via structural analysis and topology optimization. The design of the prototype is then finalized and manufactured for experimental evaluation to validate the design and functionality. The prototype is realized with a mass of 1.92 kg and dimensions of 261x107x420 mm. The Functional evaluations of the RoboANKLE revealed that it is capable of achieving the natural maximum dorsi-flexion angle with 95% accuracy. Also, Thanks to the implemented mechanisms, the results show that RoboANKLE can generate 57% higher than the required torque for natural walking. The result of the power generation capacity of the RoboANKLE is 10% more than the natural power during the gait cycle.
Prescribed Performance Control of Deformable Object Manipulation in Spatial Latent Space
Manipulating three-dimensional (3D) deformable objects presents significant challenges for robotic systems due to their infinite-dimensional state space and complex deformable dynamics. This paper proposes a novel model-free approach for shape control with constraints imposed on key points. Unlike existing methods that rely on feature dimensionality reduction, the proposed controller leverages the coordinates of key points as the feature vector, which are extracted from the deformable object's point cloud using deep learning methods. This approach not only reduces the dimensionality of the feature space but also retains the spatial information of the object. By extracting key points, the manipulation of deformable objects is simplified into a visual servoing problem, where the shape dynamics are described using a deformation Jacobian matrix. To enhance control accuracy, a prescribed performance control method is developed by integrating barrier Lyapunov functions (BLF) to enforce constraints on the key points. The stability of the closed-loop system is rigorously analyzed and verified using the Lyapunov method. Experimental results further demonstrate the effectiveness and robustness of the proposed method.
A Comparative Study of Oscillatory Perturbations in Car-Following Models
As connected and autonomous vehicles become more widespread, platooning has emerged as a key strategy to improve road capacity, reduce fuel consumption, and enhance traffic flow. However, the benefits of platoons strongly depend on their ability to maintain stability. Instability can lead to unsafe spacing and increased energy usage. In this work, we study platoon instability and analyze the root cause of its occurrence, as well as its impacts on the following vehicle. To achieve this, we propose a comparative study between different car-following models such as the Intelligent Driver Model (IDM), the Optimal Velocity Model (OVM), the General Motors Model (GMM), and the Cooperative Adaptive Cruise Control (CACC). In our approach, we introduce a disruption in the model by varying the velocity of the leading vehicle to visualize the behavior of the following vehicles. To evaluate the dynamic response of each model, we introduce controlled perturbations in the velocity of the leading vehicle, specifically, sinusoidal oscillations and discrete velocity changes. The resulting vehicle trajectories and variations in inter-vehicle spacing are analyzed to assess the robustness of each model to disturbance propagation. The findings offer insight into model sensitivity, stability characteristics, and implications for designing resilient platooning control strategies.
Two Roads to Koopman Operator Theory for Control: Infinite Input Sequences and Operator Families
The Koopman operator, originally defined for dynamical systems without input, has inspired many applications in control. Yet, the theoretical foundations underpinning this progress in control remain underdeveloped. This paper investigates the theoretical structure and connections between two extensions of Koopman theory to control: (i) Koopman operator via infinite input sequences and (ii) the Koopman control family. Although these frameworks encode system information in fundamentally different ways, we show that under certain conditions on the function spaces they operate on, they are equivalent. The equivalence is both in terms of the actions of the Koopman-based formulations in each framework as well as the function values on the system trajectories. Our analysis provides constructive tools to translate between the frameworks, offering a unified perspective for Koopman methods in control.
Tail-Optimized Caching for LLM Inference
Prompt caching is critical for reducing latency and cost in LLM inference: OpenAI and Anthropic report up to 50-90% cost savings through prompt reuse. Despite its widespread success, little is known about what constitutes an optimal prompt caching policy, particularly when optimizing tail latency, a metric of central importance to practitioners. The widely used Least Recently Used (LRU) policy can perform arbitrarily poor on this metric, as it is oblivious to the heterogeneity of conversation lengths. To address this gap, we propose Tail-Optimized LRU, a simple two-line modification that reallocates KV cache capacity to prioritize high-latency conversations by evicting cache entries that are unlikely to affect future turns. Though the implementation is simple, we prove its optimality under a natural stochastic model of conversation dynamics, providing the first theoretical justification for LRU in this setting, a result that may be of independent interest to the caching community. Experimentally, on real conversation data WildChat, Tail-Optimized LRU achieves up to 27.5% reduction in P90 tail Time to First Token latency and 23.9% in P95 tail latency compared to LRU, along with up to 38.9% decrease in SLO violations of 200ms. We believe this provides a practical and theoretically grounded option for practitioners seeking to optimize tail latency in real-world LLM deployments.
Sparsity-exploiting Gaussian Process for Robust Transient Learning of Power System Dynamics
Advances in leveraging Gaussian processes (GP) have enabled learning and inferring dynamic grid behavior from scarce PMU measurements. However, real measurements can be corrupted by various random and targeted threats, leading to inaccurate and meaningless results. This paper develops robust transient learning to overcome this challenge by exploiting the sparse corruption patterns in the data flow. Specifically, we integrate sparse optimization with method of moments (MoM) to make learning robust to a sparse distribution of data corruptions; then, we optimize sparse weights to identify corrupted meter locations. To improve inference speed on large-scale systems, we further adopt K-medoid clustering of locations to develop dimension reduction (DR) and aggregate representation (AR) heuristics. Experimental results demonstrate robustness against random large errors, targeted false data injections, and local PMU clock drifts. On a 1354-bus system, inference turns out to be 18x faster using DR and 400x faster when further combined with AR heuristics.
comment: This manuscript has been submitted to PESGM2026
Exploring a New Design Paradigm for Omnidirectional MAVs for Minimal Actuation and Internal Force Elimination: Theoretical Framework and Control
This paper presents a novel concept for achieving omnidirectionality in a multirotor aerial vehicle (MAV) that uses only 6 inputs and ensures no internal forces at the equilibria. The concept integrates a single actively-tilting propeller along with 3 pendulum-like links, each carrying a propeller, connected by passive universal joints to the main body. We show that this design ensures omnidirectionality while minimizing the internal forces and without resorting to overactuation (i.e., more than 6 inputs). A detailed dynamic model of the multi-link MAV is first developed. Afterwards, the analysis identifies the equilibrium configurations and illustrates that a forced equilibrium exists for every pose of the MAV's main platform. In order to render this equilibrium asymptotically stable for the closed-loop system, a geometric nonlinear controller is constructed using dynamic feedback linearization and backstepping techniques with the main platform configuration error being the left-trivialized error on SE(3). The stability of the closed-loop system is then investigated by employing standard Lyapunov arguments on the zero dynamics. We conclude by providing numerical simulations validating the proposed approach. They demonstrate the MAV capability to perform decoupled attitude and translational motions under non-zero initial conditions, parametric uncertainty, and actuators noise.
Q-EnergyDEX: A Zero-Trust Distributed Energy Trading Framework Driven by Quantum Key Distribution and Blockchain
The rapid decentralization and digitalization of local electricity markets have introduced new cyber-physical vulnerabilities, including key leakage, data tampering, and identity spoofing. Existing blockchain-based solutions provide transparency and traceability but still depend on classical cryptographic primitives that are vulnerable to quantum attacks. To address these challenges, this paper proposes Q-EnergyDEX, a zero-trust distributed energy trading framework driven by quantum key distribution and blockchain. The framework integrates physical-layer quantum randomness with market-level operations, providing an end-to-end quantum-secured infrastructure. A cloud-based Quantum Key Management Service continuously generates verifiable entropy and regulates key generation through a rate-adaptive algorithm to sustain high-quality randomness. A symmetric authentication protocol (Q-SAH) establishes secure and low-latency sessions, while the quantum-aided consensus mechanism (PoR-Lite) achieves probabilistic ledger finality within a few seconds. Furthermore, a Stackelberg-constrained bilateral auction couples market clearing with entropy availability, ensuring both economic efficiency and cryptographic security. Simulation results show that Q-EnergyDEX maintains robust key stability and near-optimal social welfare, demonstrating its feasibility for large-scale decentralized energy markets.
A predictive modular approach to constraint satisfaction under uncertainty - with application to glycosylation in continuous monoclonal antibody biosimilar production
The paper proposes a modular-based approach to constraint handling in process optimization and control. This is partly motivated by the recent interest in learning-based methods, e.g., within bioproduction, for which constraint handling under uncertainty is a challenge. The proposed constraint handler, called predictive filter, is combined with an adaptive constraint margin and a constraint violation cost monitor to minimize the cost of violating soft constraints due to model uncertainty and disturbances. The module can be combined with any controller and is based on minimally modifying the controller output, in a least squares sense, such that constraints are satisfied within the considered horizon. The proposed method is computationally efficient and suitable for real-time applications. The effectiveness of the method is illustrated through a realistic simulation case study of glycosylation constraint satisfaction in continuous monoclonal antibody biosimilar production using Chinese hamster ovary cells, for which the metabolic network model consists of 23 extracellular metabolites and 126 reactions.
Revolution-Spaced Output-Feedback Model Predictive Control for Station Keeping on Near-Rectilinear Halo Orbits
We develop a model predictive control (MPC) policy for station keeping on a Near-Rectilinear Halo Orbit (NRHO). The proposed policy achieves full-state tracking of a reference NRHO via a multiple-maneuver control horizon, each spaced one revolution apart to abide by typical mission operation requirements. We prove that the proposed policy is recursively feasible, and perform numerical evaluation in an output-feedback setting by incorporating a navigation filter and realistic operational uncertainties, where the proposed MPC is compared against the state-of-the-art station-keeping algorithm adopted for the Gateway. Our approach successfully maintains the spacecraft in the vicinity of the reference NRHO at a similar cumulative cost as existing station-keeping methods without encountering phase deviation issues, a common drawback of existing methods with one maneuver per revolution.
comment: 8 pages, 6 figures
Offline and Online Use of Interval and Set-Based Approaches for Control and State Estimation: A Selection of Methodological Approaches and Their Application
Control and state estimation procedures need to be robust against imprecisely known parameters, uncertainty in initial conditions, and external disturbances. Interval methods and other set-based techniques form the basis for the implementation of powerful approaches that can be used to identify parameters of dynamic system models in the presence of the aforementioned types of uncertainty. Moreover, they are applicable to a verified feasibility and stability analysis of controllers and state estimators. In addition to these approaches which are typically used offline for analysis of system models designed with classical floating point procedures, interval and set-based methods have also been developed in recent years, which allow to directly solve the associated design tasks and to implement reliable techniques that are applicable online, i.e., during system operation. The latter approaches include set-based model predictive control, online parameter adaptation techniques for nonlinear variable-structure and backstepping controllers, interval observers, and fault diagnosis techniques. This paper provides an overview of the methodological background and reviews numerous practical applications for which interval and other set-valued approaches have been employed successfully.
No-Regret Learning in Stackelberg Games with an Application to Electric Ride-Hailing
We consider the problem of efficiently learning to play single-leader multi-follower Stackelberg games when the leader lacks knowledge of the lower-level game. Such games arise in hierarchical decision-making problems involving self-interested agents. For example, in electric ride-hailing markets, a central authority aims to learn optimal charging prices to shape fleet distributions and charging patterns of ride-hailing companies. Existing works typically apply gradient-based methods to find the leader's optimal strategy. Such methods are impractical as they require that the followers share private utility information with the leader. Instead, we treat the lower-level game as a black box, assuming only that the followers' interactions approximate a Nash equilibrium while the leader observes the realized cost of the resulting approximation. Under kernel-based regularity assumptions on the leader's cost function, we develop a no-regret algorithm that converges to an $\epsilon$-Stackelberg equilibrium in $O(\sqrt{T})$ rounds. Finally, we validate our approach through a numerical case study on optimal pricing in electric ride-hailing markets.
comment: 8 pages, 2 figures, 1 table
Hierarchical Fuel-Cell Airpath Control: an Efficiency-Aware MIMO Control Approach Combined with a Novel Constraint-Enforcing Reference Governor
This paper presents a hierarchical multivariable control and constraint management approach for an air supply system for a proton exchange membrane fuel cell (PEMFC) system. The control objectives are to track desired compressor mass airflow and cathode inlet pressure, maintain a minimum oxygen excess ratio (OER), and run the system at maximum net efficiency. A multi-input multi-output (MIMO) internal model controller (IMC) is designed and simulated to track flow and pressure set-points, which showed high performance despite strongly coupled plant dynamics. A new set-point map is generated to compute the most efficient cathode inlet pressure from the stack current load. To enforce OER constraints, a novel reference governor (RG) with the ability to govern multiple references (the cascade RG) and the ability to speed up as well as slow down a reference signal (the cross-section RG) is developed and tested. Compared with a single-input single-output (SISO) air-flow control approach, the proposed MIMO control approach shows up to 7.36 percent lower hydrogen fuel consumption. Compared to a traditional load governor, the novel cascaded cross-section RG (CC-RG) shows up to 3.68 percent less mean absolute percent error (MAPE) on net power tracking and greatly improved worst-case OER on realistic drive-cycle simulations. Control development and validations were conducted on two fuel cell system (FCS) models, a nonlinear open-source model and a proprietary Ford high-fidelity model
comment: Accepted for publication in IEEE Transactions on Control Systems Technology. This version incorporates all peer-review revisions
Offline Reinforcement Learning via Inverse Optimization
Inspired by the recent successes of Inverse Optimization (IO) across various application domains, we propose a novel offline Reinforcement Learning (ORL) algorithm for continuous state and action spaces, leveraging the convex loss function called ``sub-optimality loss" from the IO literature. To mitigate the distribution shift commonly observed in ORL problems, we further employ a robust and non-causal Model Predictive Control (MPC) expert steering a nominal model of the dynamics using in-hindsight information stemming from the model mismatch. Unlike the existing literature, our robust MPC expert enjoys an exact and tractable convex reformulation. In the second part of this study, we show that the IO hypothesis class, trained by the proposed convex loss function, enjoys ample expressiveness and achieves competitive performance comparing with the state-of-the-art (SOTA) methods in the low-data regime of the MuJoCo benchmark while utilizing three orders of magnitude fewer parameters, thereby requiring significantly fewer computational resources. To facilitate the reproducibility of our results, we provide an open-source package implementing the proposed algorithms and the experiments.
comment: preprint
Composite learning backstepping control with guaranteed exponential stability and robustness
Adaptive backstepping control provides a feasible solution to achieve asymptotic tracking for mismatched uncertain nonlinear systems. However, the closed-loop stability depends on high-gain feedback generated by nonlinear damping terms, and closed-loop exponential stability with parameter convergence involves a stringent condition named persistent excitation (PE). This paper proposes a composite learning backstepping control (CLBC) strategy based on modular backstepping and high-order tuners to compensate for the transient process of parameter estimation and achieve closed-loop exponential stability without the nonlinear damping terms and the PE condition. A novel composite learning mechanism is designed to maximize the staged exciting strength for parameter estimation, such that parameter convergence can be achieved under a condition of interval excitation (IE) or even partial IE that is strictly weaker than PE. An extra prediction error is employed in the adaptive law to ensure the transient performance without nonlinear damping terms. The exponential stability of the closed-loop system is proved rigorously under the partial IE or IE condition. Simulations have demonstrated the effectiveness and superiority of the proposed method in both parameter estimation and control compared to state-of-the-art methods.
comment: This work has been submitted to the IEEE for possible publication
Strategy Templates for Almost-Sure and Positive Winning of Stochastic Parity Games towards Permissive and Resilient Control
Stochastic games are fundamental in various applications, including the control of cyber-physical systems (CPS), where both controller and environment are modeled as players. Traditional algorithms typically aim to determine a single winning strategy to develop a controller. However, in CPS control and other domains, permissive controllers are essential, as they enable the system to adapt when additional constraints arise and remain resilient to runtime changes. This work generalizes the concept of (permissive winning) strategy templates, originally introduced by Anand et al. at TACAS and CAV 2023 for deterministic games, to incorporate stochastic games. These templates capture an infinite number of winning strategies, allowing for efficient strategy adaptation to system changes. We focus on two winning criteria (almost-sure and positive winning) and five winning objectives (safety, reachability, B\"uchi, co-B\"uchi, and parity). Our contributions include algorithms for constructing templates for each winning criterion and objective and a novel approach for extracting a winning strategy from a given template. Discussions on comparisons between templates and between strategy extraction methods are provided.
comment: For the conference version published at ICTAC 2024 see: arXiv:2409.08607v1
A Weighted Predict-and-Optimize Framework for Power System Operation Considering Varying Impacts of Uncertainty
Prediction deviations of different uncertainties have varying impacts on downstream decision-making. Improving the prediction accuracy of critical uncertainties with significant impacts on decision-making quality yields better optimization results. Motivated by this observation, this paper proposes a novel weighted predict-and-optimize (WPO) framework for decision-making under multiple uncertainties. Specifically, we incorporate an uncertainty-aware weighting mechanism into the predictive model to capture the relative impact of each uncertainty on specific optimization tasks, and introduce a problem-driven prediction loss (PDPL) to quantify the suboptimality of the weighted predictions relative to perfect predictions in downstream optimization. By optimizing the uncertainty weights to minimize the PDPL, the proposed WPO framework enables adaptive assessment of uncertainty impacts and joint learning of prediction and optimization. Furthermore, to facilitate weight optimization, we develop a surrogate model that establishes a direct mapping between the uncertainty weights and the PDPL, where enhanced graph convolutional networks and multi-task learning are adopted for efficient surrogate model construction and training. Numerical experiments on the modified IEEE 33-bus and 123-bus systems demonstrate that the proposed WPO framework outperforms the traditional predict-then-optimize paradigm, reducing the PDPL by an average of 55% within acceptable computational time.
comment: This is a paper submitted to IEEE TRANSACTIONS ON Power Systems
Differentiable-by-design Nonlinear Optimization for Model Predictive Control
Nonlinear optimization-based control policies, such as those those arising in nonlinear Model Predictive Control, have seen remarkable success in recent years. These policies require solving computationally demanding nonlinear optimization programs online at each time-step. The resulting solution map, viewed as a function of the measured state of the system and design parameters, may not be differentiable, which poses significant challenges if the control policy is embedded in a gradient-based policy optimization scheme. We propose a principled way to regularize the nonlinear optimization problem, obtaining a surrogate derivative even if when the original problem is not differentiable. The surrogate problem is differentiable by design and its solution map coincides with the solution of the unregularized problem. We demonstrate the effectiveness of our approach in a free-final-time optimal control problem and a receding-horizon nonlinear MPC example.
Time-causal and time-recursive wavelets
When to apply wavelet analysis to real-time temporal signals, where the future cannot be accessed, it is essential to base all the steps in the signal processing pipeline on computational mechanisms that are truly time-causal. This paper describes how a time-causal wavelet analysis can be performed based on concepts developed in the area of temporal scale-space theory, originating from a complete classification of temporal smoothing kernels that guarantee non-creation of new structures from finer to coarser temporal scale levels. By necessity, convolution with truncated exponential kernels in cascade constitutes the only permissable class of kernels, as well as their temporal derivatives as a natural complement to fulfil the admissibility conditions of wavelet representations. For a particular way of choosing the time constants in the resulting infinite convolution of truncated exponential kernels, to ensure temporal scale covariance and thus self-similarity over temporal scales, we describe how mother wavelets can be chosen as temporal derivatives of the resulting time-causal limit kernel. By developing connections between wavelet theory and scale-space theory, we characterize and quantify how the continuous scaling properties transfer to the discrete implementation, demonstrating how the proposed time-causal wavelet representation can reflect the duration of locally dominant temporal structures in the input signals. We propose that this notion of time-causal wavelet analysis could be a valuable tool for signal processing tasks, where streams of signals are to be processed in real time, specifically for signals that may contain local variations over a rich span of temporal scales, or more generally for analysing physical or biophysical temporal phenomena, where a fully time-causal analysis is called for to be physically realistic.
comment: 23 pages, 8 figures
Inferring Foresightedness in Dynamic Noncooperative Games
Dynamic game theory is an increasingly popular tool for modeling multi-agent, e.g. human-robot, interactions. Game-theoretic models presume that each agent wishes to minimize a private cost function that depends on others' actions. These games typically evolve over a fixed time horizon, specifying how far into the future each agent plans. In practical settings, however, decision-makers may vary in foresightedness, or how much they care about their current cost in relation to their past and future costs. We conjecture that quantifying and estimating each agent's foresightedness from online data will enable safer and more efficient interactions with other agents. To this end, we frame this inference problem as an inverse dynamic game. We consider a specific objective function parametrization that smoothly interpolates myopic and farsighted planning. Games of this form are readily transformed into parametric mixed complementarity problems; we exploit the directional differentiability of solutions to these problems with respect to their hidden parameters to solve for agents' foresightedness. We conduct three experiments: one with synthetically generated delivery robot motion, one with real-world data involving people walking, biking, and driving vehicles, and one using high-fidelity simulators. The results of these experiments demonstrate that explicitly inferring agents' foresightedness enables game-theoretic models to make 33% more accurate models for agents' behavior.
PowerChain: A Verifiable Agentic AI System for Automating Distribution Grid Analyses
Rapid electrification and decarbonization are increasing the complexity of distribution grid (DG) operation and planning, necessitating advanced computational analyses to ensure reliability and resilience. These analyses depend on disparate workflows comprising complex models, function calls, and data pipelines that require substantial expert knowledge and remain difficult to automate. Workforce and budget constraints further limit utilities' ability to apply such analyses at scale. To address this gap, we build an agentic system PowerChain, which is capable of autonomously performing complex grid analyses. Existing agentic AI systems are typically developed in a bottom-up manner with customized context for predefined analysis tasks; therefore, they do not generalize to tasks that the agent has never seen. In comparison, to generalize to unseen DG analysis tasks, PowerChain dynamically generates structured context by leveraging supervisory signals from self-contained power systems tools (e.g., GridLAB-D) and an optimized set of expert-annotated and verified reasoning trajectories. For complex DG tasks defined in natural language, empirical results on real utility data demonstrate that PowerChain achieves up to a 144/% improvement in performance over baselines.
Physiology-informed layered sensing for intelligent human-exoskeleton interaction
Wearable exoskeletons hold transformative promise for restoring mobility across diverse users with muscular weakness or other impairments. However, their translation beyond laboratory environments remains limited by sensing systems that capture movement but not underlying physiology. Here, we present a soft, lightweight smart leg sleeve that achieves anatomically aligned, layered multimodal sensing by integrating textile-based surface electromyography (sEMG) electrodes, ultrasensitive textile strain sensors, and inertial measurement units (IMUs). Each sensing modality targets a distinct physiological layer: IMUs track joint kinematics at the skeletal level, sEMG monitors muscle activation at the muscular level, and strain sensors detect skin deformation at the cutaneous level. Together, these sensors provide real-time perception to support three core objectives: controlling personalized assistance, optimizing user effort, and safeguarding against injury risks. The system is skin-conformal, mechanically compliant, and seamlessly integrated with a custom exoskeleton ($<20$~g total sensor and electronics weight). We demonstrate: (1) accurate ankle joint moment estimation (RMSE = 0.13~Nm/kg), (2) real-time classification of metabolic trends (accuracy = 97.1\%), and (3) injury risk detection within 100~ms (recall = 0.96), all validated on unseen users using a leave-one-subject-out protocol. This work establishes a physiology-aligned sensing architecture that reframes exoskeleton perception from motion tracking to real-time physiological decoding, offering a pathway towards intelligent, adaptive, and personalized wearable robotics.
comment: 21 pages, 5 figures, 43 references
Autonomous Cyber Resilience via a Co-Evolutionary Arms Race within a Fortified Digital Twin Sandbox SP
The convergence of Information Technology and Operational Technology has exposed Industrial Control Systems to adaptive, intelligent adversaries that render static defenses obsolete. This paper introduces the Adversarial Resilience Co-evolution (ARC) framework, addressing the "Trinity of Trust" comprising model fidelity, data integrity, and analytical resilience. ARC establishes a co-evolutionary arms race within a Fortified Secure Digital Twin (F-SCDT), where a Deep Reinforcement Learning "Red Agent" autonomously discovers attack paths while an ensemble-based "Blue Agent" is continuously hardened against these threats. Experimental validation on the Tennessee Eastman Process (TEP) and Secure Water Treatment (SWaT) testbeds demonstrates superior performance in detecting novel attacks, with F1-scores improving from 0.65 to 0.89 and detection latency reduced from over 1200 seconds to 210 seconds. A comprehensive ablation study reveals that the co-evolutionary process itself contributes a 27% performance improvement. By integrating Explainable AI and proposing a Federated ARC architecture, this work presents a necessary paradigm shift toward dynamic, self-improving security for critical infrastructure.
comment: 6 pages, 2 figures, 4 equations, 1 algorithm, 3 tables, to be published in ISPACS 2025, unabridged version exists as arXiv:2506.20102v1
End-to-End Learning Framework for Solving Non-Markovian Optimal Control
Integer-order calculus often falls short in capturing the long-range dependencies and memory effects found in many real-world processes. Fractional calculus addresses these gaps via fractional-order integrals and derivatives, but fractional-order dynamical systems pose substantial challenges in system identification and optimal control due to the lack of standard control methodologies. In this paper, we theoretically derive the optimal control via linear quadratic regulator (LQR) for fractional-order linear time-invariant (FOLTI) systems and develop an end-to-end deep learning framework based on this theoretical foundation. Our approach establishes a rigorous mathematical model, derives analytical solutions, and incorporates deep learning to achieve data-driven optimal control of FOLTI systems. Our key contributions include: (i) proposing an innovative system identification method control strategy for FOLTI systems, (ii) developing the first end-to-end data-driven learning framework, Fractional-Order Learning for Optimal Control (FOLOC), that learns control policies from observed trajectories, and (iii) deriving a theoretical analysis of sample complexity to quantify the number of samples required for accurate optimal control in complex real-world problems. Experimental results indicate that our method accurately approximates fractional-order system behaviors without relying on Gaussian noise assumptions, pointing to promising avenues for advanced optimal control.
Robotics
MimicKit: A Reinforcement Learning Framework for Motion Imitation and Control
MimicKit is an open-source framework for training motion controllers using motion imitation and reinforcement learning. The codebase provides implementations of commonly-used motion-imitation techniques and RL algorithms. This framework is intended to support research and applications in computer graphics and robotics by providing a unified training framework, along with standardized environment, agent, and data structures. The codebase is designed to be modular and easily configurable, enabling convenient modification and extension to new characters and tasks. The open-source codebase is available at: https://github.com/xbpeng/MimicKit.
InternVLA-M1: A Spatially Guided Vision-Language-Action Framework for Generalist Robot Policy
We introduce InternVLA-M1, a unified framework for spatial grounding and robot control that advances instruction-following robots toward scalable, general-purpose intelligence. Its core idea is spatially guided vision-language-action training, where spatial grounding serves as the critical link between instructions and robot actions. InternVLA-M1 employs a two-stage pipeline: (i) spatial grounding pre-training on over 2.3M spatial reasoning data to determine ``where to act'' by aligning instructions with visual, embodiment-agnostic positions, and (ii) spatially guided action post-training to decide ``how to act'' by generating embodiment-aware actions through plug-and-play spatial prompting. This spatially guided training recipe yields consistent gains: InternVLA-M1 outperforms its variant without spatial guidance by +14.6% on SimplerEnv Google Robot, +17% on WidowX, and +4.3% on LIBERO Franka, while demonstrating stronger spatial reasoning capability in box, point, and trace prediction. To further scale instruction following, we built a simulation engine to collect 244K generalizable pick-and-place episodes, enabling a 6.2% average improvement across 200 tasks and 3K+ objects. In real-world clustered pick-and-place, InternVLA-M1 improved by 7.3%, and with synthetic co-training, achieved +20.6% on unseen objects and novel configurations. Moreover, in long-horizon reasoning-intensive scenarios, it surpassed existing works by over 10%. These results highlight spatially guided training as a unifying principle for scalable and resilient generalist robots. Code and models are available at https://github.com/InternRobotics/InternVLA-M1.
comment: Technical report
Simplicial Embeddings Improve Sample Efficiency in Actor-Critic Agents
Recent works have proposed accelerating the wall-clock training time of actor-critic methods via the use of large-scale environment parallelization; unfortunately, these can sometimes still require large number of environment interactions to achieve a desired level of performance. Noting that well-structured representations can improve the generalization and sample efficiency of deep reinforcement learning (RL) agents, we propose the use of simplicial embeddings: lightweight representation layers that constrain embeddings to simplicial structures. This geometric inductive bias results in sparse and discrete features that stabilize critic bootstrapping and strengthen policy gradients. When applied to FastTD3, FastSAC, and PPO, simplicial embeddings consistently improve sample efficiency and final performance across a variety of continuous- and discrete-control environments, without any loss in runtime speed.
Hierarchical Discrete Lattice Assembly: An Approach for the Digital Fabrication of Scalable Macroscale Structures SC
Although digital fabrication processes at the desktop scale have become proficient and prolific, systems aimed at producing larger-scale structures are still typically complex, expensive, and unreliable. In this work, we present an approach for the fabrication of scalable macroscale structures using simple robots and interlocking lattice building blocks. A target structure is first voxelized so that it can be populated with an architected lattice. These voxels are then grouped into larger interconnected blocks, which are produced using standard digital fabrication processes, leveraging their capability to produce highly complex geometries at a small scale. These blocks, on the size scale of tens of centimeters, are then fed to mobile relative robots that are able to traverse over the structure and place new blocks to form structures on the meter scale. To facilitate the assembly of large structures, we introduce a live digital twin simulation tool for controlling and coordinating assembly robots that enables both global planning for a target structure and live user design, interaction, or intervention. To improve assembly throughput, we introduce a new modular assembly robot, designed for hierarchical voxel handling. We validate this system by demonstrating the voxelization, hierarchical blocking, path planning, and robotic fabrication of a set of meter-scale objects.
comment: In ACM Symposium on Computational Fabrication (SCF '25), November 20-21, 2025, Cambridge, MA, USA. ACM, New York, NY, USA, 15 pages
On Your Own: Pro-level Autonomous Drone Racing in Uninstrumented Arenas
Drone technology is proliferating in many industries, including agriculture, logistics, defense, infrastructure, and environmental monitoring. Vision-based autonomy is one of its key enablers, particularly for real-world applications. This is essential for operating in novel, unstructured environments where traditional navigation methods may be unavailable. Autonomous drone racing has become the de facto benchmark for such systems. State-of-the-art research has shown that autonomous systems can surpass human-level performance in racing arenas. However, direct applicability to commercial and field operations is still limited as current systems are often trained and evaluated in highly controlled environments. In our contribution, the system's capabilities are analyzed within a controlled environment -- where external tracking is available for ground-truth comparison -- but also demonstrated in a challenging, uninstrumented environment -- where ground-truth measurements were never available. We show that our approach can match the performance of professional human pilots in both scenarios. We also publicly release the data from the flights carried out by our approach and a world-class human pilot.
LIBERO-Plus: In-depth Robustness Analysis of Vision-Language-Action Models
Visual-Language-Action (VLA) models report impressive success rates on robotic manipulation benchmarks, yet these results may mask fundamental weaknesses in robustness. We perform a systematic vulnerability analysis by introducing controlled perturbations across seven dimensions: objects layout, camera viewpoints, robot initial states, language instructions, light conditions, background textures and sensor noise. We comprehensively analyzed multiple state-of-the-art models and revealed consistent brittleness beneath apparent competence. Our analysis exposes critical weaknesses: models exhibit extreme sensitivity to perturbation factors, including camera viewpoints and robot initial states, with performance dropping from 95% to below 30% under modest perturbations. Surprisingly, models are largely insensitive to language variations, with further experiments revealing that models tend to ignore language instructions completely. Our findings challenge the assumption that high benchmark scores equate to true competency and highlight the need for evaluation practices that assess reliability under realistic variation.
A Modular Object Detection System for Humanoid Robots Using YOLO
Within the field of robotics, computer vision remains a significant barrier to progress, with many tasks hindered by inefficient vision systems. This research proposes a generalized vision module leveraging YOLOv9, a state-of-the-art framework optimized for computationally constrained environments like robots. The model is trained on a dataset tailored to the FIRA robotics Hurocup. A new vision module is implemented in ROS1 using a virtual environment to enable YOLO compatibility. Performance is evaluated using metrics such as frames per second (FPS) and Mean Average Precision (mAP). Performance is then compared to the existing geometric framework in static and dynamic contexts. The YOLO model achieved comparable precision at a higher computational cost then the geometric model, while providing improved robustness.
comment: 7 Figures, 5 tables. This article was presented at FIRA Summit 2025. It will be updated for journal submission
Characterizing Lidar Point-Cloud Adversities Using a Vector Field Visualization
In this paper we introduce a visualization methodology to aid a human analyst in classifying adversity modes that impact lidar scan matching. Our methodology is intended for offline rather than real-time analysis. The method generates a vector-field plot that characterizes local discrepancies between a pair of registered point clouds. The vector field plot reveals patterns that would be difficult for the analyst to extract from raw point-cloud data. After introducing our methodology, we apply the process to two proof-of-concept examples: one a simulation study and the other a field experiment. For both data sets, a human analyst was able to reason about a series of adversity mechanisms and iteratively remove those mechanisms from the raw data, to help focus attention on progressively smaller discrepancies.
comment: This is the preprint version of the paper published in: Proceedings of the 37th International Technical Meeting of the Satellite Division of The Institute of Navigation (ION GNSS+ 2024), September 2024 The final version is available at https://doi.org/10.33012/2024.19864
Efficient Force and Stiffness Prediction in Robotic Produce Handling with a Piezoresistive Pressure Sensor
Properly handling delicate produce with robotic manipulators is a major part of the future role of automation in agricultural harvesting and processing. Grasping with the correct amount of force is crucial in not only ensuring proper grip on the object, but also to avoid damaging or bruising the product. In this work, a flexible pressure sensor that is both low cost and easy to fabricate is integrated with robotic grippers for working with produce of varying shapes, sizes, and stiffnesses. The sensor is successfully integrated with both a rigid robotic gripper, as well as a pneumatically actuated soft finger. Furthermore, an algorithm is proposed for accelerated estimation of the steady-state value of the sensor output based on the transient response data, to enable real-time applications. The sensor is shown to be effective in incorporating feedback to correctly grasp objects of unknown sizes and stiffnesses. At the same time, the sensor provides estimates for these values which can be utilized for identification of qualities such as ripeness levels and bruising. It is also shown to be able to provide force feedback for objects of variable stiffnesses. This enables future use not only for produce identification, but also for tasks such as quality control and selective distribution based on ripeness levels.
comment: For supplementary videos, see https://drive.google.com/drive/folders/1jol-_z6gaUfjpL1Qi7EG420usTbVSodv?usp=sharing
PlanarMesh: Building Compact 3D Meshes from LiDAR using Incremental Adaptive Resolution Reconstruction
Building an online 3D LiDAR mapping system that produces a detailed surface reconstruction while remaining computationally efficient is a challenging task. In this paper, we present PlanarMesh, a novel incremental, mesh-based LiDAR reconstruction system that adaptively adjusts mesh resolution to achieve compact, detailed reconstructions in real-time. It introduces a new representation, planar-mesh, which combines plane modeling and meshing to capture both large surfaces and detailed geometry. The planar-mesh can be incrementally updated considering both local surface curvature and free-space information from sensor measurements. We employ a multi-threaded architecture with a Bounding Volume Hierarchy (BVH) for efficient data storage and fast search operations, enabling real-time performance. Experimental results show that our method achieves reconstruction accuracy on par with, or exceeding, state-of-the-art techniques-including truncated signed distance functions, occupancy mapping, and voxel-based meshing-while producing smaller output file sizes (10 times smaller than raw input and more than 5 times smaller than mesh-based methods) and maintaining real-time performance (around 2 Hz for a 64-beam sensor).
Active Tactile Exploration for Rigid Body Pose and Shape Estimation
General robot manipulation requires the handling of previously unseen objects. Learning a physically accurate model at test time can provide significant benefits in data efficiency, predictability, and reuse between tasks. Tactile sensing can compliment vision with its robustness to occlusion, but its temporal sparsity necessitates careful online exploration to maintain data efficiency. Direct contact can also cause an unrestrained object to move, requiring both shape and location estimation. In this work, we propose a learning and exploration framework that uses only tactile data to simultaneously determine the shape and location of rigid objects with minimal robot motion. We build on recent advances in contact-rich system identification to formulate a loss function that penalizes physical constraint violation without introducing the numerical stiffness inherent in rigid-body contact. Optimizing this loss, we can learn cuboid and convex polyhedral geometries with less than 10s of randomly collected data after first contact. Our exploration scheme seeks to maximize Expected Information Gain and results in significantly faster learning in both simulated and real-robot experiments. More information can be found at https://dairlab.github.io/activetactile
comment: 8 pages, 6 figures
Development of an Intuitive GUI for Non-Expert Teleoperation of Humanoid Robots
The operation of humanoid robotics is an essential field of research with many practical and competitive applications. Many of these systems, however, do not invest heavily in developing a non-expert-centered graphical user interface (GUI) for operation. The focus of this research is to develop a scalable GUI that is tailored to be simple and intuitive so non-expert operators can control the robot through a FIRA-regulated obstacle course. Using common practices from user interface development (UI) and understanding concepts described in human-robot interaction (HRI) and other related concepts, we will develop a new interface with the goal of a non-expert teleoperation system.
comment: 9 Figure. Presented at FIRA Summit 2025, Daegu, S. Korea
Hoecken-D Hand: A Novel Robotic Hand for Linear Parallel Pinching and Self-Adaptive Grasping IROS
This paper presents the Hoecken-D Hand, an underactuated robotic gripper that combines a modified Hoecken linkage with a differential spring mechanism to achieve both linear parallel pinching and a mid-stroke transition to adaptive envelope. The original Hoecken linkage is reconfigured by replacing one member with differential links, preserving straight-line guidance while enabling contact-triggered reconfiguration without additional actuators. A double-parallelogram arrangement maintains fingertip parallelism during conventional pinching, whereas the differential mechanism allows one finger to wrap inward upon encountering an obstacle, improving stability on irregular or thin objects. The mechanism can be driven by a single linear actuator, minimizing complexity and cost; in our prototype, each finger is driven by its own linear actuator for simplicity. We perform kinematic modeling and force analysis to characterize grasp performance, including simulated grasping forces and spring-opening behavior under varying geometric parameters. The design was prototyped using PLA-based 3D printing, achieving a linear pinching span of approximately 200 mm. Preliminary tests demonstrate reliable grasping in both modes across a wide range of object geometries, highlighting the Hoecken-D Hand as a compact, adaptable, and cost-effective solution for manipulation in unstructured environments.
comment: Accepted by IEEE International Conference on Robotics and Biomimetics (IROS) 2025, Hangzhou, China. This version includes updated contact information
Accelerated Feature Detectors for Visual SLAM: A Comparative Study of FPGA vs GPU
Feature detection is a common yet time-consuming module in Simultaneous Localization and Mapping (SLAM) implementations, which are increasingly deployed on power-constrained platforms, such as drones. Graphics Processing Units (GPUs) have been a popular accelerator for computer vision in general, and feature detection and SLAM in particular. On the other hand, System-on-Chips (SoCs) with integrated Field Programmable Gate Array (FPGA) are also widely available. This paper presents the first study of hardware-accelerated feature detectors considering a Visual SLAM (V-SLAM) pipeline. We offer new insights by comparing the best GPU-accelerated FAST, Harris, and SuperPoint implementations against the FPGA-accelerated counterparts on modern SoCs (Nvidia Jetson Orin and AMD Versal). The evaluation shows that when using a non-learning-based feature detector such as FAST and Harris, their GPU implementations, and the GPU-accelerated V-SLAM can achieve better run-time performance and energy efficiency than the FAST and Harris FPGA implementations as well as the FPGA-accelerated V-SLAM. However, when considering a learning-based detector such as SuperPoint, its FPGA implementation can achieve better run-time performance and energy efficiency (up to 3.1$\times$ and 1.4$\times$ improvements, respectively) than the GPU implementation. The FPGA-accelerated V-SLAM can also achieve comparable run-time performance compared to the GPU-accelerated V-SLAM, with better FPS in 2 out of 5 dataset sequences. When considering the accuracy, the results show that the GPU-accelerated V-SLAM is more accurate than the FPGA-accelerated V-SLAM in general. Last but not least, the use of hardware acceleration for feature detection could further improve the performance of the V-SLAM pipeline by having the global bundle adjustment module invoked less frequently without sacrificing accuracy.
comment: 12 pages, 7 figures
A Novel Robot Hand with Hoeckens Linkages and Soft Phalanges for Scooping and Self-Adaptive Grasping in Environmental Constraints IROS
This paper presents a novel underactuated adaptive robotic hand, Hockens-A Hand, which integrates the Hoeckens mechanism, a double-parallelogram linkage, and a specialized four-bar linkage to achieve three adaptive grasping modes: parallel pinching, asymmetric scooping, and enveloping grasping. Hockens-A Hand requires only a single linear actuator, leveraging passive mechanical intelligence to ensure adaptability and compliance in unstructured environments. Specifically, the vertical motion of the Hoeckens mechanism introduces compliance, the double-parallelogram linkage ensures line contact at the fingertip, and the four-bar amplification system enables natural transitions between different grasping modes. Additionally, the inclusion of a mesh-textured silicone phalanx further enhances the ability to envelop objects of various shapes and sizes. This study employs detailed kinematic analysis to optimize the push angle and design the linkage lengths for optimal performance. Simulations validated the design by analyzing the fingertip motion and ensuring smooth transitions between grasping modes. Furthermore, the grasping force was analyzed using power equations to enhance the understanding of the system's performance.Experimental validation using a 3D-printed prototype demonstrates the three grasping modes of the hand in various scenarios under environmental constraints, verifying its grasping stability and broad applicability.
comment: Accepted by IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2025, Hangzhou. This version includes updated contact information
Bridge the Gap: Enhancing Quadruped Locomotion with Vertical Ground Perturbations
Legged robots, particularly quadrupeds, excel at navigating rough terrains, yet their performance under vertical ground perturbations, such as those from oscillating surfaces, remains underexplored. This study introduces a novel approach to enhance quadruped locomotion robustness by training the Unitree Go2 robot on an oscillating bridge - a 13.24-meter steel-and-concrete structure with a 2.0 Hz eigenfrequency designed to perturb locomotion. Using Reinforcement Learning (RL) with the Proximal Policy Optimization (PPO) algorithm in a MuJoCo simulation, we trained 15 distinct locomotion policies, combining five gaits (trot, pace, bound, free, default) with three training conditions: rigid bridge and two oscillating bridge setups with differing height regulation strategies (relative to bridge surface or ground). Domain randomization ensured zero-shot transfer to the real-world bridge. Our results demonstrate that policies trained on the oscillating bridge exhibit superior stability and adaptability compared to those trained on rigid surfaces. Our framework enables robust gait patterns even without prior bridge exposure. These findings highlight the potential of simulation-based RL to improve quadruped locomotion during dynamic ground perturbations, offering insights for designing robots capable of traversing vibrating environments.
Through the Lens of Doubt: Robust and Efficient Uncertainty Estimation for Visual Place Recognition
Visual Place Recognition (VPR) enables robots and autonomous vehicles to identify previously visited locations by matching current observations against a database of known places. However, VPR systems face significant challenges when deployed across varying visual environments, lighting conditions, seasonal changes, and viewpoints changes. Failure-critical VPR applications, such as loop closure detection in simultaneous localization and mapping (SLAM) pipelines, require robust estimation of place matching uncertainty. We propose three training-free uncertainty metrics that estimate prediction confidence by analyzing inherent statistical patterns in similarity scores from any existing VPR method. Similarity Distribution (SD) quantifies match distinctiveness by measuring score separation between candidates; Ratio Spread (RS) evaluates competitive ambiguity among top-scoring locations; and Statistical Uncertainty (SU) is a combination of SD and RS that provides a unified metric that generalizes across datasets and VPR methods without requiring validation data to select the optimal metric. All three metrics operate without additional model training, architectural modifications, or computationally expensive geometric verification. Comprehensive evaluation across nine state-of-the-art VPR methods and six benchmark datasets confirms that our metrics excel at discriminating between correct and incorrect VPR matches, and consistently outperform existing approaches while maintaining negligible computational overhead, making it deployable for real-time robotic applications across varied environmental conditions with improved precision-recall performance.
Physics-Informed Neural Network Modeling of Vehicle Collision Dynamics in Precision Immobilization Technique Maneuvers
Accurate prediction of vehicle collision dynamics is crucial for advanced safety systems and post-impact control applications, yet existing methods face inherent trade-offs among computational efficiency, prediction accuracy, and data requirements. This paper proposes a dual Physics-Informed Neural Network framework addressing these challenges through two complementary networks. The first network integrates Gaussian Mixture Models with PINN architecture to learn impact force distributions from finite element analysis data while enforcing momentum conservation and energy consistency constraints. The second network employs an adaptive PINN with dynamic constraint weighting to predict post-collision vehicle dynamics, featuring an adaptive physics guard layer that prevents unrealistic predictions whil e preserving data-driven learning capabilities. The framework incorporates uncertainty quantification through time-varying parameters and enables rapid adaptation via fine-tuning strategies. Validation demonstrates significant improvements: the impact force model achieves relative errors below 15.0% for force prediction on finite element analysis (FEA) datasets, while the vehicle dynamics model reduces average trajectory prediction error by 63.6% compared to traditional four-degree-of-freedom models in scaled vehicle experiments. The integrated system maintains millisecond-level computational efficiency suitable for real-time applications while providing probabilistic confidence bounds essential for safety-critical control. Comprehensive validation through FEA simulation, dynamic modeling, and scaled vehicle experiments confirms the framework's effectiveness for Precision Immobilization Technique scenarios and general collision dynamics prediction.
Real-Time Knee Angle Prediction Using EMG and Kinematic Data with an Attention-Based CNN-LSTM Network and Transfer Learning Across Multiple Datasets
Electromyography (EMG) signals are widely used for predicting body joint angles through machine learning (ML) and deep learning (DL) methods. However, these approaches often face challenges such as limited real-time applicability, non-representative test conditions, and the need for large datasets to achieve optimal performance. This paper presents a transfer-learning framework for knee joint angle prediction that requires only a few gait cycles from new subjects. Three datasets - Georgia Tech, the University of California Irvine (UCI), and the Sharif Mechatronic Lab Exoskeleton (SMLE) - containing four EMG channels relevant to knee motion were utilized. A lightweight attention-based CNN-LSTM model was developed and pre-trained on the Georgia Tech dataset, then transferred to the UCI and SMLE datasets. The proposed model achieved Normalized Mean Absolute Errors (NMAE) of 6.8 percent and 13.7 percent for one-step and 50-step predictions on abnormal subjects using EMG inputs alone. Incorporating historical knee angles reduced the NMAE to 3.1 percent and 3.5 percent for normal subjects, and to 2.8 percent and 7.5 percent for abnormal subjects. When further adapted to the SMLE exoskeleton with EMG, kinematic, and interaction force inputs, the model achieved 1.09 percent and 3.1 percent NMAE for one- and 50-step predictions, respectively. These results demonstrate robust performance and strong generalization for both short- and long-term rehabilitation scenarios.
A New Perspective on Transformers in Online Reinforcement Learning for Continuous Control
Despite their effectiveness and popularity in offline or model-based reinforcement learning (RL), transformers remain underexplored in online model-free RL due to their sensitivity to training setups and model design decisions such as how to structure the policy and value networks, share components, or handle temporal information. In this paper, we show that transformers can be strong baselines for continuous control in online model-free RL. We investigate key design questions: how to condition inputs, share components between actor and critic, and slice sequential data for training. Our experiments reveal stable architectural and training strategies enabling competitive performance across fully and partially observable tasks, and in both vector- and image-based settings. These findings offer practical guidance for applying transformers in online RL.
Adversarial Fine-tuning in Offline-to-Online Reinforcement Learning for Robust Robot Control
Offline reinforcement learning enables sample-efficient policy acquisition without risky online interaction, yet policies trained on static datasets remain brittle under action-space perturbations such as actuator faults. This study introduces an offline-to-online framework that trains policies on clean data and then performs adversarial fine-tuning, where perturbations are injected into executed actions to induce compensatory behavior and improve resilience. A performance-aware curriculum further adjusts the perturbation probability during training via an exponential-moving-average signal, balancing robustness and stability throughout the learning process. Experiments on continuous-control locomotion tasks demonstrate that the proposed method consistently improves robustness over offline-only baselines and converges faster than training from scratch. Matching the fine-tuning and evaluation conditions yields the strongest robustness to action-space perturbations, while the adaptive curriculum strategy mitigates the degradation of nominal performance observed with the linear curriculum strategy. Overall, the results show that adversarial fine-tuning enables adaptive and robust control under uncertain environments, bridging the gap between offline efficiency and online adaptability.
comment: 16 pages, 8 figures
MODUR: A Modular Dual-reconfigurable Robot
Modular Self-Reconfigurable Robot (MSRR) systems are a class of robots capable of forming higher-level robotic systems by altering the topological relationships between modules, offering enhanced adaptability and robustness in various environments. This paper presents a novel MSRR called MODUR, featuring dual-level reconfiguration capabilities designed to integrate reconfigurable mechanisms into MSRR. Specifically, MODUR can perform high-level self-reconfiguration among modules to create different configurations, while each module is also able to change its shape to execute basic motions. The design of MODUR primarily includes a compact connector and scissor linkage groups that provide actuation, forming a parallel mechanism capable of achieving both connector motion decoupling and adjacent position migration capabilities. Furthermore, the workspace, considering the interdependent connectors, is comprehensively analyzed, laying a theoretical foundation for the design of the module's basic motion. Finally, the motion of MODUR is validated through a series of experiments.
Tactile-Conditioned Diffusion Policy for Force-Aware Robotic Manipulation
Contact-rich manipulation depends on applying the correct grasp forces throughout the manipulation task, especially when handling fragile or deformable objects. Most existing imitation learning approaches often treat visuotactile feedback only as an additional observation, leaving applied forces as an uncontrolled consequence of gripper commands. In this work, we present Force-Aware Robotic Manipulation (FARM), an imitation learning framework that integrates high-dimensional tactile data to infer tactile-conditioned force signals, which in turn define a matching force-based action space. We collect human demonstrations using a modified version of the handheld Universal Manipulation Interface (UMI) gripper that integrates a GelSight Mini visual tactile sensor. For deploying the learned policies, we developed an actuated variant of the UMI gripper with geometry matching our handheld version. During policy rollouts, the proposed FARM diffusion policy jointly predicts robot pose, grip width, and grip force. FARM outperforms several baselines across three tasks with distinct force requirements -- high-force, low-force, and dynamic force adaptation -- demonstrating the advantages of its two key components: leveraging force-grounded, high-dimensional tactile observations and a force-based control space. The codebase and design files are open-sourced and available at https://tactile-farm.github.io .
DAMM-LOAM: Degeneracy Aware Multi-Metric LiDAR Odometry and Mapping IROS
LiDAR Simultaneous Localization and Mapping (SLAM) systems are essential for enabling precise navigation and environmental reconstruction across various applications. Although current point-to-plane ICP algorithms perform effec- tively in structured, feature-rich environments, they struggle in scenarios with sparse features, repetitive geometric structures, and high-frequency motion. This leads to degeneracy in 6- DOF pose estimation. Most state-of-the-art algorithms address these challenges by incorporating additional sensing modalities, but LiDAR-only solutions continue to face limitations under such conditions. To address these issues, we propose a novel Degeneracy-Aware Multi-Metric LiDAR Odometry and Map- ping (DAMM-LOAM) module. Our system improves mapping accuracy through point cloud classification based on surface normals and neighborhood analysis. Points are classified into ground, walls, roof, edges, and non-planar points, enabling accurate correspondences. A Degeneracy-based weighted least squares-based ICP algorithm is then applied for accurate odom- etry estimation. Additionally, a Scan Context based back-end is implemented to support robust loop closures. DAMM-LOAM demonstrates significant improvements in odometry accuracy, especially in indoor environments such as long corridors
comment: Accepted at IROS Active Perception Workshop
ALOHA2 Robot Kitchen Application Scenario Reproduction Report
ALOHA2 is an enhanced version of the dual-arm teleoperated robot ALOHA, featuring higher performance and robustness compared to the original design, while also being more ergonomic. Like ALOHA, ALOHA2 consists of two grippers and two ViperX 6-DoF arms, as well as two smaller WidowX arms. Users control the follower mechanical arms by operating the leader mechanical arms through back-driving. The device also includes cameras that generate images from multiple viewpoints, allowing for RGB data collection during teleoperation. The robot is mounted on a 48-inch x 30-inch table, equipped with an aluminum frame that provides additional mounting points for cameras and gravity compensation systems.
RoboHiMan: A Hierarchical Evaluation Paradigm for Compositional Generalization in Long-Horizon Manipulation
Enabling robots to flexibly schedule and compose learned skills for novel long-horizon manipulation under diverse perturbations remains a core challenge. Early explorations with end-to-end VLA models show limited success, as these models struggle to generalize beyond the training distribution. Hierarchical approaches, where high-level planners generate subgoals for low-level policies, bring certain improvements but still suffer under complex perturbations, revealing limited capability in skill composition. However, existing benchmarks primarily emphasize task completion in long-horizon settings, offering little insight into compositional generalization, robustness, and the interplay between planning and execution. To systematically investigate these gaps, we propose RoboHiMan, a hierarchical evaluation paradigm for compositional generalization in long-horizon manipulation. RoboHiMan introduces HiMan-Bench, a benchmark of atomic and compositional tasks under diverse perturbations, supported by a multi-level training dataset for analyzing progressive data scaling, and proposes three evaluation paradigms (vanilla, decoupled, coupled) that probe the necessity of skill composition and reveal bottlenecks in hierarchical architectures. Experiments highlight clear capability gaps across representative models and architectures, pointing to directions for advancing models better suited to real-world long-horizon manipulation tasks. Videos and open-source code can be found on our project website: https://chenyt31.github.io/robo-himan.github.io/.
comment: Under review. These first two authors contributed equally to this work
Safe Driving in Occluded Environments
Ensuring safe autonomous driving in the presence of occlusions poses a significant challenge in its policy design. While existing model-driven control techniques based on set invariance can handle visible risks, occlusions create latent risks in which safety-critical states are not observable. Data-driven techniques also struggle to handle latent risks because direct mappings from risk-critical objects in sensor inputs to safe actions cannot be learned without visible risk-critical objects. Motivated by these challenges, in this paper, we propose a probabilistic safety certificate for latent risk. Our key technical enabler is the application of probabilistic invariance: It relaxes the strict observability requirements imposed by set-invariance methods that demand the knowledge of risk-critical states. The proposed techniques provide linear action constraints that confine the latent risk probability within tolerance. Such constraints can be integrated into model predictive controllers or embedded in data-driven policies to mitigate latent risks. The proposed method is tested using the CARLA simulator and compared with a few existing techniques. The theoretical and empirical analysis jointly demonstrate that the proposed methods assure long-term safety in real-time control in occluded environments without being overly conservative and with transparency to exposed risks.
DriveCritic: Towards Context-Aware, Human-Aligned Evaluation for Autonomous Driving with Vision-Language Models
Benchmarking autonomous driving planners to align with human judgment remains a critical challenge, as state-of-the-art metrics like the Extended Predictive Driver Model Score (EPDMS) lack context awareness in nuanced scenarios. To address this, we introduce DriveCritic, a novel framework featuring two key contributions: the DriveCritic dataset, a curated collection of challenging scenarios where context is critical for correct judgment and annotated with pairwise human preferences, and the DriveCritic model, a Vision-Language Model (VLM) based evaluator. Fine-tuned using a two-stage supervised and reinforcement learning pipeline, the DriveCritic model learns to adjudicate between trajectory pairs by integrating visual and symbolic context. Experiments show DriveCritic significantly outperforms existing metrics and baselines in matching human preferences and demonstrates strong context awareness. Overall, our work provides a more reliable, human-aligned foundation to evaluating autonomous driving systems.
comment: 9 pages, 3 figures
VLA-0: Building State-of-the-Art VLAs with Zero Modification
Vision-Language-Action models (VLAs) hold immense promise for enabling generalist robot manipulation. However, the best way to build them remains an open question. Current approaches often add complexity, such as modifying the existing vocabulary of a Vision-Language Model (VLM) with action tokens or introducing special action heads. Curiously, the simplest strategy of representing actions directly as text has remained largely unexplored. This work introduces VLA-0 to investigate this idea. We find that VLA-0 is not only effective; it is surprisingly powerful. With the right design, VLA-0 outperforms more involved models. On LIBERO, a popular benchmark for evaluating VLAs, VLA-0 outperforms all existing methods trained on the same robotic data, including $\pi_0.5$-KI, OpenVLA-OFT and SmolVLA. Furthermore, without large-scale robotics-specific training, it outperforms methods trained on large-scale robotic data, like $\pi_0.5$-KI, $\pi_0$, GR00T-N1 and MolmoAct. These findings also translate to the real world, where VLA-0 outperforms SmolVLA, a VLA model pre-trained on large-scale real data. This paper summarizes our unexpected findings and spells out the specific techniques required to unlock the high performance of this simple yet potent VLA design. Visual results, code, and trained models are provided here: https://vla0.github.io/.
ViTacGen: Robotic Pushing with Vision-to-Touch Generation
Robotic pushing is a fundamental manipulation task that requires tactile feedback to capture subtle contact forces and dynamics between the end-effector and the object. However, real tactile sensors often face hardware limitations such as high costs and fragility, and deployment challenges involving calibration and variations between different sensors, while vision-only policies struggle with satisfactory performance. Inspired by humans' ability to infer tactile states from vision, we propose ViTacGen, a novel robot manipulation framework designed for visual robotic pushing with vision-to-touch generation in reinforcement learning to eliminate the reliance on high-resolution real tactile sensors, enabling effective zero-shot deployment on visual-only robotic systems. Specifically, ViTacGen consists of an encoder-decoder vision-to-touch generation network that generates contact depth images, a standardized tactile representation, directly from visual image sequence, followed by a reinforcement learning policy that fuses visual-tactile data with contrastive learning based on visual and generated tactile observations. We validate the effectiveness of our approach in both simulation and real world experiments, demonstrating its superior performance and achieving a success rate of up to 86\%.
Partial Feedback Linearization Control of a Cable-Suspended Multirotor Platform for Stabilization of an Attached Load IROS
In this work, we present a novel control approach based on partial feedback linearization (PFL) for the stabilization of a suspended aerial platform with an attached load. Such systems are envisioned for various applications in construction sites involving cranes, such as the holding and transportation of heavy objects. Our proposed control approach considers the underactuation of the whole system while utilizing its coupled dynamics for stabilization. We demonstrate using numerical stability analysis that these coupled terms are crucial for the stabilization of the complete system. We also carried out robustness analysis of the proposed approach in the presence of external wind disturbances, sensor noise, and uncertainties in system dynamics. As our envisioned target application involves cranes in outdoor construction sites, our control approaches rely on only onboard sensors, thus making it suitable for such applications. We carried out extensive simulation studies and experimental tests to validate our proposed control approach.
comment: Accepted for IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2025
Optimistic Reinforcement Learning-Based Skill Insertions for Task and Motion Planning
Task and motion planning (TAMP) for robotics manipulation necessitates long-horizon reasoning involving versatile actions and skills. While deterministic actions can be crafted by sampling or optimizing with certain constraints, planning actions with uncertainty, i.e., probabilistic actions, remains a challenge for TAMP. On the contrary, Reinforcement Learning (RL) excels in acquiring versatile, yet short-horizon, manipulation skills that are robust with uncertainties. In this letter, we design a method that integrates RL skills into TAMP pipelines. Besides the policy, a RL skill is defined with data-driven logical components that enable the skill to be deployed by symbolic planning. A plan refinement sub-routine is designed to further tackle the inevitable effect uncertainties. In the experiments, we compare our method with baseline hierarchical planning from both TAMP and RL fields and illustrate the strength of the method. The results show that by embedding RL skills, we extend the capability of TAMP to domains with probabilistic skills, and improve the planning efficiency compared to the previous methods.
Adaptive Obstacle-Aware Task Assignment and Planning for Heterogeneous Robot Teaming
Multi-Agent Task Assignment and Planning (MATP) has attracted growing attention but remains challenging in terms of scalability, spatial reasoning, and adaptability in obstacle-rich environments. To address these challenges, we propose OATH: Adaptive Obstacle-Aware Task Assignment and Planning for Heterogeneous Robot Teaming, which advances MATP by introducing a novel obstacle-aware strategy for task assignment. First, we develop an adaptive Halton sequence map, the first known application of Halton sampling with obstacle-aware adaptation in MATP, which adjusts sampling density based on obstacle distribution. Second, we propose a cluster-auction-selection framework that integrates obstacle-aware clustering with weighted auctions and intra-cluster task selection. These mechanisms jointly enable effective coordination among heterogeneous robots while maintaining scalability and near-optimal allocation performance. In addition, our framework leverages an LLM to interpret human instructions and directly guide the planner in real time. We validate OATH in NVIDIA Isaac Sim, showing substantial improvements in task assignment quality, scalability, adaptability to dynamic changes, and overall execution performance compared to state-of-the-art MATP baselines. A project website is available at https://llm-oath.github.io/.
comment: 16 pages, 11 figures, 4 tables
Spatially Intelligent Patrol Routes for Concealed Emitter Localization by Robot Swarms
This paper introduces a method for designing spatially intelligent robot swarm behaviors to localize concealed radio emitters. We use differential evolution to generate geometric patrol routes that localize unknown signals independently of emitter parameters, a key challenge in electromagnetic surveillance. Patrol shape and antenna type are shown to influence information gain, which in turn determines the effective triangulation coverage. We simulate a four-robot swarm across eight configurations, assigning pre-generated patrol routes based on a specified patrol shape and sensing capability (antenna type: omnidirectional or directional). An emitter is placed within the map for each trial, with randomized position, transmission power and frequency. Results show that omnidirectional localization success rates are driven primarily by source location rather than signal properties, with failures occurring most often when sources are placed in peripheral areas of the map. Directional antennas are able to overcome this limitation due to their higher gain and directivity, with an average detection success rate of 98.75% compared to 80.25% for omnidirectional. Average localization errors range from 1.01-1.30 m for directional sensing and 1.67-1.90 m for omnidirectional sensing; while directional sensing also benefits from shorter patrol edges. These results demonstrate that a swarm's ability to predict electromagnetic phenomena is directly dependent on its physical interaction with the environment. Consequently, spatial intelligence, realized here through optimized patrol routes and antenna selection, is a critical design consideration for effective robotic surveillance.
A Diffusion-Refined Planner with Reinforcement Learning Priors for Confined-Space Parking
The growing demand for parking has increased the need for automated parking planning methods that can operate reliably in confined spaces. In restricted and complex environments, high-precision maneuvers are required to achieve a high success rate in planning, yet existing approaches often rely on explicit action modeling, which faces challenges when accurately modeling the optimal action distribution. In this paper, we propose DRIP, a diffusion-refined planner anchored in reinforcement learning (RL) prior action distribution, in which an RL-pretrained policy provides prior action distributions to regularize the diffusion training process. During the inference phase the denoising process refines these coarse priors into more precise action distributions. By steering the denoising trajectory through the reinforcement learning prior distribution during training, the diffusion model inherits a well-informed initialization, resulting in more accurate action modeling, a higher planning success rate, and reduced inference steps. We evaluate our approach across parking scenarios with varying degrees of spatial constraints. Experimental results demonstrate that our method significantly improves planning performance in confined-space parking environments while maintaining strong generalization in common scenarios.
EReLiFM: Evidential Reliability-Aware Residual Flow Meta-Learning for Open-Set Domain Generalization under Noisy Labels
Open-Set Domain Generalization (OSDG) aims to enable deep learning models to recognize unseen categories in new domains, which is crucial for real-world applications. Label noise hinders open-set domain generalization by corrupting source-domain knowledge, making it harder to recognize known classes and reject unseen ones. While existing methods address OSDG under Noisy Labels (OSDG-NL) using hyperbolic prototype-guided meta-learning, they struggle to bridge domain gaps, especially with limited clean labeled data. In this paper, we propose Evidential Reliability-Aware Residual Flow Meta-Learning (EReLiFM). We first introduce an unsupervised two-stage evidential loss clustering method to promote label reliability awareness. Then, we propose a residual flow matching mechanism that models structured domain- and category-conditioned residuals, enabling diverse and uncertainty-aware transfer paths beyond interpolation-based augmentation. During this meta-learning process, the model is optimized such that the update direction on the clean set maximizes the loss decrease on the noisy set, using pseudo labels derived from the most confident predicted class for supervision. Experimental results show that EReLiFM outperforms existing methods on OSDG-NL, achieving state-of-the-art performance. The source code is available at https://github.com/KPeng9510/ERELIFM.
comment: The source code is available at https://github.com/KPeng9510/ERELIFM
MTIL: Encoding Full History with Mamba for Temporal Imitation Learning
Standard imitation learning (IL) methods have achieved considerable success in robotics, yet often rely on the Markov assumption, which falters in long-horizon tasks where history is crucial for resolving perceptual ambiguity. This limitation stems not only from a conceptual gap but also from a fundamental computational barrier: prevailing architectures like Transformers are often constrained by quadratic complexity, rendering the processing of long, high-dimensional observation sequences infeasible. To overcome this dual challenge, we introduce Mamba Temporal Imitation Learning (MTIL). Our approach represents a new paradigm for robotic learning, which we frame as a practical synthesis of World Model and Dynamical System concepts. By leveraging the linear-time recurrent dynamics of State Space Models (SSMs), MTIL learns an implicit, action-oriented world model that efficiently encodes the entire trajectory history into a compressed, evolving state. This allows the policy to be conditioned on a comprehensive temporal context, transcending the confines of Markovian approaches. Through extensive experiments on simulated benchmarks (ACT, Robomimic, LIBERO) and on challenging real-world tasks, MTIL demonstrates superior performance against SOTA methods like ACT and Diffusion Policy, particularly in resolving long-term temporal ambiguities. Our findings not only affirm the necessity of full temporal context but also validate MTIL as a powerful and a computationally feasible approach for learning long-horizon, non-Markovian behaviors from high-dimensional observations.
comment: Published in IEEE Robotics and Automation Letters (RA-L), 2025. 8 pages, 5 figures
Hybrid Terrain-Aware Path Planning: Integrating VD-RRT* Exploration and VD-D* Lite Repair
Autonomous ground vehicles operating off-road must plan curvature-feasible paths while accounting for spatially varying soil strength and slope hazards in real time. We present a continuous state--cost metric that combines a Bekker pressure--sinkage model with elevation-derived slope and attitude penalties. The resulting terrain cost field is analytic, bounded, and monotonic in soil modulus and slope, ensuring well-posed discretization and stable updates under sensor noise. This metric is evaluated on a lattice with exact steering primitives: Dubins and Reeds--Shepp motions for differential drive and time-parameterized bicycle arcs for Ackermann steering. Global exploration is performed using Vehicle-Dynamics RRT\(^{*}\), while local repair is managed by Vehicle-Dynamics D\(^{*}\) Lite, enabling millisecond-scale replanning without heuristic smoothing. By separating the terrain--vehicle model from the planner, the framework provides a reusable basis for deterministic, sampling-based, or learning-driven planning in deformable terrain. Hardware trials on an off-road platform demonstrate real-time navigation across soft soil and slope transitions, supporting reliable autonomy in unstructured environments.
Product Digital Twin Supporting End-of-life Phase of Electric Vehicle Batteries Utilizing Product-Process-Resource Asset Network
In a circular economy, products in their end-of-life phase should be either remanufactured or recycled. Both of these processes are crucial for sustainability and environmental conservation. However, manufacturers frequently do not support these processes enough in terms of not sharing relevant data about the products nor their (re-)manufacturing processes. This paper proposes to accompany each product with a digital twin technology, specifically the Product Digital Twin (PDT), which can carry information for facilitating and optimizing production and remanufacturing processes. This paper introduces a knowledge representation called Bi-Flow Product-Process-Resource Asset Network (Bi-PAN). Bi-PAN extends a well-proven Product-Process-Resource Asset Network (PAN) paradigm by integrating both assembly and disassembly workflows into a single information model. Such networks enable capturing relevant relationships across products, production resources, manufacturing processes, and specific production operations that have to be done in the manufacturing phase of a product. The proposed approach is demonstrated in a use-case of disassembling electric vehicle (EV) batteries. By utilizing PDTs with Bi-PAN knowledge models, challenges associated with disassembling of EV batteries can be solved flexibly and efficiently for various battery types, enhancing the sustainability of the EV battery life-cycle management.
comment: \copyright 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
More than A Point: Capturing Uncertainty with Adaptive Affordance Heatmaps for Spatial Grounding in Robotic Tasks
Many language-guided robotic systems rely on collapsing spatial reasoning into discrete points, making them brittle to perceptual noise and semantic ambiguity. To address this challenge, we propose RoboMAP, a framework that represents spatial targets as continuous, adaptive affordance heatmaps. This dense representation captures the uncertainty in spatial grounding and provides richer information for downstream policies, thereby significantly enhancing task success and interpretability. RoboMAP surpasses the previous state-of-the-art on a majority of grounding benchmarks with up to a 50x speed improvement, and achieves an 82\% success rate in real-world manipulation. Across extensive simulated and physical experiments, it demonstrates robust performance and shows strong zero-shot generalization to navigation. More details and videos can be found at https://robo-map.github.io.
comment: More details and videos can be found at https://robo-map.github.io
Flattening Hierarchies with Policy Bootstrapping NeurIPS 2025
Offline goal-conditioned reinforcement learning (GCRL) is a promising approach for pretraining generalist policies on large datasets of reward-free trajectories, akin to the self-supervised objectives used to train foundation models for computer vision and natural language processing. However, scaling GCRL to longer horizons remains challenging due to the combination of sparse rewards and discounting, which obscures the comparative advantages of primitive actions with respect to distant goals. Hierarchical RL methods achieve strong empirical results on long-horizon goal-reaching tasks, but their reliance on modular, timescale-specific policies and subgoal generation introduces significant additional complexity and hinders scaling to high-dimensional goal spaces. In this work, we introduce an algorithm to train a flat (non-hierarchical) goal-conditioned policy by bootstrapping on subgoal-conditioned policies with advantage-weighted importance sampling. Our approach eliminates the need for a generative model over the (sub)goal space, which we find is key for scaling to high-dimensional control in large state spaces. We further show that existing hierarchical and bootstrapping-based approaches correspond to specific design choices within our derivation. Across a comprehensive suite of state- and pixel-based locomotion and manipulation benchmarks, our method matches or surpasses state-of-the-art offline GCRL algorithms and scales to complex, long-horizon tasks where prior approaches fail. Project page: https://johnlyzhou.github.io/saw/
comment: NeurIPS 2025 (Spotlight, top 3.2%)
QuaDreamer: Controllable Panoramic Video Generation for Quadruped Robots
Panoramic cameras, capturing comprehensive 360-degree environmental data, are suitable for quadruped robots in surrounding perception and interaction with complex environments. However, the scarcity of high-quality panoramic training data-caused by inherent kinematic constraints and complex sensor calibration challenges-fundamentally limits the development of robust perception systems tailored to these embodied platforms. To address this issue, we propose QuaDreamer-the first panoramic data generation engine specifically designed for quadruped robots. QuaDreamer focuses on mimicking the motion paradigm of quadruped robots to generate highly controllable, realistic panoramic videos, providing a data source for downstream tasks. Specifically, to effectively capture the unique vertical vibration characteristics exhibited during quadruped locomotion, we introduce Vertical Jitter Encoding (VJE). VJE extracts controllable vertical signals through frequency-domain feature filtering and provides high-quality prompts. To facilitate high-quality panoramic video generation under jitter signal control, we propose a Scene-Object Controller (SOC) that effectively manages object motion and boosts background jitter control through the attention mechanism. To address panoramic distortions in wide-FoV video generation, we propose the Panoramic Enhancer (PE)-a dual-stream architecture that synergizes frequency-texture refinement for local detail enhancement with spatial-structure correction for global geometric consistency. We further demonstrate that the generated video sequences can serve as training data for the quadruped robot's panoramic visual perception model, enhancing the performance of multi-object tracking in 360-degree scenes. The source code and model weights will be publicly available at https://github.com/losehu/QuaDreamer.
comment: Accepted to CoRL 2025. The source code and model weights will be publicly available at https://github.com/losehu/QuaDreamer
LLM-Enabled In-Context Learning for Data Collection Scheduling in UAV-assisted Sensor Networks
Unmanned Aerial Vehicles (UAVs) are increasingly being utilized in various private and commercial applications, e.g., traffic control, parcel delivery, and Search and Rescue (SAR) missions. Machine Learning (ML) methods used in UAV-Assisted Sensor Networks (UASNETs) and, especially, in Deep Reinforcement Learning (DRL) face challenges such as complex and lengthy model training, gaps between simulation and reality, and low sampling efficiency, which conflict with the urgency of emergencies, such as SAR missions. In this paper, an In-Context Learning (ICL)-Data Collection Scheduling (ICLDC) system is proposed as an alternative to DRL in emergencies. The UAV collects sensory data and transmits it to a Large Language Model (LLM), which creates a task description in natural language. From this description, the UAV receives a data collection schedule that must be executed. A verifier ensures safe UAV operations by evaluating the schedules generated by the LLM and overriding unsafe schedules based on predefined rules. The system continuously adapts by incorporating feedback into the task descriptions and using this for future decisions. This method is tested against jailbreaking attacks, where the task description is manipulated to undermine network performance, highlighting the vulnerability of LLMs to such attacks. The proposed ICLDC significantly reduces cumulative packet loss compared to both the DQN and Maximum Channel Gain baselines. ICLDC presents a promising direction for intelligent scheduling and control in UASNETs.
Dual-Regularized Riccati Recursions for Interior-Point Optimal Control
We derive closed-form extensions of Riccati's recursions (both sequential and parallel) for solving dual-regularized LQR problems. We show how these methods can be used to solve general constrained, non-convex, discrete-time optimal control problems via a regularized interior point method, while guaranteeing that each step is a descent direction of an Augmented Barrier-Lagrangian merit function. We provide MIT-licensed implementations of our methods in C++ and JAX.
Extended Friction Models for the Physics Simulation of Servo Actuators
Accurate physical simulation is crucial for the development and validation of control algorithms in robotic systems. Recent works in Reinforcement Learning (RL) take notably advantage of extensive simulations to produce efficient robot control. State-of-the-art servo actuator models generally fail at capturing the complex friction dynamics of these systems. This limits the transferability of simulated behaviors to real-world applications. In this work, we present extended friction models that allow to more accurately simulate servo actuator dynamics. We propose a comprehensive analysis of various friction models, present a method for identifying model parameters using recorded trajectories from a pendulum test bench, and demonstrate how these models can be integrated into physics engines. The proposed friction models are validated on four distinct servo actuators and tested on 2R manipulators, showing significant improvements in accuracy over the standard Coulomb-Viscous model. Our results highlight the importance of considering advanced friction effects in the simulation of servo actuators to enhance the realism and reliability of robotic simulations.
Multimodal Fusion and Vision-Language Models: A Survey for Robot Vision
Robot vision has greatly benefited from advancements in multimodal fusion techniques and vision-language models (VLMs). We adopt a task-oriented perspective to systematically review the applications and advancements of multimodal fusion methods and VLMs in the field of robot vision. For semantic scene understanding tasks, we categorize fusion approaches into encoder-decoder frameworks, attention-based architectures, and graph neural networks. Meanwhile, we also analyze the architectural characteristics and practical implementations of these fusion strategies in key tasks such as simultaneous localization and mapping (SLAM), 3D object detection, navigation, and manipulation. We compare the evolutionary paths and applicability of VLMs based on large language models (LLMs) with traditional multimodal fusion methods.Additionally, we conduct an in-depth analysis of commonly used datasets, evaluating their applicability and challenges in real-world robotic scenarios. Building on this analysis, we identify key challenges in current research, including cross-modal alignment, efficient fusion, real-time deployment, and domain adaptation. We propose future directions such as self-supervised learning for robust multimodal representations, structured spatial memory and environment modeling to enhance spatial intelligence, and the integration of adversarial robustness and human feedback mechanisms to enable ethically aligned system deployment. Through a comprehensive review, comparative analysis, and forward-looking discussion, we provide a valuable reference for advancing multimodal perception and interaction in robotic vision. A comprehensive list of studies in this survey is available at https://github.com/Xiaofeng-Han-Res/MF-RV.
comment: 27 pages, 11 figures. Accepted to Information Fusion. Final journal version: volume 126 (Part B), February 2026
Towards Proprioception-Aware Embodied Planning for Dual-Arm Humanoid Robots
In recent years, Multimodal Large Language Models (MLLMs) have demonstrated the ability to serve as high-level planners, enabling robots to follow complex human instructions. However, their effectiveness, especially in long-horizon tasks involving dual-arm humanoid robots, remains limited. This limitation arises from two main challenges: (i) the absence of simulation platforms that systematically support task evaluation and data collection for humanoid robots, and (ii) the insufficient embodiment awareness of current MLLMs, which hinders reasoning about dual-arm selection logic and body positions during planning. To address these issues, we present DualTHOR, a new dual-arm humanoid simulator, with continuous transition and a contingency mechanism. Building on this platform, we propose Proprio-MLLM, a model that enhances embodiment awareness by incorporating proprioceptive information with motion-based position embedding and a cross-spatial encoder. Experiments show that, while existing MLLMs struggle in this environment, Proprio-MLLM achieves an average improvement of 19.75% in planning performance. Our work provides both an essential simulation platform and an effective model to advance embodied intelligence in humanoid robotics. The code is available at https://anonymous.4open.science/r/DualTHOR-5F3B.
GARField: Addressing the visual Sim-to-Real gap in garment manipulation with mesh-attached radiance fields
While humans intuitively manipulate garments and other textile items swiftly and accurately, it is a significant challenge for robots. A factor crucial to human performance is the ability to imagine, a priori, the intended result of the manipulation intents and hence develop predictions on the garment pose. That ability allows us to plan from highly obstructed states, adapt our plans as we collect more information and react swiftly to unforeseen circumstances. Conversely, robots struggle to establish such intuitions and form tight links between plans and observations. We can partly attribute this to the high cost of obtaining densely labelled data for textile manipulation, both in quality and quantity. The problem of data collection is a long-standing issue in data-based approaches to garment manipulation. As of today, generating high-quality and labelled garment manipulation data is mainly attempted through advanced data capture procedures that create simplified state estimations from real-world observations. However, this work proposes a novel approach to the problem by generating real-world observations from object states. To achieve this, we present GARField (Garment Attached Radiance Field), the first differentiable rendering architecture, to our knowledge, for data generation from simulated states stored as triangle meshes. Code is available on https://ddonatien.github.io/garfield-website/
comment: Project site: https://ddonatien.github.io/garfield-website/
Geometric Backstepping Control of Omnidirectional Tiltrotors Incorporating Servo-Rotor Dynamics for Robustness against Sudden Disturbances
This work presents a geometric backstepping controller for a variable-tilt omnidirectional multirotor that explicitly accounts for both servo and rotor dynamics. Considering actuator dynamics is essential for more effective and reliable operation, particularly during aggressive flight maneuvers or recovery from sudden disturbances. While prior studies have investigated actuator-aware control for conventional and fixed-tilt multirotors, these approaches rely on linear relationships between actuator input and wrench, which cannot capture the nonlinearities induced by variable tilt angles. In this work, we exploit the cascade structure between the rigid-body dynamics of the multirotor and its nonlinear actuator dynamics to design the proposed backstepping controller and establish exponential stability of the overall system. Furthermore, we reveal parametric uncertainty in the actuator model through experiments, and we demonstrate that the proposed controller remains robust against such uncertainty. The controller was compared against a baseline that does not account for actuator dynamics across three experimental scenarios: fast translational tracking, rapid rotational tracking, and recovery from sudden disturbance. The proposed method consistently achieved better tracking performance, and notably, while the baseline diverged and crashed during the fastest translational trajectory tracking and the recovery experiment, the proposed controller maintained stability and successfully completed the tasks, thereby demonstrating its effectiveness.
Robust Statistics vs. Machine Learning vs. Bayesian Inference: Insights into Handling Faulty GNSS Measurements in Field Robotics IROS2025
This paper presents research findings on handling faulty measurements (i.e., outliers) of global navigation satellite systems (GNSS) for vehicle localization under adverse signal conditions in field applications, where raw GNSS data are frequently corrupted due to environmental interference such as multipath, signal blockage, or non-line-of-sight conditions. In this context, we investigate three strategies applied specifically to GNSS pseudorange observations: robust statistics for error mitigation, machine learning for faulty measurement prediction, and Bayesian inference for noise distribution approximation. Since previous studies have provided limited insight into the theoretical foundations and practical evaluations of these three methodologies within a unified problem statement (i.e., state estimation using ranging sensors), we conduct extensive experiments using real-world sensor data collected in diverse urban environments. Our goal is to examine both established techniques and newly proposed methods, thereby advancing the understanding of how to handle faulty range measurements, such as GNSS, for robust, long-term vehicle localization. In addition to presenting successful results, this work highlights critical observations and open questions to motivate future research in robust state estimation.
comment: Accepted to the 2nd Workshop on Safety of Intelligent and Autonomous Vehicles: Formal Methods vs. Machine Learning approaches for reliable navigation (SIAV-FM2L) at IEEE IROS2025
EO-1: Interleaved Vision-Text-Action Pretraining for General Robot Control
The human ability to seamlessly perform multimodal reasoning and physical interaction in the open world is a core goal for general-purpose embodied intelligent systems. Recent vision-language-action (VLA) models, which are co-trained on large-scale robot and visual-text data, have demonstrated notable progress in general robot control. However, they still fail to achieve human-level flexibility in interleaved reasoning and interaction. In this work, introduce EO-Robotics, consists of EO-1 model and EO-Data1.5M dataset. EO-1 is a unified embodied foundation model that achieves superior performance in multimodal embodied reasoning and robot control through interleaved vision-text-action pre-training. The development of EO-1 is based on two key pillars: (i) a unified architecture that processes multimodal inputs indiscriminately (image, text, video, and action), and (ii) a massive, high-quality multimodal embodied reasoning dataset, EO-Data1.5M, which contains over 1.5 million samples with emphasis on interleaved vision-text-action comprehension. EO-1 is trained through synergies between auto-regressive decoding and flow matching denoising on EO-Data1.5M, enabling seamless robot action generation and multimodal embodied reasoning. Extensive experiments demonstrate the effectiveness of interleaved vision-text-action learning for open-world understanding and generalization, validated through a variety of long-horizon, dexterous manipulation tasks across multiple embodiments. This paper details the architecture of EO-1, the data construction strategy of EO-Data1.5M, and the training methodology, offering valuable insights for developing advanced embodied foundation models.
USIM and U0: A Vision-Language-Action Dataset and Model for General Underwater Robots
Underwater environments present unique challenges for robotic operation, including complex hydrodynamics, limited visibility, and constrained communication. Although data-driven approaches have advanced embodied intelligence in terrestrial robots and enabled task-specific autonomous underwater robots, developing underwater intelligence capable of autonomously performing multiple tasks remains highly challenging, as large-scale, high-quality underwater datasets are still scarce. To address these limitations, we introduce USIM, a simulation-based multi-task Vision-Language-Action (VLA) dataset for underwater robots. USIM comprises over 561K frames from 1,852 trajectories, totaling approximately 15.6 hours of BlueROV2 interactions across 20 tasks in 9 diverse scenarios, ranging from visual navigation to mobile manipulation. Building upon this dataset, we propose U0, a VLA model for general underwater robots, which integrates binocular vision and other sensor modalities through multimodal fusion, and further incorporates a convolution-attention-based perception focus enhancement module (CAP) to improve spatial understanding and mobile manipulation. Across tasks such as inspection, obstacle avoidance, scanning, and dynamic tracking, the framework achieves a success rate of 80%, while in challenging mobile manipulation tasks, it reduces the distance to the target by 21.2% compared with baseline methods, demonstrating its effectiveness. USIM and U0 show that VLA models can be effectively applied to underwater robotic applications, providing a foundation for scalable dataset construction, improved task autonomy, and the practical realization of intelligent general underwater robots.
comment: Project Page: https://vincentgu2000.github.io/u0project/
Hi-Drive: Hierarchical POMDP Planning for Safe Autonomous Driving in Diverse Urban Environments
Uncertainties in dynamic road environments pose significant challenges for behavior and trajectory planning in autonomous driving. This paper introduces Hi-Drive, a hierarchical planning algorithm addressing uncertainties at both behavior and trajectory levels using a hierarchical Partially Observable Markov Decision Process (POMDP) formulation. Hi-Drive employs driver models to represent uncertain behavioral intentions of other vehicles and uses their parameters to infer hidden driving styles. By treating driver models as high-level decision-making actions, our approach effectively manages the exponential complexity inherent in POMDPs. To further enhance safety and robustness, Hi-Drive integrates a trajectory optimization based on importance sampling, refining trajectories using a comprehensive analysis of critical agents. Evaluations on real-world urban driving datasets demonstrate that Hi-Drive significantly outperforms state-of-the-art planning-based and learning-based methods across diverse urban driving situations in real-world benchmarks.
A Verification Methodology for Safety Assurance of Robotic Autonomous Systems
Autonomous robots deployed in shared human environments, such as agricultural settings, require rigorous safety assurance to meet both functional reliability and regulatory compliance. These systems must operate in dynamic, unstructured environments, interact safely with humans, and respond effectively to a wide range of potential hazards. This paper presents a verification workflow for the safety assurance of an autonomous agricultural robot, covering the entire development life-cycle, from concept study and design to runtime verification. The outlined methodology begins with a systematic hazard analysis and risk assessment to identify potential risks and derive corresponding safety requirements. A formal model of the safety controller is then developed to capture its behaviour and verify that the controller satisfies the specified safety properties with respect to these requirements. The proposed approach is demonstrated on a field robot operating in an agricultural setting. The results show that the methodology can be effectively used to verify safety-critical properties and facilitate the early identification of design issues, contributing to the development of safer robots and autonomous systems.
comment: In Proc. of the 26th TAROS (Towards Autonomous Robotic Systems) Conference, York, UK, August, 2025
RealEngine: Simulating Autonomous Driving in Realistic Context
Driving simulation plays a crucial role in developing reliable driving agents by providing controlled, evaluative environments. To enable meaningful assessments, a high-quality driving simulator must satisfy several key requirements: multi-modal sensing capabilities (e.g., camera and LiDAR) with realistic scene rendering to minimize observational discrepancies; closed-loop evaluation to support free-form trajectory behaviors; highly diverse traffic scenarios for thorough evaluation; multi-agent cooperation to capture interaction dynamics; and high computational efficiency to ensure affordability and scalability. However, existing simulators and benchmarks fail to comprehensively meet these fundamental criteria. To bridge this gap, this paper introduces RealEngine, a novel driving simulation framework that holistically integrates 3D scene reconstruction and novel view synthesis techniques to achieve realistic and flexible closed-loop simulation in the driving context. By leveraging real-world multi-modal sensor data, RealEngine reconstructs background scenes and foreground traffic participants separately, allowing for highly diverse and realistic traffic scenarios through flexible scene composition. This synergistic fusion of scene reconstruction and view synthesis enables photorealistic rendering across multiple sensor modalities, ensuring both perceptual fidelity and geometric accuracy. Building upon this environment, RealEngine supports three essential driving simulation categories: non-reactive simulation, safety testing, and multi-agent interaction, collectively forming a reliable and comprehensive benchmark for evaluating the real-world performance of driving agents.
A Hierarchical Bin Packing Framework with Dual Manipulators via Heuristic Search and Deep Reinforcement Learning
We address the bin packing problem (BPP), which aims to maximize bin utilization when packing a variety of items. The offline problem, where the complete information about the item set and their sizes is known in advance, is proven to be NP-hard. The semi-online and online variants are even more challenging, as full information about incoming items is unavailable. While existing methods have tackled both 2D and 3D BPPs, the 2D BPP remains underexplored in terms of fully maximizing utilization. We propose a hierarchical approach for solving the 2D online and semi-online BPP by combining deep reinforcement learning (RL) with heuristic search. The heuristic search selects which item to pack or unpack, determines the packing order, and chooses the orientation of each item, while the RL agent decides the precise position within the bin. Our method is capable of handling diverse scenarios, including repacking, varying levels of item information, differing numbers of accessible items, and coordination of dual manipulators. Experimental results demonstrate that our approach achieves near-optimal utilization across various practical scenarios, largely due to its repacking capability. In addition, the algorithm is evaluated in a physics-based simulation environment, where execution time is measured to assess its real-world performance.
Inland-LOAM: Voxel-Based Structural Semantic LiDAR Odometry and Mapping for Inland Waterway Navigation
Accurate geospatial information is crucial for safe, autonomous Inland Waterway Transport (IWT), as existing charts (IENC) lack real-time detail and conventional LiDAR SLAM fails in waterway environments. These challenges lead to vertical drift and non-semantic maps, hindering autonomous navigation. This paper introduces Inland-LOAM, a LiDAR SLAM framework for waterways. It uses an improved feature extraction and a water surface planar constraint to mitigate vertical drift. A novel pipeline transforms 3D point clouds into structured 2D semantic maps using voxel-based geometric analysis, enabling real-time computation of navigational parameters like bridge clearances. An automated module extracts shorelines and exports them into a lightweight, IENC-compatible format. Evaluations on a real-world dataset show Inland-LOAM achieves superior localization accuracy over state-of-the-art methods. The generated semantic maps and shorelines align with real-world conditions, providing reliable data for enhanced situational awareness. The code and dataset will be publicly available
EmbodiedCoder: Parameterized Embodied Mobile Manipulation via Modern Coding Model
Recent advances in control robot methods, from end-to-end vision-language-action frameworks to modular systems with predefined primitives, have advanced robots' ability to follow natural language instructions. Nonetheless, many approaches still struggle to scale to diverse environments, as they often rely on large annotated datasets and offer limited interpretability.In this work, we introduce EmbodiedCoder, a training-free framework for open-world mobile robot manipulation that leverages coding models to directly generate executable robot trajectories. By grounding high-level instructions in code, EmbodiedCoder enables flexible object geometry parameterization and manipulation trajectory synthesis without additional data collection or fine-tuning.This coding-based paradigm provides a transparent and generalizable way to connect perception with manipulation. Experiments on real mobile robots show that EmbodiedCoder achieves robust performance across diverse long-term tasks and generalizes effectively to novel objects and environments.Our results demonstrate an interpretable approach for bridging high-level reasoning and low-level control, moving beyond fixed primitives toward versatile robot intelligence. See the project page at: https://embodiedcoder.github.io/EmbodiedCoder/
comment: Demo Page: https://embodiedcoder.github.io/EmbodiedCoder/
Tiny Learning-Based MPC for Multirotors: Solver-Aware Learning for Efficient Embedded Predictive Control
Tiny aerial robots hold great promise for applications such as environmental monitoring and search-and-rescue, yet face significant control challenges due to limited onboard computing power and nonlinear dynamics. Model Predictive Control (MPC) enables agile trajectory tracking and constraint handling but depends on an accurate dynamics model. While existing Learning-Based (LB) MPC methods, such as Gaussian Process (GP) MPC, enhance performance by learning residual dynamics, their high computational cost restricts onboard deployment on tiny robots. This paper introduces Tiny LB MPC, a co-designed MPC framework and optimization solver for resource-constrained micro multirotor platforms. The proposed approach achieves 100 Hz control on a Crazyflie 2.1 equipped with a Teensy 4.0 microcontroller, demonstrating a 43% average improvement in tracking performance over existing embedded MPC methods under model uncertainty, and achieving the first onboard implementation of LB MPC on a 53 g multirotor.
Ctrl-World: A Controllable Generative World Model for Robot Manipulation
Generalist robot policies can now perform a wide range of manipulation skills, but evaluating and improving their ability with unfamiliar objects and instructions remains a significant challenge. Rigorous evaluation requires a large number of real-world rollouts, while systematic improvement demands additional corrective data with expert labels. Both of these processes are slow, costly, and difficult to scale. World models offer a promising, scalable alternative by enabling policies to rollout within imagination space. However, a key challenge is building a controllable world model that can handle multi-step interactions with generalist robot policies. This requires a world model compatible with modern generalist policies by supporting multi-view prediction, fine-grained action control, and consistent long-horizon interactions, which is not achieved by previous works. In this paper, we make a step forward by introducing a controllable multi-view world model that can be used to evaluate and improve the instruction-following ability of generalist robot policies. Our model maintains long-horizon consistency with a pose-conditioned memory retrieval mechanism and achieves precise action control through frame-level action conditioning. Trained on the DROID dataset (95k trajectories, 564 scenes), our model generates spatially and temporally consistent trajectories under novel scenarios and new camera placements for over 20 seconds. We show that our method can accurately rank policy performance without real-world robot rollouts. Moreover, by synthesizing successful trajectories in imagination and using them for supervised fine-tuning, our approach can improve policy success by 44.7\%.
comment: 17 pages
A Faster and More Reliable Middleware for Autonomous Driving Systems
Ensuring safety in high-speed autonomous vehicles requires rapid control loops and tightly bounded delays from perception to actuation. Many open-source autonomy systems rely on ROS 2 middleware; when multiple sensor and control nodes share one compute unit, ROS 2 and its DDS transports add significant (de)serialization, copying, and discovery overheads, shrinking the available time budget. We present Sensor-in-Memory (SIM), a shared-memory transport designed for intra-host pipelines in autonomous vehicles. SIM keeps sensor data in native memory layouts (e.g., cv::Mat, PCL), uses lock-free bounded double buffers that overwrite old data to prioritize freshness, and integrates into ROS 2 nodes with four lines of code. Unlike traditional middleware, SIM operates beside ROS 2 and is optimized for applications where data freshness and minimal latency outweigh guaranteed completeness. SIM provides sequence numbers, a writer heartbeat, and optional checksums to ensure ordering, liveness, and basic integrity. On an NVIDIA Jetson Orin Nano, SIM reduces data-transport latency by up to 98% compared to ROS 2 zero-copy transports such as FastRTPS and Zenoh, lowers mean latency by about 95%, and narrows 95th/99th-percentile tail latencies by around 96%. In tests on a production-ready Level 4 vehicle running Autoware.Universe, SIM increased localization frequency from 7.5 Hz to 9.5 Hz. Applied across all latency-critical modules, SIM cut average perception-to-decision latency from 521.91 ms to 290.26 ms, reducing emergency braking distance at 40 mph (64 km/h) on dry concrete by 13.6 ft (4.14 m).
comment: 8 pages,7 figures, 8 tables
Predictive Preference Learning from Human Interventions NeurIPS 2025
Learning from human involvement aims to incorporate the human subject to monitor and correct agent behavior errors. Although most interactive imitation learning methods focus on correcting the agent's action at the current state, they do not adjust its actions in future states, which may be potentially more hazardous. To address this, we introduce Predictive Preference Learning from Human Interventions (PPL), which leverages the implicit preference signals contained in human interventions to inform predictions of future rollouts. The key idea of PPL is to bootstrap each human intervention into L future time steps, called the preference horizon, with the assumption that the agent follows the same action and the human makes the same intervention in the preference horizon. By applying preference optimization on these future states, expert corrections are propagated into the safety-critical regions where the agent is expected to explore, significantly improving learning efficiency and reducing human demonstrations needed. We evaluate our approach with experiments on both autonomous driving and robotic manipulation benchmarks and demonstrate its efficiency and generality. Our theoretical analysis further shows that selecting an appropriate preference horizon L balances coverage of risky states with label correctness, thereby bounding the algorithmic optimality gap. Demo and code are available at: https://metadriverse.github.io/ppl
comment: NeurIPS 2025 Spotlight. Project page: https://metadriverse.github.io/ppl
BlueME: Robust Underwater Robot-to-Robot Communication Using Compact Magnetoelectric Antennas
We present the design, development, and experimental validation of BlueME, a compact magnetoelectric (ME) antenna array system for underwater robot-to-robot communication. BlueME employs ME antennas operating at their natural mechanical resonance frequency to efficiently transmit and receive very-low-frequency (VLF) electromagnetic signals underwater. We outline the design, simulation, fabrication, and integration of the proposed system on low-power embedded platforms, focusing on portable and scalable applications. For performance evaluation, we deployed BlueME on an autonomous surface vehicle (ASV) and a remotely operated vehicle (ROV) in open-water field trials. Ocean trials demonstrate that BlueME maintains reliable signal transmission at distances beyond 700 meters while consuming only 10 watts of power. Field trials show that the system operates effectively in challenging underwater conditions such as turbidity, obstacles, and multipath interference -- conditions that generally affect acoustics and optics. Our analysis also examines the impact of complete submersion on system performance and identifies key deployment considerations. This work represents the first practical underwater deployment of ME antennas outside the laboratory and implements the largest VLF ME array system to date. BlueME demonstrates significant potential for marine robotics and automation in multi-robot cooperative systems and remote sensor networks.
Seeing through Uncertainty: Robust Task-Oriented Optimization in Visual Navigation
Visual navigation is a fundamental problem in embodied AI, yet practical deployments demand long-horizon planning capabilities to address multi-objective tasks. A major bottleneck is data scarcity: policies learned from limited data often overfit and fail to generalize OOD. Existing neural network-based agents typically increase architectural complexity that paradoxically become counterproductive in the small-sample regime. This paper introduce NeuRO, a integrated learning-to-optimize framework that tightly couples perception networks with downstream task-level robust optimization. Specifically, NeuRO addresses core difficulties in this integration: (i) it transforms noisy visual predictions under data scarcity into convex uncertainty sets using Partially Input Convex Neural Networks (PICNNs) with conformal calibration, which directly parameterize the optimization constraints; and (ii) it reformulates planning under partial observability as a robust optimization problem, enabling uncertainty-aware policies that transfer across environments. Extensive experiments on both unordered and sequential multi-object navigation tasks demonstrate that NeuRO establishes SoTA performance, particularly in generalization to unseen environments. Our work thus presents a significant advancement for developing robust, generalizable autonomous agents.
AI-Agents for Culturally Diverse Online Higher Education Environments
As the global reach of online higher education continues to grow, universities are increasingly accommodating students from diverse cultural backgrounds (Tereshko et al., 2024). This can present a number of challenges including linguistic barriers (Ullah et al., 2021), cultural differences in learning style (Omidvar & Tan, 2012), cultural sensitivity in course design (Nguyen, 2022) and perceived isolation when students feel their perspectives or experiences are not reflected or valued in the learning environment (Hansen-Brown et al., 2022). Ensuring active engagement and reasonable learning outcomes in such a environments requires distance educational systems that are not only adaptive but also culturally resonant (Dalle et al., 2024). Both embodied and virtual AI-Agents have great potential in this regard as they can facilitate personalized learning and adapt their interactions and content delivery to align with students' cultural context. In addition, Generative AI (GAI), such as, Large Language Models (LLMs) can amplify the potential for these culturally aware AI agents to address educational challenges due to their advanced capacity for understanding and generating contextually relevant content (Wang et al., 2024). This chapter reviews existing research and suggests the usage of culturally aware AI-Agents, powered by GAI, to foster engagement and improve learning outcomes in culturally diverse online higher education environments.
Effect of Performance Feedback Timing on Motor Learning for a Surgical Training Task
Objective: Robot-assisted minimally invasive surgery (RMIS) has become the gold standard for a variety of surgical procedures, but the optimal method of training surgeons for RMIS is unknown. We hypothesized that real-time, rather than post-task, error feedback would better increase learning speed and reduce errors. Methods: Forty-two surgical novices learned a virtual version of the ring-on-wire task, a canonical task in RMIS training. We investigated the impact of feedback timing with multi-sensory (haptic and visual) cues in three groups: (1) real-time error feedback, (2) trial replay with error feedback, and (3) no error feedback. Results: Participant performance was evaluated based on the accuracy of ring position and orientation during the task. Participants who received real-time feedback outperformed other groups in ring orientation. Additionally, participants who received feedback in replay outperformed participants who did not receive any error feedback on ring orientation during long, straight path sections. There were no significant differences between groups for ring position overall, but participants who received real-time feedback outperformed the other groups in positional accuracy on tightly curved path sections. Conclusion: The addition of real-time haptic and visual error feedback improves learning outcomes in a virtual surgical task over error feedback in replay or no error feedback at all. Significance: This work demonstrates that multi-sensory error feedback delivered in real time leads to better training outcomes as compared to the same feedback delivered after task completion. This novel method of training may enable surgical trainees to develop skills with greater speed and accuracy.
comment: Held at https://ieeexplore.ieee.org/document/11202637
Opti-Acoustic Scene Reconstruction in Highly Turbid Underwater Environments IROS 2025
Scene reconstruction is an essential capability for underwater robots navigating in close proximity to structures. Monocular vision-based reconstruction methods are unreliable in turbid waters and lack depth scale information. Sonars are robust to turbid water and non-uniform lighting conditions, however, they have low resolution and elevation ambiguity. This work proposes a real-time opti-acoustic scene reconstruction method that is specially optimized to work in turbid water. Our strategy avoids having to identify point features in visual data and instead identifies regions of interest in the data. We then match relevant regions in the image to corresponding sonar data. A reconstruction is obtained by leveraging range data from the sonar and elevation data from the camera image. Experimental comparisons against other vision-based and sonar-based approaches at varying turbidity levels, and field tests conducted in marina environments, validate the effectiveness of the proposed approach. We have made our code open-source to facilitate reproducibility and encourage community engagement.
comment: To appear at IROS 2025 in Hangzhou, China
Systems and Control (CS)
A 0.62 μW/sensor 82 fps Time-to-Digital Impedance Measurement IC with Unified Excitation/Readout Front-end for Large-Scale Piezo-Resistive Sensor Array
This paper presents a fast impedance measurement IC for large-scale piezo-resistive sensor array. It features a unified differential time-to-digital demodulation architecture that readout impedance directly through the excitation circuit. The proposed pre-saturation adaptive bias technique further improves power efficiency. The chip scans 253 sensors in 12.2 ms (82 fps) at 125 kHz, consuming 158 {\mu}W (7.5 nJ/sensor). With loads from 20 {\Omega} to 500 k{\Omega}, it achieves 0.5% error and up to 71.1 dB SNR.
Cryo-CMOS Antenna for Wireless Communications within a Quantum Computer Cryostat
Scaling quantum computers from a few qubits to large numbers remains one of the critical challenges in realizing practical quantum advantage. Multi-core quantum architectures have emerged as a promising solution, enabling scalability through distributed quantum processing units (QPUs) interconnected via classical and quantum links. However, the bottleneck of wired connections persists, as densely packed wired interconnects, both vertically across temperature stages and horizontally within the same layer, introduce spatial constraints, power dissipation, and latency, which could hinder performance as the number of QPUs increases. To overcome these limitations, this work proposes a cryo-compatible on-chip differential dipole antenna operating at 28 GHz to enable short-range wireless communication within a quantum computer cryostat. Temperature-dependent material properties are incorporated to accurately capture antenna behavior at 4 K. Moreover, by embedding the antenna in a realistic cryostat structure, we evaluate the feasibility of antenna operation within the cryogenic environment. The proposed antenna achieves a reflection coefficient of -20.8 dB in free space and -18.38 dB within the cryostat, demonstrating efficient impedance matching.
Efficient Force and Stiffness Prediction in Robotic Produce Handling with a Piezoresistive Pressure Sensor
Properly handling delicate produce with robotic manipulators is a major part of the future role of automation in agricultural harvesting and processing. Grasping with the correct amount of force is crucial in not only ensuring proper grip on the object, but also to avoid damaging or bruising the product. In this work, a flexible pressure sensor that is both low cost and easy to fabricate is integrated with robotic grippers for working with produce of varying shapes, sizes, and stiffnesses. The sensor is successfully integrated with both a rigid robotic gripper, as well as a pneumatically actuated soft finger. Furthermore, an algorithm is proposed for accelerated estimation of the steady-state value of the sensor output based on the transient response data, to enable real-time applications. The sensor is shown to be effective in incorporating feedback to correctly grasp objects of unknown sizes and stiffnesses. At the same time, the sensor provides estimates for these values which can be utilized for identification of qualities such as ripeness levels and bruising. It is also shown to be able to provide force feedback for objects of variable stiffnesses. This enables future use not only for produce identification, but also for tasks such as quality control and selective distribution based on ripeness levels.
comment: For supplementary videos, see https://drive.google.com/drive/folders/1jol-_z6gaUfjpL1Qi7EG420usTbVSodv?usp=sharing
Channel Estimation under Large Doppler Shifts in NOMA-Based Air-Ground Communications
This paper investigates a multiple antenna system with non-orthogonal multiple access (NOMA) for the exchange of air traffic management data between commercial aircraft pilots and ground-based air traffic controllers. While NOMA techniques enhance spectral efficiency, their application to aircraft communications is challenged by the high speed of the aircraft (up to 214 m/s) and the long communication ranges (up to 250 km), resulting in significant Doppler shifts and low signal-to-noise ratios, respectively. To accurately assess these challenges, we employ a realistic geometry-based stochastic air-ground channel model, derived from dedicated flight measurement campaigns. In this paper, multiple aircraft simultaneously transmit data to the ground station. We focus on the channel estimation problem at the ground station under high carrier frequency offsets and the effects of channel aging due to channel's time-varying nature. For the channel estimation problem, we compare the Zadoff-Chu sequences with time-division approach under varying carrier frequency offset pre-compensation accuracies at the aircraft transmitter. For the channel aging problem and performance evaluation of channel estimators, we compute the outage probability for both the zero-forcing detector and the minimum mean squared error detector with successive interference cancellation. The results show that the favorable channel estimator-detector combinations differ between the takeoff & landing phase and the enroute cruise phase of the flight, due to the distinct channel propagation characteristics of each phase.
comment: Submitted to IEEE Conference, 6 pages, 2 Figures
Data-driven learning of feedback maps for explicit robust predictive control: an approximation theoretic view
We establish an algorithm to learn feedback maps from data for a class of robust model predictive control (MPC) problems. The algorithm accounts for the approximation errors due to the learning directly at the synthesis stage, ensuring recursive feasibility by construction. The optimal control problem consists of a linear noisy dynamical system, a quadratic stage and quadratic terminal costs as the objective, and convex constraints on the state, control, and disturbance sequences; the control minimizes and the disturbance maximizes the objective. We proceed via two steps -- (a) Data generation: First, we reformulate the given minmax problem into a convex semi-infinite program and employ recently developed tools to solve it in an exact fashion on grid points of the state space to generate (state, action) data. (b) Learning approximate feedback maps: We employ a couple of approximation schemes that furnish tight approximations within preassigned uniform error bounds on the admissible state space to learn the unknown feedback policy. The stability of the closed-loop system under the approximate feedback policies is also guaranteed under a standard set of hypotheses. Two benchmark numerical examples are provided to illustrate the results.
comment: 27 pages; submitted
Quantifying the Impact of Missing Risk Markets for Decarbonized Power Systems with Long Duration Energy Storage
The transition to a fully decarbonised electricity system depends on integrating new technologies that ensure reliability alongside sustainability. However, missing risk markets hinder investment in reliability-enhancing technologies by exposing investors to revenue uncertainty. This study provides the first quantitative assessment of how missing risk markets affect investment decisions in power systems that depend on long-duration energy storage (LDES) for reliability. We develop a two-stage stochastic equilibrium model with risk-averse market participants, which independently sizes power and energy capacity. We apply the method to a case study of a deeply decarbonised power system in Great Britain. The results show that incomplete risk markets reduce social welfare, harm reliability, and discourage investment in LDES and other technologies with volatile revenue streams. Revenue volatility leads to substantial risk premiums and higher financing costs for LDES, creating a barrier to its large-scale deployment. These findings demonstrate the importance of policy mechanisms that hedge revenue risk to lower the cost of capital and accelerate investment in reliability-enhancing, zero-carbon technologies
Physics-Informed Neural Network Modeling of Vehicle Collision Dynamics in Precision Immobilization Technique Maneuvers
Accurate prediction of vehicle collision dynamics is crucial for advanced safety systems and post-impact control applications, yet existing methods face inherent trade-offs among computational efficiency, prediction accuracy, and data requirements. This paper proposes a dual Physics-Informed Neural Network framework addressing these challenges through two complementary networks. The first network integrates Gaussian Mixture Models with PINN architecture to learn impact force distributions from finite element analysis data while enforcing momentum conservation and energy consistency constraints. The second network employs an adaptive PINN with dynamic constraint weighting to predict post-collision vehicle dynamics, featuring an adaptive physics guard layer that prevents unrealistic predictions whil e preserving data-driven learning capabilities. The framework incorporates uncertainty quantification through time-varying parameters and enables rapid adaptation via fine-tuning strategies. Validation demonstrates significant improvements: the impact force model achieves relative errors below 15.0% for force prediction on finite element analysis (FEA) datasets, while the vehicle dynamics model reduces average trajectory prediction error by 63.6% compared to traditional four-degree-of-freedom models in scaled vehicle experiments. The integrated system maintains millisecond-level computational efficiency suitable for real-time applications while providing probabilistic confidence bounds essential for safety-critical control. Comprehensive validation through FEA simulation, dynamic modeling, and scaled vehicle experiments confirms the framework's effectiveness for Precision Immobilization Technique scenarios and general collision dynamics prediction.
On the Flexibility Potential of a Swiss Distribution Grid: Opportunities and Limitations
The growing integration of distributed renewable generation and the electrification of heating and transportation are rapidly increasing the number of flexible devices within modern distribution grids. Leveraging the aggregated flexibility of these small-scale distributed resources is essential to maintaining future grid-wide stability. This work uses the Swiss distribution grid of Walenstadt as a case study to provide insights into the aggregated flexibility potential of distribution grids. It demonstrates that incorporating devices such as heat pumps and photovoltaic systems significantly enhances distribution grid flexibility. It investigates the time-varying nature of aggregated flexibility and highlights how it can vary seasonally. Furthermore, simulations of future scenarios reveal that aggregated flexibility does not increase linearly or monotonically with higher levels of flexible device penetration. This is primarily due to the overloading of individual feeders, which underscores the impact of grid topology and network constraints on the aggregated flexibility potential.
Multipolar dynamics of social segregation: Data validation on Swedish vaccination statistics
We perform a validation analysis on the multipolar model of opinion dynamics. A general methodology for using the model on datasets of two correlated variables is proposed and tested using data on the relationship between COVID-19 vaccination rates and political participation in Sweden. The model is shown to successfully capture the opinion segregation demonstrated by the data and spatial correlation of biases is demonstrated as necessary for the result. A mixing of the biases on the other hand leads to a more homogeneous opinion distribution, and greater penetration of the majority opinion, which here corresponds to a decision to vote or vaccinate.
comment: Presented at CoDIT 2025
Performance Comparison of Gate-Based and Adiabatic Quantum Computing for Power Flow Analysis SC
In this paper, we present the first direct comparison between gate-based quantum computing (GQC) and adiabatic quantum computing (AQC) for solving the AC power flow (PF) equations. Building on the Adiabatic Quantum Power Flow (AQPF) algorithm originally designed for annealing platforms, we adapt it to the Quantum Approximate Optimization Algorithm (QAOA). The PF equations are reformulated as a combinatorial optimization problem. Numerical experiments on a 4-bus test system assess solution accuracy and computational time. Results from QAOA are benchmarked against those obtained using D-Wave's Advantage system and Fujitsu's latest generation Digital Annealer, i.e., Quantum-Inspired Integrated Optimization software (QIIO). The findings provide quantitative insights into the performance trade-offs, scalability, and practical viability of GQC versus AQC paradigms for PF analysis, highlighting the potential of quantum algorithms to address the computational challenges associated with modern electricity networks in the Noisy Intermediate-Scale Quantum (NISQ).
comment: 7 pages, 1 figure, 4 tables, submitted to PSCC 2026
Partitioned Scheduling for DAG Tasks Considering Probabilistic Execution Time
Autonomous driving systems, critical for safety, require real-time guarantees and can be modeled as DAGs. Their acceleration features, such as caches and pipelining, often result in execution times below the worst-case. Thus, a probabilistic approach ensuring constraint satisfaction within a probability threshold is more suitable than worst-case guarantees for these systems. This paper considers probabilistic guarantees for DAG tasks by utilizing the results of probabilistic guarantees for single processors, which have been relatively more advanced than those for multi-core processors. This paper proposes a task set partitioning method that guarantees schedulability under the partitioned scheduling. The evaluation on randomly generated DAG task sets demonstrates that the proposed method schedules more task sets with a smaller mean analysis time compared to existing probabilistic schedulability analysis for DAGs. The evaluation also compares four bin-packing heuristics, revealing Item-Centric Worst-Fit-Decreasing schedules the most task sets.
Safe Driving in Occluded Environments
Ensuring safe autonomous driving in the presence of occlusions poses a significant challenge in its policy design. While existing model-driven control techniques based on set invariance can handle visible risks, occlusions create latent risks in which safety-critical states are not observable. Data-driven techniques also struggle to handle latent risks because direct mappings from risk-critical objects in sensor inputs to safe actions cannot be learned without visible risk-critical objects. Motivated by these challenges, in this paper, we propose a probabilistic safety certificate for latent risk. Our key technical enabler is the application of probabilistic invariance: It relaxes the strict observability requirements imposed by set-invariance methods that demand the knowledge of risk-critical states. The proposed techniques provide linear action constraints that confine the latent risk probability within tolerance. Such constraints can be integrated into model predictive controllers or embedded in data-driven policies to mitigate latent risks. The proposed method is tested using the CARLA simulator and compared with a few existing techniques. The theoretical and empirical analysis jointly demonstrate that the proposed methods assure long-term safety in real-time control in occluded environments without being overly conservative and with transparency to exposed risks.
Decision-dependent Robust Charging Infrastructure Planning for Light-duty Truck Electrification at Industrial Sites: Scheduling and Abandonment
Many industrial sites rely on diesel-powered light-duty trucks to transport workers and small-scale facilities, which has resulted in a significant amount of greenhouse emissions (GHGs). To address this, we developed a two-stage robust charging infrastructure planning model for electrifying light-duty trucks at industrial sites. The model is formulated as a mixed-integer linear programming (MILP) that optimizes the charging infrastructure, selected from multiple charger types and potential locations, and determines opportunity charging schedules for each truck based on the chosen infrastructure. Given the strict stopping points and schedules at industrial sites, we introduced a scheduling problem with abandonment, where trucks forgo charging if their waiting times exceed a maximum threshold. We also further incorporated the impacts of overnight charging and range anxiety on waiting and abandonment behaviors. To represent the stochastic and heterogeneous parking durations of trucks, we constructed a decision-dependent robust uncertainty set in which parking time variability flexibly depends on charging choices. We applied the model in a case study of an open-pit mining site, which plans charger installations in eight zones and schedules a fleet of around 200 trucks. By decomposing the problem into monthly subproblems and using heuristic approaches, for the whole-year dataset, the model achieves an optimality gap of less than 0.1 % within a reasonable computation time under diverse uncertainty scenarios.
Time-Varying Optimization for Streaming Data Via Temporal Weighting
Classical optimization theory deals with fixed, time-invariant objective functions. However, time-varying optimization has emerged as an important subject for decision-making in dynamic environments. In this work, we study the problem of learning from streaming data through a time-varying optimization lens. Unlike prior works that focus on generic formulations, we introduce a structured, \emph{weight-based} formulation that explicitly captures the streaming-data origin of the time-varying objective, where at each time step, an agent aims to minimize a weighted average loss over all the past data samples. We focus on two specific weighting strategies: (1) uniform weights, which treat all samples equally, and (2) discounted weights, which geometrically decay the influence of older data. For both schemes, we derive tight bounds on the ``tracking error'' (TE), defined as the deviation between the model parameter and the time-varying optimum at a given time step, under gradient descent (GD) updates. We show that under uniform weighting, the TE vanishes asymptotically with a $\mathcal{O}(1/t)$ decay rate, whereas discounted weighting incurs a nonzero error floor controlled by the discount factor and the number of gradient updates performed at each time step. Our theoretical findings are validated through numerical simulations.
comment: Accepted at IEEE Asilomar, 2025
Learning Wireless Interference Patterns: Decoupled GNN for Throughput Prediction in Heterogeneous Multi-Hop p-CSMA Networks
The p-persistent CSMA protocol is central to random-access MAC analysis, but predicting saturation throughput in heterogeneous multi-hop wireless networks remains a hard problem. Simplified models that assume a single, shared interference domain can underestimate throughput by 48--62\% in sparse topologies. Exact Markov-chain analyses are accurate but scale exponentially in computation time, making them impractical for large networks. These computational barriers motivate structural machine learning approaches like GNNs for scalable throughput prediction in general network topologies. Yet off-the-shelf GNNs struggle here: a standard GCN yields 63.94\% normalized mean absolute error (NMAE) on heterogeneous networks because symmetric normalization conflates a node's direct interference with higher-order, cascading effects that pertain to how interference propagates over the network graph. Building on these insights, we propose the Decoupled Graph Convolutional Network (D-GCN), a novel architecture that explicitly separates processing of a node's own transmission probability from neighbor interference effects. D-GCN replaces mean aggregation with learnable attention, yielding interpretable, per-neighbor contribution weights while capturing complex multihop interference patterns. D-GCN attains 3.3\% NMAE, outperforms strong baselines, remains tractable even when exact analytical methods become computationally infeasible, and enables gradient-based network optimization that achieves within 1\% of theoretical optima.
Laser Fault Injection in Memristor-Based Accelerators for AI/ML and Neuromorphic Computing
Memristive crossbar arrays (MCA) are emerging as efficient building blocks for in-memory computing and neuromorphic hardware due to their high density and parallel analog matrix-vector multiplication capabilities. However, the physical properties of their nonvolatile memory elements introduce new attack surfaces, particularly under fault injection scenarios. This work explores Laser Fault Injection as a means of inducing analog perturbations in MCA-based architectures. We present a detailed threat model in which adversaries target memristive cells to subtly alter their physical properties or outputs using laser beams. Through HSPICE simulations of a large MCA on 45 nm CMOS tech. node, we show how laser-induced photocurrent manifests in output current distributions, enabling differential fault analysis to infer internal weights with up to 99.7% accuracy, replicate the model, and compromise computational integrity through targeted weight alterations by approximately 143%.
comment: 3 pages, 4 figures
Resource-Aware Stealthy Attacks in Vehicle Platoons
Connected and Autonomous Vehicles (CAVs) are transforming modern transportation by enabling cooperative applications such as vehicle platooning, where multiple vehicles travel in close formation to improve efficiency and safety. However, the heavy reliance on inter-vehicle communication makes platoons highly susceptible to attacks, where even subtle manipulations can escalate into severe physical consequences. While existing research has largely focused on defending against attacks, far less attention has been given to stealthy adversaries that aim to covertly manipulate platoon behavior. This paper introduces a new perspective on the attack design problem by demonstrating how attackers can guide platoons toward their own desired trajectories while remaining undetected. We outline conditions under which such attacks are feasible, analyze their dependence on communication topologies and control protocols, and investigate the resources required by the attacker. By characterizing the resources needed to launch stealthy attacks, we address system vulnerabilities and informing the design of resilient countermeasures. Our findings reveal critical weaknesses in current platoon architectures and anomaly detection mechanisms and provide methods to develop more secure and trustworthy CAV systems.
comment: 13 pages, 8 figures
Belief Space Control of Safety-Critical Systems Under State-Dependent Measurement Noise
Safety-critical control is imperative for deploying autonomous systems in the real world. Control Barrier Functions (CBFs) offer strong safety guarantees when accurate system and sensor models are available. However, widely used additive, fixed-noise models are not representative of complex sensor modalities with state-dependent error characteristics. Although CBFs have been designed to mitigate uncertainty using fixed worst-case bounds on measurement noise, this approach can lead to overly-conservative control. To solve this problem, we extend the Belief Control Barrier Function (BCBF) framework to accommodate state-dependent measurement noise via the Generalized Extended Kalman Filter (GEKF) algorithm, which models measurement noise as a linear function of the state. Using the original BCBF framework as baseline, we demonstrate the performance of the BCBF-GEKF approach through simulation results on a 1D single integrator setpoint tracking scenario and 2D unicycle kinematics trajectory tracking scenario. Our results confirm that the BCBF-GEKF approach offers less conservative control with greater safety.
comment: Preprint - Submitted to the 2026 American Control Conference
DiffOPF: Diffusion Solver for Optimal Power Flow
The optimal power flow (OPF) is a multi-valued, non-convex mapping from loads to dispatch setpoints. The variability of system parameters (e.g., admittances, topology) further contributes to the multiplicity of dispatch setpoints for a given load. Existing deep learning OPF solvers are single-valued and thus fail to capture the variability of system parameters unless fully represented in the feature space, which is prohibitive. To solve this problem, we introduce a diffusion-based OPF solver, termed \textit{DiffOPF}, that treats OPF as a conditional sampling problem. The solver learns the joint distribution of loads and dispatch setpoints from operational history, and returns the marginal dispatch distributions conditioned on loads. Unlike single-valued solvers, DiffOPF enables sampling statistically credible warm starts with favorable cost and constraint satisfaction trade-offs. We explore the sample complexity of DiffOPF to ensure the OPF solution within a prescribed distance from the optimization-based solution, and verify this experimentally on power system benchmarks.
comment: 7 pages, 4 figures, 2 tables
Dual Detection Framework for Faults and Integrity Attacks in Cyber-Physical Control Systems
Anomaly detection plays a vital role in the security and safety of cyber-physical control systems, and accurately distinguishing between different anomaly types is crucial for system recovery and mitigation. This study proposes a dual detection framework for anomaly detection and discrimination. By leveraging the dynamic characteristics of control loops and the stealthiness features of integrity attacks, the closed-loop stealthiness condition is first derived, and two dedicated detectors are designed and deployed on the controller side and the plant side, respectively, enabling joint plant fault and cyber attack detection. Moreover, by jointly analyzing the residual response of the two detectors corresponding to different anomalies, it is proved that the proposed method can distinguish between faults and integrity attacks due to the detectors' individual residual spaces. According to the detector's residual space, the fault and attack detection performance is further improved by a two-stage optimization scheme. Simulation results validate the effectiveness of the proposed approach.
Multi-Period Sparse Optimization for Proactive Grid Blackout Diagnosis
Existing or planned power grids need to evaluate survivability under extreme events, like a number of peak load overloading conditions, which could possibly cause system collapses (i.e. blackouts). For realistic extreme events that are correlated or share similar patterns, it is reasonable to expect that the dominant vulnerability or failure sources behind them share the same locations but with different severity. Early warning diagnosis that proactively identifies the key vulnerabilities responsible for a number of system collapses of interest can significantly enhance resilience. This paper proposes a multi-period sparse optimization method, enabling the discovery of {persistent failure sources} across a sequence of collapsed systems with increasing system stress, such as rising demand or worsening contingencies. This work defines persistency and efficiently integrates persistency constraints to capture the ``hidden'' evolving vulnerabilities. Circuit-theory based power flow formulations and circuit-inspired optimization heuristics are used to facilitate the scalability of the method. Experiments on benchmark systems show that the method reliably tracks persistent vulnerability locations under increasing load stress, and solves with scalability to large systems ({on average} taking {around} 200 s per scenario on 2000+ bus systems).
Cyber-Resilient System Identification for Power Grid through Bayesian Integration
Power grids increasingly need real-time situational awareness under the ever-evolving cyberthreat landscape. Advances in snapshot-based system identification approaches have enabled accurately estimating states and topology from a snapshot of measurement data, under random bad data and topology errors. However, modern interactive, targeted false data can stay undetectable to these methods, and significantly compromise estimation accuracy. This work advances system identification that combines snapshot-based method with time-series model via Bayesian Integration, to advance cyber resiliency against both random and targeted false data. Using a distance-based time-series model, this work can leverage historical data of different distributions induced by changes in grid topology and other settings. The normal system behavior captured from historical data is integrated into system identification through a Bayesian treatment, to make solutions robust to targeted false data. We experiment on mixed random anomalies (bad data, topology error) and targeted false data injection attack (FDIA) to demonstrate our method's 1) cyber resilience: achieving over 70% reduction in estimation error under FDIA; 2) anomalous data identification: being able to alarm and locate anomalous data; 3) almost linear scalability: achieving comparable speed with the snapshot-based baseline, both taking <1min per time tick on the large 2,383-bus system using a laptop CPU.
The Algorithmic Regulator
The regulator theorem states that, under certain conditions, any optimal controller must embody a model of the system it regulates, grounding the idea that controllers embed, explicitly or implicitly, internal models of the controlled. This principle underpins neuroscience and predictive brain theories like the Free-Energy Principle or Kolmogorov/Algorithmic Agent theory. However, the theorem is only proven in limited settings. Here, we treat the deterministic, closed, coupled world-regulator system $(W,R)$ as a single self-delimiting program $p$ via a constant-size wrapper that produces the world output string~$x$ fed to the regulator. We analyze regulation from the viewpoint of the algorithmic complexity of the output, $K(x)$. We define $R$ to be a \emph{good algorithmic regulator} if it \emph{reduces} the algorithmic complexity of the readout relative to a null (unregulated) baseline $\varnothing$, i.e., \[ \Delta = K\big(O_{W,\varnothing}\big) - K\big(O_{W,R}\big) > 0. \] We then prove that the larger $\Delta$ is, the more world-regulator pairs with high mutual algorithmic information are favored. More precisely, a complexity gap $\Delta > 0$ yields \[ \Pr\big((W,R)\mid x\big) \le C\,2^{\,M(W{:}R)}\,2^{-\Delta}, \] making low $M(W{:}R)$ exponentially unlikely as $\Delta$ grows. This is an AIT version of the idea that ``the regulator contains a model of the world.'' The framework is distribution-free, applies to individual sequences, and complements the Internal Model Principle. Beyond this necessity claim, the same coding-theorem calculus singles out a \emph{canonical scalar objective} and implicates a \emph{planner}. On the realized episode, a regulator behaves \emph{as if} it minimized the conditional description length of the readout.
comment: 2 Figures
The value of storage in electricity distribution: The role of markets
Electricity distribution companies deploy battery storage to defer grid upgrades by reducing peak demand. In deregulated jurisdictions, such storage often sits idle because regulatory constraints bar participation in electricity markets. Here, we develop an optimization framework that, to our knowledge, provides the first formal model of market participation constraints within storage investment and operation planning. Applying the framework to a Massachusetts case study, we find that market participation could deliver similar savings as peak demand reduction. Under current conditions, market participation does not increase storage investment, but at very low storage costs, could incentivize deployment beyond local distribution needs. This might run contrary to the separation of distribution from generation in deregulated markets. Our framework can identify investment levels appropriate for local distribution needs.
High-Parallel FPGA-Based Discrete Simulated Bifurcation for Large-Scale Optimization
Combinatorial Optimization (CO) problems exhibit exponential complexity, making their resolution challenging. Simulated Adiabatic Bifurcation (aSB) is a quantum-inspired algorithm to obtain approximate solutions to largescale CO problems written in the Ising form. It explores the solution space by emulating the adiabatic evolution of a network of Kerr-nonlinear parametric oscillators (KPOs), where each oscillator represents a variable in the problem. The optimal solution corresponds to the ground state of this system. A key advantage of this approach is the possibility of updating multiple variables simultaneously, making it particularly suited for hardware implementation. To enhance solution quality and convergence speed, variations of the algorithm have been proposed in the literature, including ballistic (bSB), discrete (dSB), and thermal (HbSB) versions. In this work, we have comprehensively analyzed dSB, bSB, and HbSB using dedicated software models, evaluating the feasibility of using a fixed-point representation for hardware implementation. We then present an opensource hardware architecture implementing the dSB algorithm for Field-Programmable Gate Arrays (FPGAs). The design allows users to adjust the degree of algorithmic parallelization based on their specific requirements. A proof-of-concept implementation that solves 256-variable problems was achieved on an AMD Kria KV260 SoM, a low-tier FPGA, validated using well-known max-cut and knapsack problems.
Hybrid Terrain-Aware Path Planning: Integrating VD-RRT* Exploration and VD-D* Lite Repair
Autonomous ground vehicles operating off-road must plan curvature-feasible paths while accounting for spatially varying soil strength and slope hazards in real time. We present a continuous state--cost metric that combines a Bekker pressure--sinkage model with elevation-derived slope and attitude penalties. The resulting terrain cost field is analytic, bounded, and monotonic in soil modulus and slope, ensuring well-posed discretization and stable updates under sensor noise. This metric is evaluated on a lattice with exact steering primitives: Dubins and Reeds--Shepp motions for differential drive and time-parameterized bicycle arcs for Ackermann steering. Global exploration is performed using Vehicle-Dynamics RRT\(^{*}\), while local repair is managed by Vehicle-Dynamics D\(^{*}\) Lite, enabling millisecond-scale replanning without heuristic smoothing. By separating the terrain--vehicle model from the planner, the framework provides a reusable basis for deterministic, sampling-based, or learning-driven planning in deformable terrain. Hardware trials on an off-road platform demonstrate real-time navigation across soft soil and slope transitions, supporting reliable autonomy in unstructured environments.
Product Digital Twin Supporting End-of-life Phase of Electric Vehicle Batteries Utilizing Product-Process-Resource Asset Network
In a circular economy, products in their end-of-life phase should be either remanufactured or recycled. Both of these processes are crucial for sustainability and environmental conservation. However, manufacturers frequently do not support these processes enough in terms of not sharing relevant data about the products nor their (re-)manufacturing processes. This paper proposes to accompany each product with a digital twin technology, specifically the Product Digital Twin (PDT), which can carry information for facilitating and optimizing production and remanufacturing processes. This paper introduces a knowledge representation called Bi-Flow Product-Process-Resource Asset Network (Bi-PAN). Bi-PAN extends a well-proven Product-Process-Resource Asset Network (PAN) paradigm by integrating both assembly and disassembly workflows into a single information model. Such networks enable capturing relevant relationships across products, production resources, manufacturing processes, and specific production operations that have to be done in the manufacturing phase of a product. The proposed approach is demonstrated in a use-case of disassembling electric vehicle (EV) batteries. By utilizing PDTs with Bi-PAN knowledge models, challenges associated with disassembling of EV batteries can be solved flexibly and efficiently for various battery types, enhancing the sustainability of the EV battery life-cycle management.
comment: \copyright 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
A Personalized Data-Driven Generative Model of Human Repetitive Motion
The deployment of autonomous virtual avatars (in extended reality) and robots in human group activities -- such as rehabilitation therapy, sports, and manufacturing -- is expected to increase as these technologies become more pervasive. Designing cognitive architectures and control strategies to drive these agents requires realistic models of human motion. Furthermore, recent research has shown that each person exhibits a unique velocity signature, highlighting how individual motor behaviors are both rich in variability and internally consistent. However, existing models only provide simplified descriptions of human motor behavior, hindering the development of effective cognitive architectures. In this work, we first show that motion amplitude provides a valid and complementary characterization of individual motor signatures. Then, we propose a fully data-driven approach, based on long short-term memory neural networks, to generate original motion that captures the unique features of specific individuals. We validate the architecture using real human data from participants performing spontaneous oscillatory motion. Extensive analyses show that state-of-the-art Kuramoto-like models fail to replicate individual motor signatures, whereas our model accurately reproduces the velocity distribution and amplitude envelopes of the individual it was trained on, while remaining distinct from others.
comment: 12 pages, 6 figures
Addressing Model Inaccuracies in Transmission Network Reconfiguration via Diverse Alternatives
The ongoing energy transition places significant pressure on the transmission network due to increasing shares of renewables and electrification. To mitigate grid congestion, transmission system operators need decision support tools to suggest remedial actions, such as transmission network reconfigurations or redispatch. However, these tools are prone to model inaccuracies and may not provide relevant suggestions with regard to important unmodeled constraints or operator preferences. We propose a human-in-the-loop modeling-to-generate alternatives (HITL-MGA) approach to address these shortcomings by generating diverse topology reconfiguration alternatives. Case studies on the IEEE 57-bus and IEEE 118-bus systems show the method can leverage expert feedback and improve the quality of the suggested topology reconfigurations.
comment: This preprint is currently under peer review
Dual-Regularized Riccati Recursions for Interior-Point Optimal Control
We derive closed-form extensions of Riccati's recursions (both sequential and parallel) for solving dual-regularized LQR problems. We show how these methods can be used to solve general constrained, non-convex, discrete-time optimal control problems via a regularized interior point method, while guaranteeing that each step is a descent direction of an Augmented Barrier-Lagrangian merit function. We provide MIT-licensed implementations of our methods in C++ and JAX.
On the Fast Nonlinear Filtering with Matrix Fisher Distributions on SO(3)
This paper addresses two interrelated problems: the nonlinear filtering mechanism and fast attitude filtering with the matrix Fisher distribution (MFD) on the special orthogonal group. By analyzing the distribution evolution along Bayes' rule, we reveal two essential properties that enhance the performance of Bayesian attitude filters with MFDs, particularly in challenging conditions from a theoretical viewpoint. Benefiting from the new understanding of the filtering mechanism associated with MFDs, two closed-form filters with MFDs are then proposed. The filters avoids the burdensome computations in previous MFD-based filters by introducing linearized error systems with invariant errors but retaining the two advantageous properties. Numerical simulations demonstrate that the proposed filters are more accurate than the classic invariant Kalman filter. Besides, it is also as accurate as recent MFD-based Bayesian filters in challenging circumstances with large initial error and measurement uncertainty, but it consumes far less computation time (about 1/5 to 1/100 of previous MFD-based attitude filters).
Design and benchmarking of a two degree of freedom tendon driver unit for cable-driven wearable technologies
Exosuits have recently been developed as alternatives to rigid exoskeletons and are increasingly adopted for both upper and lower limb therapy and assistance in clinical and home environments. Many cable-driven exosuits have been developed but little has been published on their electromechanical designs and performance. Therefore, this paper presents a comprehensive design and performance analysis of a two degree of freedom tendon driver unit (TDU) for cable-driven wearable exosuits. Detailed methodologies are presented to benchmark the functionality of the TDU. A static torque output test compares the commanded and measured torques. A velocity control test evaluates the attenuation and phase shift across velocities. A noise test evaluates how loud the TDU is for the wearer under different speeds. A thermal stress test captures the cooling performance of the TDU to ensure safe operation at higher loads. Finally, a battery endurance test evaluates the runtime of the TDU under various loading conditions to inform the usable time. To demonstrate these tests, a modular TDU system for cable-driven applications is introduced, which allows components such as motors, pulleys, and sensors to be adapted based on the requirements of the intended application. By sharing detailed methodologies and performance results, this study aims to provide a TDU design that may be leveraged by others and resources for researchers and engineers to better document the capabilities of their TDU designs.
Geometric Backstepping Control of Omnidirectional Tiltrotors Incorporating Servo-Rotor Dynamics for Robustness against Sudden Disturbances
This work presents a geometric backstepping controller for a variable-tilt omnidirectional multirotor that explicitly accounts for both servo and rotor dynamics. Considering actuator dynamics is essential for more effective and reliable operation, particularly during aggressive flight maneuvers or recovery from sudden disturbances. While prior studies have investigated actuator-aware control for conventional and fixed-tilt multirotors, these approaches rely on linear relationships between actuator input and wrench, which cannot capture the nonlinearities induced by variable tilt angles. In this work, we exploit the cascade structure between the rigid-body dynamics of the multirotor and its nonlinear actuator dynamics to design the proposed backstepping controller and establish exponential stability of the overall system. Furthermore, we reveal parametric uncertainty in the actuator model through experiments, and we demonstrate that the proposed controller remains robust against such uncertainty. The controller was compared against a baseline that does not account for actuator dynamics across three experimental scenarios: fast translational tracking, rapid rotational tracking, and recovery from sudden disturbance. The proposed method consistently achieved better tracking performance, and notably, while the baseline diverged and crashed during the fastest translational trajectory tracking and the recovery experiment, the proposed controller maintained stability and successfully completed the tasks, thereby demonstrating its effectiveness.
A Verification Methodology for Safety Assurance of Robotic Autonomous Systems
Autonomous robots deployed in shared human environments, such as agricultural settings, require rigorous safety assurance to meet both functional reliability and regulatory compliance. These systems must operate in dynamic, unstructured environments, interact safely with humans, and respond effectively to a wide range of potential hazards. This paper presents a verification workflow for the safety assurance of an autonomous agricultural robot, covering the entire development life-cycle, from concept study and design to runtime verification. The outlined methodology begins with a systematic hazard analysis and risk assessment to identify potential risks and derive corresponding safety requirements. A formal model of the safety controller is then developed to capture its behaviour and verify that the controller satisfies the specified safety properties with respect to these requirements. The proposed approach is demonstrated on a field robot operating in an agricultural setting. The results show that the methodology can be effectively used to verify safety-critical properties and facilitate the early identification of design issues, contributing to the development of safer robots and autonomous systems.
comment: In Proc. of the 26th TAROS (Towards Autonomous Robotic Systems) Conference, York, UK, August, 2025
Multi Timescale Stochastic Approximation: Stability and Convergence
This paper presents the first sufficient conditions that guarantee the stability and almost sure convergence of multi-timescale stochastic approximation (SA) iterates. It extends the existing results on one-timescale and two-timescale SA iterates to general $N$-timescale stochastic recursions, for any $N \geq 1$, using the ordinary differential equation (ODE) method. As an application, we study SA algorithms augmented with heavy-ball momentum in the context of Gradient Temporal Difference (GTD) learning. The added momentum introduces an auxiliary state evolving on an intermediate timescale, yielding a three-timescale recursion. We show that with appropriate momentum parameters, the scheme fits within our framework and converges almost surely to the same fixed point as baseline GTD. The stability and convergence of all iterates including the momentum state follow from our main results without ad hoc bounds. We then study off-policy actor-critic algorithms with a baseline learner, actor, and critic updated on separate timescales. In contrast to prior work, we eliminate projection steps from the actor update and instead use our framework to guarantee stability and almost sure convergence of all components. Finally, we extend the analysis to constrained policy optimization in the average reward setting, where the actor, critic, and dual variables evolve on three distinct timescales, and we verify that the resulting dynamics satisfy the conditions of our general theorem. These examples show how diverse reinforcement learning algorithms covering momentum acceleration, off-policy learning, and primal-dual methods-fit naturally into the proposed multi-timescale framework.
comment: arXiv admin note: text overlap with arXiv:2111.11004, Added an application to the 4-Timescale case
Learning Power Flow with Confidence: A Probabilistic Guarantee Framework for Voltage Risk
The absence of formal performance guarantees in machine learning (ML) has limited its adoption for safety-critical power system applications, where confidence and interpretability are as vital as accuracy. In this work, we present a probabilistic guarantee for power flow learning and voltage risk estimation, derived through the framework of Gaussian Process (GP) regression. Specifically, we establish a bound on the expected estimation error that connects the GP's predictive variance to confidence in voltage risk estimates, ensuring statistical equivalence with Monte Carlo-based ACPF risk quantification. To enhance model learnability in the low-data regime, we first design the Vertex-Degree Kernel (VDK), a topology-aware additive kernel that decomposes voltage-load interactions into local neighborhoods for efficient large-scale learning. Building on this, we introduce a network-swipe active learning (AL) algorithm that adaptively samples informative operating points and provides a principled stopping criterion without requiring out-of-sample validation. Together, these developments mitigate the principal bottleneck of ML-based power flow-its lack of guaranteed reliability-by combining data efficiency with analytical assurance. Empirical evaluations across IEEE 118-, 500-, and 1354-bus systems confirm that the proposed VDK-GP achieves mean absolute voltage errors below 1E-03 p.u., reproduces Monte Carlo-level voltage risk estimates with 15x fewer ACPF computations, and achieves over 120x reduction in evaluation time while conservatively bounding violation probabilities.
comment: 10 pages
Convergence and sample complexity of natural policy gradient primal-dual methods for constrained MDPs NeurIPS 2020
We study the sequential decision making problem of maximizing the expected total reward while satisfying a constraint on the expected total utility. We employ the natural policy gradient method to solve the discounted infinite-horizon optimal control problem for Constrained Markov Decision Processes (constrained MDPs). Specifically, we propose a new Natural Policy Gradient Primal-Dual (NPG-PD) method that updates the primal variable via natural policy gradient ascent and the dual variable via projected subgradient descent. Although the underlying maximization involves a nonconcave objective function and a nonconvex constraint set, under the softmax policy parametrization, we prove that our method achieves global convergence with sublinear rates regarding both the optimality gap and the constraint violation. Such convergence is independent of the size of the state-action space, i.e., it is~dimension-free. Furthermore, for log-linear and general smooth policy parametrizations, we establish sublinear convergence rates up to a function approximation error caused by restricted policy parametrization. We also provide convergence and finite-sample complexity guarantees for two sample-based NPG-PD algorithms. We use a set of computational experiments to showcase the effectiveness of our approach.
comment: 76 pages, 4 figures, 2 tables; Journal version of the NeurIPS 2020 paper; Accepted to JMLR
Tiny Learning-Based MPC for Multirotors: Solver-Aware Learning for Efficient Embedded Predictive Control
Tiny aerial robots hold great promise for applications such as environmental monitoring and search-and-rescue, yet face significant control challenges due to limited onboard computing power and nonlinear dynamics. Model Predictive Control (MPC) enables agile trajectory tracking and constraint handling but depends on an accurate dynamics model. While existing Learning-Based (LB) MPC methods, such as Gaussian Process (GP) MPC, enhance performance by learning residual dynamics, their high computational cost restricts onboard deployment on tiny robots. This paper introduces Tiny LB MPC, a co-designed MPC framework and optimization solver for resource-constrained micro multirotor platforms. The proposed approach achieves 100 Hz control on a Crazyflie 2.1 equipped with a Teensy 4.0 microcontroller, demonstrating a 43% average improvement in tracking performance over existing embedded MPC methods under model uncertainty, and achieving the first onboard implementation of LB MPC on a 53 g multirotor.
A Faster and More Reliable Middleware for Autonomous Driving Systems
Ensuring safety in high-speed autonomous vehicles requires rapid control loops and tightly bounded delays from perception to actuation. Many open-source autonomy systems rely on ROS 2 middleware; when multiple sensor and control nodes share one compute unit, ROS 2 and its DDS transports add significant (de)serialization, copying, and discovery overheads, shrinking the available time budget. We present Sensor-in-Memory (SIM), a shared-memory transport designed for intra-host pipelines in autonomous vehicles. SIM keeps sensor data in native memory layouts (e.g., cv::Mat, PCL), uses lock-free bounded double buffers that overwrite old data to prioritize freshness, and integrates into ROS 2 nodes with four lines of code. Unlike traditional middleware, SIM operates beside ROS 2 and is optimized for applications where data freshness and minimal latency outweigh guaranteed completeness. SIM provides sequence numbers, a writer heartbeat, and optional checksums to ensure ordering, liveness, and basic integrity. On an NVIDIA Jetson Orin Nano, SIM reduces data-transport latency by up to 98% compared to ROS 2 zero-copy transports such as FastRTPS and Zenoh, lowers mean latency by about 95%, and narrows 95th/99th-percentile tail latencies by around 96%. In tests on a production-ready Level 4 vehicle running Autoware.Universe, SIM increased localization frequency from 7.5 Hz to 9.5 Hz. Applied across all latency-critical modules, SIM cut average perception-to-decision latency from 521.91 ms to 290.26 ms, reducing emergency braking distance at 40 mph (64 km/h) on dry concrete by 13.6 ft (4.14 m).
comment: 8 pages,7 figures, 8 tables
An AI-Driven Multimodal Smart Home Platform for Continuous Monitoring and Assistance in Post-Stroke Motor Impairment
At-home rehabilitation for post-stroke patients presents significant challenges, as continuous, personalized care is often limited outside clinical settings. Moreover, the lack of integrated solutions capable of simultaneously monitoring motor recovery and providing intelligent assistance in home environments hampers rehabilitation outcomes. Here, we present a multimodal smart home platform designed for continuous, at-home rehabilitation of post-stroke patients, integrating wearable sensing, ambient monitoring, and adaptive automation. A plantar pressure insole equipped with a machine learning pipeline classifies users into motor recovery stages with up to 94\% accuracy, enabling quantitative tracking of walking patterns during daily activities. An optional head-mounted eye-tracking module, together with ambient sensors such as cameras and microphones, supports seamless hands-free control of household devices with a 100\% success rate and sub-second response time. These data streams are fused locally via a hierarchical Internet of Things (IoT) architecture, ensuring low latency and data privacy. An embedded large language model (LLM) agent, Auto-Care, continuously interprets multimodal data to provide real-time interventions -- issuing personalized reminders, adjusting environmental conditions, and notifying caregivers. Implemented in a post-stroke context, this integrated smart home platform increased mean user satisfaction from 3.9 $\pm$ 0.8 in conventional home environments to 8.4 $\pm$ 0.6 with the full system ($n=20$). Beyond stroke, the system offers a scalable, patient-centered framework with potential for long-term use in broader neurorehabilitation and aging-in-place applications.
comment: 5 figures, 41 references
Electromagnetically Reconfigurable Fluid Antenna System for Wireless Communications: Design, Modeling, Algorithm, Fabrication, and Experiment
This paper presents the concept, design, channel modeling, beamforming algorithm development, prototype fabrication, and experimental measurement of an electromagnetically reconfigurable fluid antenna system (ER-FAS), in which each FAS array element features electromagnetic (EM) reconfigurability. Unlike most existing FAS works that investigate spatial reconfigurability by adjusting the position and/or orientation of array elements, the proposed ER-FAS enables direct control over the EM characteristics of each element, allowing for dynamic radiation pattern reconfigurability. Specifically, a novel ER-FAS architecture leveraging software-controlled fluidics is proposed, and corresponding wireless channel models are established. Based on this ER-FAS channel model, a low-complexity greedy beamforming algorithm is developed to jointly optimize the analog phase shift and the radiation state of each array element. The accuracy of the ER-FAS channel model and the effectiveness of the beamforming algorithm are validated through (i) full-wave EM simulations and (ii) numerical spectral efficiency evaluations. These results confirm that the proposed ER-FAS significantly enhances spectral efficiency in both near-field and far-field scenarios compared to conventional antenna arrays. To further validate this design, we fabricate prototypes for both the ER-FAS element and array, using Galinstan liquid metal alloy, fluid silver paste, and software-controlled fluidic channels. The simulation results are experimentally validated through prototype measurements conducted in an anechoic chamber. Additionally, several indoor communication experiments using a pair of software-defined radios demonstrate the superior received power and bit error rate performance of the ER-FAS prototype.
Systems and Control (EESS)
A 0.62 μW/sensor 82 fps Time-to-Digital Impedance Measurement IC with Unified Excitation/Readout Front-end for Large-Scale Piezo-Resistive Sensor Array
This paper presents a fast impedance measurement IC for large-scale piezo-resistive sensor array. It features a unified differential time-to-digital demodulation architecture that readout impedance directly through the excitation circuit. The proposed pre-saturation adaptive bias technique further improves power efficiency. The chip scans 253 sensors in 12.2 ms (82 fps) at 125 kHz, consuming 158 {\mu}W (7.5 nJ/sensor). With loads from 20 {\Omega} to 500 k{\Omega}, it achieves 0.5% error and up to 71.1 dB SNR.
Cryo-CMOS Antenna for Wireless Communications within a Quantum Computer Cryostat
Scaling quantum computers from a few qubits to large numbers remains one of the critical challenges in realizing practical quantum advantage. Multi-core quantum architectures have emerged as a promising solution, enabling scalability through distributed quantum processing units (QPUs) interconnected via classical and quantum links. However, the bottleneck of wired connections persists, as densely packed wired interconnects, both vertically across temperature stages and horizontally within the same layer, introduce spatial constraints, power dissipation, and latency, which could hinder performance as the number of QPUs increases. To overcome these limitations, this work proposes a cryo-compatible on-chip differential dipole antenna operating at 28 GHz to enable short-range wireless communication within a quantum computer cryostat. Temperature-dependent material properties are incorporated to accurately capture antenna behavior at 4 K. Moreover, by embedding the antenna in a realistic cryostat structure, we evaluate the feasibility of antenna operation within the cryogenic environment. The proposed antenna achieves a reflection coefficient of -20.8 dB in free space and -18.38 dB within the cryostat, demonstrating efficient impedance matching.
Efficient Force and Stiffness Prediction in Robotic Produce Handling with a Piezoresistive Pressure Sensor
Properly handling delicate produce with robotic manipulators is a major part of the future role of automation in agricultural harvesting and processing. Grasping with the correct amount of force is crucial in not only ensuring proper grip on the object, but also to avoid damaging or bruising the product. In this work, a flexible pressure sensor that is both low cost and easy to fabricate is integrated with robotic grippers for working with produce of varying shapes, sizes, and stiffnesses. The sensor is successfully integrated with both a rigid robotic gripper, as well as a pneumatically actuated soft finger. Furthermore, an algorithm is proposed for accelerated estimation of the steady-state value of the sensor output based on the transient response data, to enable real-time applications. The sensor is shown to be effective in incorporating feedback to correctly grasp objects of unknown sizes and stiffnesses. At the same time, the sensor provides estimates for these values which can be utilized for identification of qualities such as ripeness levels and bruising. It is also shown to be able to provide force feedback for objects of variable stiffnesses. This enables future use not only for produce identification, but also for tasks such as quality control and selective distribution based on ripeness levels.
comment: For supplementary videos, see https://drive.google.com/drive/folders/1jol-_z6gaUfjpL1Qi7EG420usTbVSodv?usp=sharing
Channel Estimation under Large Doppler Shifts in NOMA-Based Air-Ground Communications
This paper investigates a multiple antenna system with non-orthogonal multiple access (NOMA) for the exchange of air traffic management data between commercial aircraft pilots and ground-based air traffic controllers. While NOMA techniques enhance spectral efficiency, their application to aircraft communications is challenged by the high speed of the aircraft (up to 214 m/s) and the long communication ranges (up to 250 km), resulting in significant Doppler shifts and low signal-to-noise ratios, respectively. To accurately assess these challenges, we employ a realistic geometry-based stochastic air-ground channel model, derived from dedicated flight measurement campaigns. In this paper, multiple aircraft simultaneously transmit data to the ground station. We focus on the channel estimation problem at the ground station under high carrier frequency offsets and the effects of channel aging due to channel's time-varying nature. For the channel estimation problem, we compare the Zadoff-Chu sequences with time-division approach under varying carrier frequency offset pre-compensation accuracies at the aircraft transmitter. For the channel aging problem and performance evaluation of channel estimators, we compute the outage probability for both the zero-forcing detector and the minimum mean squared error detector with successive interference cancellation. The results show that the favorable channel estimator-detector combinations differ between the takeoff & landing phase and the enroute cruise phase of the flight, due to the distinct channel propagation characteristics of each phase.
comment: Submitted to IEEE Conference, 6 pages, 2 Figures
Data-driven learning of feedback maps for explicit robust predictive control: an approximation theoretic view
We establish an algorithm to learn feedback maps from data for a class of robust model predictive control (MPC) problems. The algorithm accounts for the approximation errors due to the learning directly at the synthesis stage, ensuring recursive feasibility by construction. The optimal control problem consists of a linear noisy dynamical system, a quadratic stage and quadratic terminal costs as the objective, and convex constraints on the state, control, and disturbance sequences; the control minimizes and the disturbance maximizes the objective. We proceed via two steps -- (a) Data generation: First, we reformulate the given minmax problem into a convex semi-infinite program and employ recently developed tools to solve it in an exact fashion on grid points of the state space to generate (state, action) data. (b) Learning approximate feedback maps: We employ a couple of approximation schemes that furnish tight approximations within preassigned uniform error bounds on the admissible state space to learn the unknown feedback policy. The stability of the closed-loop system under the approximate feedback policies is also guaranteed under a standard set of hypotheses. Two benchmark numerical examples are provided to illustrate the results.
comment: 27 pages; submitted
Quantifying the Impact of Missing Risk Markets for Decarbonized Power Systems with Long Duration Energy Storage
The transition to a fully decarbonised electricity system depends on integrating new technologies that ensure reliability alongside sustainability. However, missing risk markets hinder investment in reliability-enhancing technologies by exposing investors to revenue uncertainty. This study provides the first quantitative assessment of how missing risk markets affect investment decisions in power systems that depend on long-duration energy storage (LDES) for reliability. We develop a two-stage stochastic equilibrium model with risk-averse market participants, which independently sizes power and energy capacity. We apply the method to a case study of a deeply decarbonised power system in Great Britain. The results show that incomplete risk markets reduce social welfare, harm reliability, and discourage investment in LDES and other technologies with volatile revenue streams. Revenue volatility leads to substantial risk premiums and higher financing costs for LDES, creating a barrier to its large-scale deployment. These findings demonstrate the importance of policy mechanisms that hedge revenue risk to lower the cost of capital and accelerate investment in reliability-enhancing, zero-carbon technologies
Physics-Informed Neural Network Modeling of Vehicle Collision Dynamics in Precision Immobilization Technique Maneuvers
Accurate prediction of vehicle collision dynamics is crucial for advanced safety systems and post-impact control applications, yet existing methods face inherent trade-offs among computational efficiency, prediction accuracy, and data requirements. This paper proposes a dual Physics-Informed Neural Network framework addressing these challenges through two complementary networks. The first network integrates Gaussian Mixture Models with PINN architecture to learn impact force distributions from finite element analysis data while enforcing momentum conservation and energy consistency constraints. The second network employs an adaptive PINN with dynamic constraint weighting to predict post-collision vehicle dynamics, featuring an adaptive physics guard layer that prevents unrealistic predictions whil e preserving data-driven learning capabilities. The framework incorporates uncertainty quantification through time-varying parameters and enables rapid adaptation via fine-tuning strategies. Validation demonstrates significant improvements: the impact force model achieves relative errors below 15.0% for force prediction on finite element analysis (FEA) datasets, while the vehicle dynamics model reduces average trajectory prediction error by 63.6% compared to traditional four-degree-of-freedom models in scaled vehicle experiments. The integrated system maintains millisecond-level computational efficiency suitable for real-time applications while providing probabilistic confidence bounds essential for safety-critical control. Comprehensive validation through FEA simulation, dynamic modeling, and scaled vehicle experiments confirms the framework's effectiveness for Precision Immobilization Technique scenarios and general collision dynamics prediction.
On the Flexibility Potential of a Swiss Distribution Grid: Opportunities and Limitations
The growing integration of distributed renewable generation and the electrification of heating and transportation are rapidly increasing the number of flexible devices within modern distribution grids. Leveraging the aggregated flexibility of these small-scale distributed resources is essential to maintaining future grid-wide stability. This work uses the Swiss distribution grid of Walenstadt as a case study to provide insights into the aggregated flexibility potential of distribution grids. It demonstrates that incorporating devices such as heat pumps and photovoltaic systems significantly enhances distribution grid flexibility. It investigates the time-varying nature of aggregated flexibility and highlights how it can vary seasonally. Furthermore, simulations of future scenarios reveal that aggregated flexibility does not increase linearly or monotonically with higher levels of flexible device penetration. This is primarily due to the overloading of individual feeders, which underscores the impact of grid topology and network constraints on the aggregated flexibility potential.
Multipolar dynamics of social segregation: Data validation on Swedish vaccination statistics
We perform a validation analysis on the multipolar model of opinion dynamics. A general methodology for using the model on datasets of two correlated variables is proposed and tested using data on the relationship between COVID-19 vaccination rates and political participation in Sweden. The model is shown to successfully capture the opinion segregation demonstrated by the data and spatial correlation of biases is demonstrated as necessary for the result. A mixing of the biases on the other hand leads to a more homogeneous opinion distribution, and greater penetration of the majority opinion, which here corresponds to a decision to vote or vaccinate.
comment: Presented at CoDIT 2025
Performance Comparison of Gate-Based and Adiabatic Quantum Computing for Power Flow Analysis SC
In this paper, we present the first direct comparison between gate-based quantum computing (GQC) and adiabatic quantum computing (AQC) for solving the AC power flow (PF) equations. Building on the Adiabatic Quantum Power Flow (AQPF) algorithm originally designed for annealing platforms, we adapt it to the Quantum Approximate Optimization Algorithm (QAOA). The PF equations are reformulated as a combinatorial optimization problem. Numerical experiments on a 4-bus test system assess solution accuracy and computational time. Results from QAOA are benchmarked against those obtained using D-Wave's Advantage system and Fujitsu's latest generation Digital Annealer, i.e., Quantum-Inspired Integrated Optimization software (QIIO). The findings provide quantitative insights into the performance trade-offs, scalability, and practical viability of GQC versus AQC paradigms for PF analysis, highlighting the potential of quantum algorithms to address the computational challenges associated with modern electricity networks in the Noisy Intermediate-Scale Quantum (NISQ).
comment: 7 pages, 1 figure, 4 tables, submitted to PSCC 2026
Partitioned Scheduling for DAG Tasks Considering Probabilistic Execution Time
Autonomous driving systems, critical for safety, require real-time guarantees and can be modeled as DAGs. Their acceleration features, such as caches and pipelining, often result in execution times below the worst-case. Thus, a probabilistic approach ensuring constraint satisfaction within a probability threshold is more suitable than worst-case guarantees for these systems. This paper considers probabilistic guarantees for DAG tasks by utilizing the results of probabilistic guarantees for single processors, which have been relatively more advanced than those for multi-core processors. This paper proposes a task set partitioning method that guarantees schedulability under the partitioned scheduling. The evaluation on randomly generated DAG task sets demonstrates that the proposed method schedules more task sets with a smaller mean analysis time compared to existing probabilistic schedulability analysis for DAGs. The evaluation also compares four bin-packing heuristics, revealing Item-Centric Worst-Fit-Decreasing schedules the most task sets.
Safe Driving in Occluded Environments
Ensuring safe autonomous driving in the presence of occlusions poses a significant challenge in its policy design. While existing model-driven control techniques based on set invariance can handle visible risks, occlusions create latent risks in which safety-critical states are not observable. Data-driven techniques also struggle to handle latent risks because direct mappings from risk-critical objects in sensor inputs to safe actions cannot be learned without visible risk-critical objects. Motivated by these challenges, in this paper, we propose a probabilistic safety certificate for latent risk. Our key technical enabler is the application of probabilistic invariance: It relaxes the strict observability requirements imposed by set-invariance methods that demand the knowledge of risk-critical states. The proposed techniques provide linear action constraints that confine the latent risk probability within tolerance. Such constraints can be integrated into model predictive controllers or embedded in data-driven policies to mitigate latent risks. The proposed method is tested using the CARLA simulator and compared with a few existing techniques. The theoretical and empirical analysis jointly demonstrate that the proposed methods assure long-term safety in real-time control in occluded environments without being overly conservative and with transparency to exposed risks.
Decision-dependent Robust Charging Infrastructure Planning for Light-duty Truck Electrification at Industrial Sites: Scheduling and Abandonment
Many industrial sites rely on diesel-powered light-duty trucks to transport workers and small-scale facilities, which has resulted in a significant amount of greenhouse emissions (GHGs). To address this, we developed a two-stage robust charging infrastructure planning model for electrifying light-duty trucks at industrial sites. The model is formulated as a mixed-integer linear programming (MILP) that optimizes the charging infrastructure, selected from multiple charger types and potential locations, and determines opportunity charging schedules for each truck based on the chosen infrastructure. Given the strict stopping points and schedules at industrial sites, we introduced a scheduling problem with abandonment, where trucks forgo charging if their waiting times exceed a maximum threshold. We also further incorporated the impacts of overnight charging and range anxiety on waiting and abandonment behaviors. To represent the stochastic and heterogeneous parking durations of trucks, we constructed a decision-dependent robust uncertainty set in which parking time variability flexibly depends on charging choices. We applied the model in a case study of an open-pit mining site, which plans charger installations in eight zones and schedules a fleet of around 200 trucks. By decomposing the problem into monthly subproblems and using heuristic approaches, for the whole-year dataset, the model achieves an optimality gap of less than 0.1 % within a reasonable computation time under diverse uncertainty scenarios.
Time-Varying Optimization for Streaming Data Via Temporal Weighting
Classical optimization theory deals with fixed, time-invariant objective functions. However, time-varying optimization has emerged as an important subject for decision-making in dynamic environments. In this work, we study the problem of learning from streaming data through a time-varying optimization lens. Unlike prior works that focus on generic formulations, we introduce a structured, \emph{weight-based} formulation that explicitly captures the streaming-data origin of the time-varying objective, where at each time step, an agent aims to minimize a weighted average loss over all the past data samples. We focus on two specific weighting strategies: (1) uniform weights, which treat all samples equally, and (2) discounted weights, which geometrically decay the influence of older data. For both schemes, we derive tight bounds on the ``tracking error'' (TE), defined as the deviation between the model parameter and the time-varying optimum at a given time step, under gradient descent (GD) updates. We show that under uniform weighting, the TE vanishes asymptotically with a $\mathcal{O}(1/t)$ decay rate, whereas discounted weighting incurs a nonzero error floor controlled by the discount factor and the number of gradient updates performed at each time step. Our theoretical findings are validated through numerical simulations.
comment: Accepted at IEEE Asilomar, 2025
Learning Wireless Interference Patterns: Decoupled GNN for Throughput Prediction in Heterogeneous Multi-Hop p-CSMA Networks
The p-persistent CSMA protocol is central to random-access MAC analysis, but predicting saturation throughput in heterogeneous multi-hop wireless networks remains a hard problem. Simplified models that assume a single, shared interference domain can underestimate throughput by 48--62\% in sparse topologies. Exact Markov-chain analyses are accurate but scale exponentially in computation time, making them impractical for large networks. These computational barriers motivate structural machine learning approaches like GNNs for scalable throughput prediction in general network topologies. Yet off-the-shelf GNNs struggle here: a standard GCN yields 63.94\% normalized mean absolute error (NMAE) on heterogeneous networks because symmetric normalization conflates a node's direct interference with higher-order, cascading effects that pertain to how interference propagates over the network graph. Building on these insights, we propose the Decoupled Graph Convolutional Network (D-GCN), a novel architecture that explicitly separates processing of a node's own transmission probability from neighbor interference effects. D-GCN replaces mean aggregation with learnable attention, yielding interpretable, per-neighbor contribution weights while capturing complex multihop interference patterns. D-GCN attains 3.3\% NMAE, outperforms strong baselines, remains tractable even when exact analytical methods become computationally infeasible, and enables gradient-based network optimization that achieves within 1\% of theoretical optima.
Laser Fault Injection in Memristor-Based Accelerators for AI/ML and Neuromorphic Computing
Memristive crossbar arrays (MCA) are emerging as efficient building blocks for in-memory computing and neuromorphic hardware due to their high density and parallel analog matrix-vector multiplication capabilities. However, the physical properties of their nonvolatile memory elements introduce new attack surfaces, particularly under fault injection scenarios. This work explores Laser Fault Injection as a means of inducing analog perturbations in MCA-based architectures. We present a detailed threat model in which adversaries target memristive cells to subtly alter their physical properties or outputs using laser beams. Through HSPICE simulations of a large MCA on 45 nm CMOS tech. node, we show how laser-induced photocurrent manifests in output current distributions, enabling differential fault analysis to infer internal weights with up to 99.7% accuracy, replicate the model, and compromise computational integrity through targeted weight alterations by approximately 143%.
comment: 3 pages, 4 figures
Resource-Aware Stealthy Attacks in Vehicle Platoons
Connected and Autonomous Vehicles (CAVs) are transforming modern transportation by enabling cooperative applications such as vehicle platooning, where multiple vehicles travel in close formation to improve efficiency and safety. However, the heavy reliance on inter-vehicle communication makes platoons highly susceptible to attacks, where even subtle manipulations can escalate into severe physical consequences. While existing research has largely focused on defending against attacks, far less attention has been given to stealthy adversaries that aim to covertly manipulate platoon behavior. This paper introduces a new perspective on the attack design problem by demonstrating how attackers can guide platoons toward their own desired trajectories while remaining undetected. We outline conditions under which such attacks are feasible, analyze their dependence on communication topologies and control protocols, and investigate the resources required by the attacker. By characterizing the resources needed to launch stealthy attacks, we address system vulnerabilities and informing the design of resilient countermeasures. Our findings reveal critical weaknesses in current platoon architectures and anomaly detection mechanisms and provide methods to develop more secure and trustworthy CAV systems.
comment: 13 pages, 8 figures
Belief Space Control of Safety-Critical Systems Under State-Dependent Measurement Noise
Safety-critical control is imperative for deploying autonomous systems in the real world. Control Barrier Functions (CBFs) offer strong safety guarantees when accurate system and sensor models are available. However, widely used additive, fixed-noise models are not representative of complex sensor modalities with state-dependent error characteristics. Although CBFs have been designed to mitigate uncertainty using fixed worst-case bounds on measurement noise, this approach can lead to overly-conservative control. To solve this problem, we extend the Belief Control Barrier Function (BCBF) framework to accommodate state-dependent measurement noise via the Generalized Extended Kalman Filter (GEKF) algorithm, which models measurement noise as a linear function of the state. Using the original BCBF framework as baseline, we demonstrate the performance of the BCBF-GEKF approach through simulation results on a 1D single integrator setpoint tracking scenario and 2D unicycle kinematics trajectory tracking scenario. Our results confirm that the BCBF-GEKF approach offers less conservative control with greater safety.
comment: Preprint - Submitted to the 2026 American Control Conference
DiffOPF: Diffusion Solver for Optimal Power Flow
The optimal power flow (OPF) is a multi-valued, non-convex mapping from loads to dispatch setpoints. The variability of system parameters (e.g., admittances, topology) further contributes to the multiplicity of dispatch setpoints for a given load. Existing deep learning OPF solvers are single-valued and thus fail to capture the variability of system parameters unless fully represented in the feature space, which is prohibitive. To solve this problem, we introduce a diffusion-based OPF solver, termed \textit{DiffOPF}, that treats OPF as a conditional sampling problem. The solver learns the joint distribution of loads and dispatch setpoints from operational history, and returns the marginal dispatch distributions conditioned on loads. Unlike single-valued solvers, DiffOPF enables sampling statistically credible warm starts with favorable cost and constraint satisfaction trade-offs. We explore the sample complexity of DiffOPF to ensure the OPF solution within a prescribed distance from the optimization-based solution, and verify this experimentally on power system benchmarks.
comment: 7 pages, 4 figures, 2 tables
Dual Detection Framework for Faults and Integrity Attacks in Cyber-Physical Control Systems
Anomaly detection plays a vital role in the security and safety of cyber-physical control systems, and accurately distinguishing between different anomaly types is crucial for system recovery and mitigation. This study proposes a dual detection framework for anomaly detection and discrimination. By leveraging the dynamic characteristics of control loops and the stealthiness features of integrity attacks, the closed-loop stealthiness condition is first derived, and two dedicated detectors are designed and deployed on the controller side and the plant side, respectively, enabling joint plant fault and cyber attack detection. Moreover, by jointly analyzing the residual response of the two detectors corresponding to different anomalies, it is proved that the proposed method can distinguish between faults and integrity attacks due to the detectors' individual residual spaces. According to the detector's residual space, the fault and attack detection performance is further improved by a two-stage optimization scheme. Simulation results validate the effectiveness of the proposed approach.
Multi-Period Sparse Optimization for Proactive Grid Blackout Diagnosis
Existing or planned power grids need to evaluate survivability under extreme events, like a number of peak load overloading conditions, which could possibly cause system collapses (i.e. blackouts). For realistic extreme events that are correlated or share similar patterns, it is reasonable to expect that the dominant vulnerability or failure sources behind them share the same locations but with different severity. Early warning diagnosis that proactively identifies the key vulnerabilities responsible for a number of system collapses of interest can significantly enhance resilience. This paper proposes a multi-period sparse optimization method, enabling the discovery of {persistent failure sources} across a sequence of collapsed systems with increasing system stress, such as rising demand or worsening contingencies. This work defines persistency and efficiently integrates persistency constraints to capture the ``hidden'' evolving vulnerabilities. Circuit-theory based power flow formulations and circuit-inspired optimization heuristics are used to facilitate the scalability of the method. Experiments on benchmark systems show that the method reliably tracks persistent vulnerability locations under increasing load stress, and solves with scalability to large systems ({on average} taking {around} 200 s per scenario on 2000+ bus systems).
Cyber-Resilient System Identification for Power Grid through Bayesian Integration
Power grids increasingly need real-time situational awareness under the ever-evolving cyberthreat landscape. Advances in snapshot-based system identification approaches have enabled accurately estimating states and topology from a snapshot of measurement data, under random bad data and topology errors. However, modern interactive, targeted false data can stay undetectable to these methods, and significantly compromise estimation accuracy. This work advances system identification that combines snapshot-based method with time-series model via Bayesian Integration, to advance cyber resiliency against both random and targeted false data. Using a distance-based time-series model, this work can leverage historical data of different distributions induced by changes in grid topology and other settings. The normal system behavior captured from historical data is integrated into system identification through a Bayesian treatment, to make solutions robust to targeted false data. We experiment on mixed random anomalies (bad data, topology error) and targeted false data injection attack (FDIA) to demonstrate our method's 1) cyber resilience: achieving over 70% reduction in estimation error under FDIA; 2) anomalous data identification: being able to alarm and locate anomalous data; 3) almost linear scalability: achieving comparable speed with the snapshot-based baseline, both taking <1min per time tick on the large 2,383-bus system using a laptop CPU.
The Algorithmic Regulator
The regulator theorem states that, under certain conditions, any optimal controller must embody a model of the system it regulates, grounding the idea that controllers embed, explicitly or implicitly, internal models of the controlled. This principle underpins neuroscience and predictive brain theories like the Free-Energy Principle or Kolmogorov/Algorithmic Agent theory. However, the theorem is only proven in limited settings. Here, we treat the deterministic, closed, coupled world-regulator system $(W,R)$ as a single self-delimiting program $p$ via a constant-size wrapper that produces the world output string~$x$ fed to the regulator. We analyze regulation from the viewpoint of the algorithmic complexity of the output, $K(x)$. We define $R$ to be a \emph{good algorithmic regulator} if it \emph{reduces} the algorithmic complexity of the readout relative to a null (unregulated) baseline $\varnothing$, i.e., \[ \Delta = K\big(O_{W,\varnothing}\big) - K\big(O_{W,R}\big) > 0. \] We then prove that the larger $\Delta$ is, the more world-regulator pairs with high mutual algorithmic information are favored. More precisely, a complexity gap $\Delta > 0$ yields \[ \Pr\big((W,R)\mid x\big) \le C\,2^{\,M(W{:}R)}\,2^{-\Delta}, \] making low $M(W{:}R)$ exponentially unlikely as $\Delta$ grows. This is an AIT version of the idea that ``the regulator contains a model of the world.'' The framework is distribution-free, applies to individual sequences, and complements the Internal Model Principle. Beyond this necessity claim, the same coding-theorem calculus singles out a \emph{canonical scalar objective} and implicates a \emph{planner}. On the realized episode, a regulator behaves \emph{as if} it minimized the conditional description length of the readout.
comment: 2 Figures
The value of storage in electricity distribution: The role of markets
Electricity distribution companies deploy battery storage to defer grid upgrades by reducing peak demand. In deregulated jurisdictions, such storage often sits idle because regulatory constraints bar participation in electricity markets. Here, we develop an optimization framework that, to our knowledge, provides the first formal model of market participation constraints within storage investment and operation planning. Applying the framework to a Massachusetts case study, we find that market participation could deliver similar savings as peak demand reduction. Under current conditions, market participation does not increase storage investment, but at very low storage costs, could incentivize deployment beyond local distribution needs. This might run contrary to the separation of distribution from generation in deregulated markets. Our framework can identify investment levels appropriate for local distribution needs.
High-Parallel FPGA-Based Discrete Simulated Bifurcation for Large-Scale Optimization
Combinatorial Optimization (CO) problems exhibit exponential complexity, making their resolution challenging. Simulated Adiabatic Bifurcation (aSB) is a quantum-inspired algorithm to obtain approximate solutions to largescale CO problems written in the Ising form. It explores the solution space by emulating the adiabatic evolution of a network of Kerr-nonlinear parametric oscillators (KPOs), where each oscillator represents a variable in the problem. The optimal solution corresponds to the ground state of this system. A key advantage of this approach is the possibility of updating multiple variables simultaneously, making it particularly suited for hardware implementation. To enhance solution quality and convergence speed, variations of the algorithm have been proposed in the literature, including ballistic (bSB), discrete (dSB), and thermal (HbSB) versions. In this work, we have comprehensively analyzed dSB, bSB, and HbSB using dedicated software models, evaluating the feasibility of using a fixed-point representation for hardware implementation. We then present an opensource hardware architecture implementing the dSB algorithm for Field-Programmable Gate Arrays (FPGAs). The design allows users to adjust the degree of algorithmic parallelization based on their specific requirements. A proof-of-concept implementation that solves 256-variable problems was achieved on an AMD Kria KV260 SoM, a low-tier FPGA, validated using well-known max-cut and knapsack problems.
Hybrid Terrain-Aware Path Planning: Integrating VD-RRT* Exploration and VD-D* Lite Repair
Autonomous ground vehicles operating off-road must plan curvature-feasible paths while accounting for spatially varying soil strength and slope hazards in real time. We present a continuous state--cost metric that combines a Bekker pressure--sinkage model with elevation-derived slope and attitude penalties. The resulting terrain cost field is analytic, bounded, and monotonic in soil modulus and slope, ensuring well-posed discretization and stable updates under sensor noise. This metric is evaluated on a lattice with exact steering primitives: Dubins and Reeds--Shepp motions for differential drive and time-parameterized bicycle arcs for Ackermann steering. Global exploration is performed using Vehicle-Dynamics RRT\(^{*}\), while local repair is managed by Vehicle-Dynamics D\(^{*}\) Lite, enabling millisecond-scale replanning without heuristic smoothing. By separating the terrain--vehicle model from the planner, the framework provides a reusable basis for deterministic, sampling-based, or learning-driven planning in deformable terrain. Hardware trials on an off-road platform demonstrate real-time navigation across soft soil and slope transitions, supporting reliable autonomy in unstructured environments.
Product Digital Twin Supporting End-of-life Phase of Electric Vehicle Batteries Utilizing Product-Process-Resource Asset Network
In a circular economy, products in their end-of-life phase should be either remanufactured or recycled. Both of these processes are crucial for sustainability and environmental conservation. However, manufacturers frequently do not support these processes enough in terms of not sharing relevant data about the products nor their (re-)manufacturing processes. This paper proposes to accompany each product with a digital twin technology, specifically the Product Digital Twin (PDT), which can carry information for facilitating and optimizing production and remanufacturing processes. This paper introduces a knowledge representation called Bi-Flow Product-Process-Resource Asset Network (Bi-PAN). Bi-PAN extends a well-proven Product-Process-Resource Asset Network (PAN) paradigm by integrating both assembly and disassembly workflows into a single information model. Such networks enable capturing relevant relationships across products, production resources, manufacturing processes, and specific production operations that have to be done in the manufacturing phase of a product. The proposed approach is demonstrated in a use-case of disassembling electric vehicle (EV) batteries. By utilizing PDTs with Bi-PAN knowledge models, challenges associated with disassembling of EV batteries can be solved flexibly and efficiently for various battery types, enhancing the sustainability of the EV battery life-cycle management.
comment: \copyright 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
A Personalized Data-Driven Generative Model of Human Repetitive Motion
The deployment of autonomous virtual avatars (in extended reality) and robots in human group activities -- such as rehabilitation therapy, sports, and manufacturing -- is expected to increase as these technologies become more pervasive. Designing cognitive architectures and control strategies to drive these agents requires realistic models of human motion. Furthermore, recent research has shown that each person exhibits a unique velocity signature, highlighting how individual motor behaviors are both rich in variability and internally consistent. However, existing models only provide simplified descriptions of human motor behavior, hindering the development of effective cognitive architectures. In this work, we first show that motion amplitude provides a valid and complementary characterization of individual motor signatures. Then, we propose a fully data-driven approach, based on long short-term memory neural networks, to generate original motion that captures the unique features of specific individuals. We validate the architecture using real human data from participants performing spontaneous oscillatory motion. Extensive analyses show that state-of-the-art Kuramoto-like models fail to replicate individual motor signatures, whereas our model accurately reproduces the velocity distribution and amplitude envelopes of the individual it was trained on, while remaining distinct from others.
comment: 12 pages, 6 figures
Addressing Model Inaccuracies in Transmission Network Reconfiguration via Diverse Alternatives
The ongoing energy transition places significant pressure on the transmission network due to increasing shares of renewables and electrification. To mitigate grid congestion, transmission system operators need decision support tools to suggest remedial actions, such as transmission network reconfigurations or redispatch. However, these tools are prone to model inaccuracies and may not provide relevant suggestions with regard to important unmodeled constraints or operator preferences. We propose a human-in-the-loop modeling-to-generate alternatives (HITL-MGA) approach to address these shortcomings by generating diverse topology reconfiguration alternatives. Case studies on the IEEE 57-bus and IEEE 118-bus systems show the method can leverage expert feedback and improve the quality of the suggested topology reconfigurations.
comment: This preprint is currently under peer review
Dual-Regularized Riccati Recursions for Interior-Point Optimal Control
We derive closed-form extensions of Riccati's recursions (both sequential and parallel) for solving dual-regularized LQR problems. We show how these methods can be used to solve general constrained, non-convex, discrete-time optimal control problems via a regularized interior point method, while guaranteeing that each step is a descent direction of an Augmented Barrier-Lagrangian merit function. We provide MIT-licensed implementations of our methods in C++ and JAX.
On the Fast Nonlinear Filtering with Matrix Fisher Distributions on SO(3)
This paper addresses two interrelated problems: the nonlinear filtering mechanism and fast attitude filtering with the matrix Fisher distribution (MFD) on the special orthogonal group. By analyzing the distribution evolution along Bayes' rule, we reveal two essential properties that enhance the performance of Bayesian attitude filters with MFDs, particularly in challenging conditions from a theoretical viewpoint. Benefiting from the new understanding of the filtering mechanism associated with MFDs, two closed-form filters with MFDs are then proposed. The filters avoids the burdensome computations in previous MFD-based filters by introducing linearized error systems with invariant errors but retaining the two advantageous properties. Numerical simulations demonstrate that the proposed filters are more accurate than the classic invariant Kalman filter. Besides, it is also as accurate as recent MFD-based Bayesian filters in challenging circumstances with large initial error and measurement uncertainty, but it consumes far less computation time (about 1/5 to 1/100 of previous MFD-based attitude filters).
Design and benchmarking of a two degree of freedom tendon driver unit for cable-driven wearable technologies
Exosuits have recently been developed as alternatives to rigid exoskeletons and are increasingly adopted for both upper and lower limb therapy and assistance in clinical and home environments. Many cable-driven exosuits have been developed but little has been published on their electromechanical designs and performance. Therefore, this paper presents a comprehensive design and performance analysis of a two degree of freedom tendon driver unit (TDU) for cable-driven wearable exosuits. Detailed methodologies are presented to benchmark the functionality of the TDU. A static torque output test compares the commanded and measured torques. A velocity control test evaluates the attenuation and phase shift across velocities. A noise test evaluates how loud the TDU is for the wearer under different speeds. A thermal stress test captures the cooling performance of the TDU to ensure safe operation at higher loads. Finally, a battery endurance test evaluates the runtime of the TDU under various loading conditions to inform the usable time. To demonstrate these tests, a modular TDU system for cable-driven applications is introduced, which allows components such as motors, pulleys, and sensors to be adapted based on the requirements of the intended application. By sharing detailed methodologies and performance results, this study aims to provide a TDU design that may be leveraged by others and resources for researchers and engineers to better document the capabilities of their TDU designs.
Geometric Backstepping Control of Omnidirectional Tiltrotors Incorporating Servo-Rotor Dynamics for Robustness against Sudden Disturbances
This work presents a geometric backstepping controller for a variable-tilt omnidirectional multirotor that explicitly accounts for both servo and rotor dynamics. Considering actuator dynamics is essential for more effective and reliable operation, particularly during aggressive flight maneuvers or recovery from sudden disturbances. While prior studies have investigated actuator-aware control for conventional and fixed-tilt multirotors, these approaches rely on linear relationships between actuator input and wrench, which cannot capture the nonlinearities induced by variable tilt angles. In this work, we exploit the cascade structure between the rigid-body dynamics of the multirotor and its nonlinear actuator dynamics to design the proposed backstepping controller and establish exponential stability of the overall system. Furthermore, we reveal parametric uncertainty in the actuator model through experiments, and we demonstrate that the proposed controller remains robust against such uncertainty. The controller was compared against a baseline that does not account for actuator dynamics across three experimental scenarios: fast translational tracking, rapid rotational tracking, and recovery from sudden disturbance. The proposed method consistently achieved better tracking performance, and notably, while the baseline diverged and crashed during the fastest translational trajectory tracking and the recovery experiment, the proposed controller maintained stability and successfully completed the tasks, thereby demonstrating its effectiveness.
A Verification Methodology for Safety Assurance of Robotic Autonomous Systems
Autonomous robots deployed in shared human environments, such as agricultural settings, require rigorous safety assurance to meet both functional reliability and regulatory compliance. These systems must operate in dynamic, unstructured environments, interact safely with humans, and respond effectively to a wide range of potential hazards. This paper presents a verification workflow for the safety assurance of an autonomous agricultural robot, covering the entire development life-cycle, from concept study and design to runtime verification. The outlined methodology begins with a systematic hazard analysis and risk assessment to identify potential risks and derive corresponding safety requirements. A formal model of the safety controller is then developed to capture its behaviour and verify that the controller satisfies the specified safety properties with respect to these requirements. The proposed approach is demonstrated on a field robot operating in an agricultural setting. The results show that the methodology can be effectively used to verify safety-critical properties and facilitate the early identification of design issues, contributing to the development of safer robots and autonomous systems.
comment: In Proc. of the 26th TAROS (Towards Autonomous Robotic Systems) Conference, York, UK, August, 2025
Multi Timescale Stochastic Approximation: Stability and Convergence
This paper presents the first sufficient conditions that guarantee the stability and almost sure convergence of multi-timescale stochastic approximation (SA) iterates. It extends the existing results on one-timescale and two-timescale SA iterates to general $N$-timescale stochastic recursions, for any $N \geq 1$, using the ordinary differential equation (ODE) method. As an application, we study SA algorithms augmented with heavy-ball momentum in the context of Gradient Temporal Difference (GTD) learning. The added momentum introduces an auxiliary state evolving on an intermediate timescale, yielding a three-timescale recursion. We show that with appropriate momentum parameters, the scheme fits within our framework and converges almost surely to the same fixed point as baseline GTD. The stability and convergence of all iterates including the momentum state follow from our main results without ad hoc bounds. We then study off-policy actor-critic algorithms with a baseline learner, actor, and critic updated on separate timescales. In contrast to prior work, we eliminate projection steps from the actor update and instead use our framework to guarantee stability and almost sure convergence of all components. Finally, we extend the analysis to constrained policy optimization in the average reward setting, where the actor, critic, and dual variables evolve on three distinct timescales, and we verify that the resulting dynamics satisfy the conditions of our general theorem. These examples show how diverse reinforcement learning algorithms covering momentum acceleration, off-policy learning, and primal-dual methods-fit naturally into the proposed multi-timescale framework.
comment: arXiv admin note: text overlap with arXiv:2111.11004, Added an application to the 4-Timescale case
Learning Power Flow with Confidence: A Probabilistic Guarantee Framework for Voltage Risk
The absence of formal performance guarantees in machine learning (ML) has limited its adoption for safety-critical power system applications, where confidence and interpretability are as vital as accuracy. In this work, we present a probabilistic guarantee for power flow learning and voltage risk estimation, derived through the framework of Gaussian Process (GP) regression. Specifically, we establish a bound on the expected estimation error that connects the GP's predictive variance to confidence in voltage risk estimates, ensuring statistical equivalence with Monte Carlo-based ACPF risk quantification. To enhance model learnability in the low-data regime, we first design the Vertex-Degree Kernel (VDK), a topology-aware additive kernel that decomposes voltage-load interactions into local neighborhoods for efficient large-scale learning. Building on this, we introduce a network-swipe active learning (AL) algorithm that adaptively samples informative operating points and provides a principled stopping criterion without requiring out-of-sample validation. Together, these developments mitigate the principal bottleneck of ML-based power flow-its lack of guaranteed reliability-by combining data efficiency with analytical assurance. Empirical evaluations across IEEE 118-, 500-, and 1354-bus systems confirm that the proposed VDK-GP achieves mean absolute voltage errors below 1E-03 p.u., reproduces Monte Carlo-level voltage risk estimates with 15x fewer ACPF computations, and achieves over 120x reduction in evaluation time while conservatively bounding violation probabilities.
comment: 10 pages
Convergence and sample complexity of natural policy gradient primal-dual methods for constrained MDPs NeurIPS 2020
We study the sequential decision making problem of maximizing the expected total reward while satisfying a constraint on the expected total utility. We employ the natural policy gradient method to solve the discounted infinite-horizon optimal control problem for Constrained Markov Decision Processes (constrained MDPs). Specifically, we propose a new Natural Policy Gradient Primal-Dual (NPG-PD) method that updates the primal variable via natural policy gradient ascent and the dual variable via projected subgradient descent. Although the underlying maximization involves a nonconcave objective function and a nonconvex constraint set, under the softmax policy parametrization, we prove that our method achieves global convergence with sublinear rates regarding both the optimality gap and the constraint violation. Such convergence is independent of the size of the state-action space, i.e., it is~dimension-free. Furthermore, for log-linear and general smooth policy parametrizations, we establish sublinear convergence rates up to a function approximation error caused by restricted policy parametrization. We also provide convergence and finite-sample complexity guarantees for two sample-based NPG-PD algorithms. We use a set of computational experiments to showcase the effectiveness of our approach.
comment: 76 pages, 4 figures, 2 tables; Journal version of the NeurIPS 2020 paper; Accepted to JMLR
Tiny Learning-Based MPC for Multirotors: Solver-Aware Learning for Efficient Embedded Predictive Control
Tiny aerial robots hold great promise for applications such as environmental monitoring and search-and-rescue, yet face significant control challenges due to limited onboard computing power and nonlinear dynamics. Model Predictive Control (MPC) enables agile trajectory tracking and constraint handling but depends on an accurate dynamics model. While existing Learning-Based (LB) MPC methods, such as Gaussian Process (GP) MPC, enhance performance by learning residual dynamics, their high computational cost restricts onboard deployment on tiny robots. This paper introduces Tiny LB MPC, a co-designed MPC framework and optimization solver for resource-constrained micro multirotor platforms. The proposed approach achieves 100 Hz control on a Crazyflie 2.1 equipped with a Teensy 4.0 microcontroller, demonstrating a 43% average improvement in tracking performance over existing embedded MPC methods under model uncertainty, and achieving the first onboard implementation of LB MPC on a 53 g multirotor.
A Faster and More Reliable Middleware for Autonomous Driving Systems
Ensuring safety in high-speed autonomous vehicles requires rapid control loops and tightly bounded delays from perception to actuation. Many open-source autonomy systems rely on ROS 2 middleware; when multiple sensor and control nodes share one compute unit, ROS 2 and its DDS transports add significant (de)serialization, copying, and discovery overheads, shrinking the available time budget. We present Sensor-in-Memory (SIM), a shared-memory transport designed for intra-host pipelines in autonomous vehicles. SIM keeps sensor data in native memory layouts (e.g., cv::Mat, PCL), uses lock-free bounded double buffers that overwrite old data to prioritize freshness, and integrates into ROS 2 nodes with four lines of code. Unlike traditional middleware, SIM operates beside ROS 2 and is optimized for applications where data freshness and minimal latency outweigh guaranteed completeness. SIM provides sequence numbers, a writer heartbeat, and optional checksums to ensure ordering, liveness, and basic integrity. On an NVIDIA Jetson Orin Nano, SIM reduces data-transport latency by up to 98% compared to ROS 2 zero-copy transports such as FastRTPS and Zenoh, lowers mean latency by about 95%, and narrows 95th/99th-percentile tail latencies by around 96%. In tests on a production-ready Level 4 vehicle running Autoware.Universe, SIM increased localization frequency from 7.5 Hz to 9.5 Hz. Applied across all latency-critical modules, SIM cut average perception-to-decision latency from 521.91 ms to 290.26 ms, reducing emergency braking distance at 40 mph (64 km/h) on dry concrete by 13.6 ft (4.14 m).
comment: 8 pages,7 figures, 8 tables
An AI-Driven Multimodal Smart Home Platform for Continuous Monitoring and Assistance in Post-Stroke Motor Impairment
At-home rehabilitation for post-stroke patients presents significant challenges, as continuous, personalized care is often limited outside clinical settings. Moreover, the lack of integrated solutions capable of simultaneously monitoring motor recovery and providing intelligent assistance in home environments hampers rehabilitation outcomes. Here, we present a multimodal smart home platform designed for continuous, at-home rehabilitation of post-stroke patients, integrating wearable sensing, ambient monitoring, and adaptive automation. A plantar pressure insole equipped with a machine learning pipeline classifies users into motor recovery stages with up to 94\% accuracy, enabling quantitative tracking of walking patterns during daily activities. An optional head-mounted eye-tracking module, together with ambient sensors such as cameras and microphones, supports seamless hands-free control of household devices with a 100\% success rate and sub-second response time. These data streams are fused locally via a hierarchical Internet of Things (IoT) architecture, ensuring low latency and data privacy. An embedded large language model (LLM) agent, Auto-Care, continuously interprets multimodal data to provide real-time interventions -- issuing personalized reminders, adjusting environmental conditions, and notifying caregivers. Implemented in a post-stroke context, this integrated smart home platform increased mean user satisfaction from 3.9 $\pm$ 0.8 in conventional home environments to 8.4 $\pm$ 0.6 with the full system ($n=20$). Beyond stroke, the system offers a scalable, patient-centered framework with potential for long-term use in broader neurorehabilitation and aging-in-place applications.
comment: 5 figures, 41 references
Electromagnetically Reconfigurable Fluid Antenna System for Wireless Communications: Design, Modeling, Algorithm, Fabrication, and Experiment
This paper presents the concept, design, channel modeling, beamforming algorithm development, prototype fabrication, and experimental measurement of an electromagnetically reconfigurable fluid antenna system (ER-FAS), in which each FAS array element features electromagnetic (EM) reconfigurability. Unlike most existing FAS works that investigate spatial reconfigurability by adjusting the position and/or orientation of array elements, the proposed ER-FAS enables direct control over the EM characteristics of each element, allowing for dynamic radiation pattern reconfigurability. Specifically, a novel ER-FAS architecture leveraging software-controlled fluidics is proposed, and corresponding wireless channel models are established. Based on this ER-FAS channel model, a low-complexity greedy beamforming algorithm is developed to jointly optimize the analog phase shift and the radiation state of each array element. The accuracy of the ER-FAS channel model and the effectiveness of the beamforming algorithm are validated through (i) full-wave EM simulations and (ii) numerical spectral efficiency evaluations. These results confirm that the proposed ER-FAS significantly enhances spectral efficiency in both near-field and far-field scenarios compared to conventional antenna arrays. To further validate this design, we fabricate prototypes for both the ER-FAS element and array, using Galinstan liquid metal alloy, fluid silver paste, and software-controlled fluidic channels. The simulation results are experimentally validated through prototype measurements conducted in an anechoic chamber. Additionally, several indoor communication experiments using a pair of software-defined radios demonstrate the superior received power and bit error rate performance of the ER-FAS prototype.
Multiagent Systems
AOAD-MAT: Transformer-based multi-agent deep reinforcement learning model considering agents' order of action decisions
Multi-agent reinforcement learning focuses on training the behaviors of multiple learning agents that coexist in a shared environment. Recently, MARL models, such as the Multi-Agent Transformer (MAT) and ACtion dEpendent deep Q-learning (ACE), have significantly improved performance by leveraging sequential decision-making processes. Although these models can enhance performance, they do not explicitly consider the importance of the order in which agents make decisions. In this paper, we propose an Agent Order of Action Decisions-MAT (AOAD-MAT), a novel MAT model that considers the order in which agents make decisions. The proposed model explicitly incorporates the sequence of action decisions into the learning process, allowing the model to learn and predict the optimal order of agent actions. The AOAD-MAT model leverages a Transformer-based actor-critic architecture that dynamically adjusts the sequence of agent actions. To achieve this, we introduce a novel MARL architecture that cooperates with a subtask focused on predicting the next agent to act, integrated into a Proximal Policy Optimization based loss function to synergistically maximize the advantage of the sequential decision-making. The proposed method was validated through extensive experiments on the StarCraft Multi-Agent Challenge and Multi-Agent MuJoCo benchmarks. The experimental results show that the proposed AOAD-MAT model outperforms existing MAT and other baseline models, demonstrating the effectiveness of adjusting the AOAD order in MARL.
comment: This manuscript is an extended version of the work accepted as a short paper at the 26th International Conference on Principles and Practice of Multi-Agent Systems (PRIMA 2025). The Version of Record of this contribution is published in Springer's Lecture Notes in Artificial Intelligence series (LNCS/LNAI)
Altruistic Ride Sharing: A Community-Driven Approach to Short-Distance Mobility
Urban mobility faces persistent challenges of congestion and fuel consumption, specifically when people choose a private, point-to-point commute option. Profit-driven ride-sharing platforms prioritize revenue over fairness and sustainability. This paper introduces Altruistic Ride-Sharing (ARS), a decentralized, peer-to-peer mobility framework where participants alternate between driver and rider roles based on altruism points rather than monetary incentives. The system integrates multi-agent reinforcement learning (MADDPG) for dynamic ride-matching, game-theoretic equilibrium guarantees for fairness, and a population model to sustain long-term balance. Using real-world New York City taxi data, we demonstrate that ARS reduces travel distance and emissions, increases vehicle utilization, and promotes equitable participation compared to both no-sharing and optimization-based baselines. These results establish ARS as a scalable, community-driven alternative to conventional ride-sharing, aligning individual behavior with collective urban sustainability goals.
comment: Submitted to IEEE Transactions on Intelligent Transportation Systems
Addressing the alignment problem in transportation policy making: an LLM approach
A key challenge in transportation planning is that the collective preferences of heterogeneous travelers often diverge from the policies produced by model-driven decision tools. This misalignment frequently results in implementation delays or failures. Here, we investigate whether large language models (LLMs), noted for their capabilities in reasoning and simulating human decision-making, can help inform and address this alignment problem. We develop a multi-agent simulation in which LLMs, acting as agents representing residents from different communities in a city, participate in a referendum on a set of transit policy proposals. Using chain-of-thought reasoning, LLM agents provide ranked-choice or approval-based preferences, which are aggregated using instant-runoff voting (IRV) to model democratic consensus. We implement this simulation framework with both GPT-4o and Claude-3.5, and apply it for Chicago and Houston. Our findings suggest that LLM agents are capable of approximating plausible collective preferences and responding to local context, while also displaying model-specific behavioral biases and modest divergences from optimization-based benchmarks. These capabilities underscore both the promise and limitations of LLMs as tools for solving the alignment problem in transportation decision-making.
Agentic Discovery: Closing the Loop with Cooperative Agents
As data-driven methods, artificial intelligence (AI), and automated workflows accelerate scientific tasks, we see the rate of discovery increasingly limited by human decision-making tasks such as setting objectives, generating hypotheses, and designing experiments. We postulate that cooperative agents are needed to augment the role of humans and enable autonomous discovery. Realizing such agents will require progress in both AI and infrastructure.
comment: Published in IEEE Computer Volume 58 Issue 10
Formalizing the Safety, Security, and Functional Properties of Agentic AI Systems
Agentic AI systems, which leverage multiple autonomous agents and Large Language Models (LLMs), are increasingly used to address complex, multi-step tasks. The safety, security, and functionality of these systems are critical, especially in high-stakes applications. However, the current ecosystem of inter-agent communication is fragmented, with protocols such as the Model Context Protocol (MCP) for tool access and the Agent-to-Agent (A2A) protocol for coordination being analyzed in isolation. This fragmentation creates a semantic gap that prevents the rigorous analysis of system properties and introduces risks such as architectural misalignment and exploitable coordination issues. To address these challenges, we introduce a modeling framework for agentic AI systems composed of two foundational models. The first, the host agent model, formalizes the top-level entity that interacts with the user, decomposes tasks, and orchestrates their execution by leveraging external agents and tools. The second, the task lifecycle model, details the states and transitions of individual sub-tasks from creation to completion, providing a fine-grained view of task management and error handling. Together, these models provide a unified semantic framework for reasoning about the behavior of multi-AI agent systems. Grounded in this framework, we define 17 properties for the host agent and 14 for the task lifecycle, categorized into liveness, safety, completeness, and fairness. Expressed in temporal logic, these properties enable formal verification of system behavior, detection of coordination edge cases, and prevention of deadlocks and security vulnerabilities. Through this effort, we introduce the first rigorously grounded, domain-agnostic framework for the systematic analysis, design, and deployment of correct, reliable, and robust agentic AI systems.
Stop Reducing Responsibility in LLM-Powered Multi-Agent Systems to Local Alignment
LLM-powered Multi-Agent Systems (LLM-MAS) unlock new potentials in distributed reasoning, collaboration, and task generalization but also introduce additional risks due to unguaranteed agreement, cascading uncertainty, and adversarial vulnerabilities. We argue that ensuring responsible behavior in such systems requires a paradigm shift: from local, superficial agent-level alignment to global, systemic agreement. We conceptualize responsibility not as a static constraint but as a lifecycle-wide property encompassing agreement, uncertainty, and security, each requiring the complementary integration of subjective human-centered values and objective verifiability. Furthermore, a dual-perspective governance framework that combines interdisciplinary design with human-AI collaborative oversight is essential for tracing and ensuring responsibility throughout the lifecycle of LLM-MAS. Our position views LLM-MAS not as loose collections of agents, but as unified, dynamic socio-technical systems that demand principled mechanisms to support each dimension of responsibility and enable ethically aligned, verifiably coherent, and resilient behavior for sustained, system-wide agreement.
comment: Under Review
Static Sandboxes Are Inadequate: Modeling Societal Complexity Requires Open-Ended Co-Evolution in LLM-Based Multi-Agent Simulations
What if artificial agents could not just communicate, but also evolve, adapt, and reshape their worlds in ways we cannot fully predict? With llm now powering multi-agent systems and social simulations, we are witnessing new possibilities for modeling open-ended, ever-changing environments. Yet, most current simulations remain constrained within static sandboxes, characterized by predefined tasks, limited dynamics, and rigid evaluation criteria. These limitations prevent them from capturing the complexity of real-world societies. In this paper, we argue that static, task-specific benchmarks are fundamentally inadequate and must be rethought. We critically review emerging architectures that blend llm with multi-agent dynamics, highlight key hurdles such as balancing stability and diversity, evaluating unexpected behaviors, and scaling to greater complexity, and introduce a fresh taxonomy for this rapidly evolving field. Finally, we present a research roadmap centered on open-endedness, continuous co-evolution, and the development of resilient, socially aligned AI ecosystems. \textbf{We call on the community to move beyond static paradigms and help shape the next generation of adaptive, socially-aware multi-agent simulations.}
GUARDIAN: Safeguarding LLM Multi-Agent Collaborations with Temporal Graph Modeling
The emergence of large language models (LLMs) enables the development of intelligent agents capable of engaging in complex and multi-turn dialogues. However, multi-agent collaboration faces critical safety challenges, such as hallucination amplification and error injection and propagation. This paper presents GUARDIAN, a unified method for detecting and mitigating multiple safety concerns in GUARDing Intelligent Agent collaboratioNs. By modeling the multi-agent collaboration process as a discrete-time temporal attributed graph, GUARDIAN explicitly captures the propagation dynamics of hallucinations and errors. The unsupervised encoder-decoder architecture incorporating an incremental training paradigm learns to reconstruct node attributes and graph structures from latent embeddings, enabling the identification of anomalous nodes and edges with unparalleled precision. Moreover, we introduce a graph abstraction mechanism based on the Information Bottleneck Theory, which compresses temporal interaction graphs while preserving essential patterns. Extensive experiments demonstrate GUARDIAN's effectiveness in safeguarding LLM multi-agent collaborations against diverse safety vulnerabilities, achieving state-of-the-art accuracy with efficient resource utilization. The code is available at https://github.com/JialongZhou666/GUARDIAN
MACTAS: Self-Attention-Based Module for Inter-Agent Communication in Multi-Agent Reinforcement Learning AAMAS 2026
Communication is essential for the collective execution of complex tasks by human agents, motivating interest in communication mechanisms for multi-agent reinforcement learning (MARL). However, existing communication protocols in MARL are often complex and non-differentiable. In this work, we introduce a self-attention-based communication module that exchanges information between the agents in MARL. Our proposed approach is fully differentiable, allowing agents to learn to generate messages in a reward-driven manner. The module can be seamlessly integrated with any action-value function decomposition method and can be viewed as an extension of such decompositions. Notably, it includes a fixed number of trainable parameters, independent of the number of agents. Experimental results on the SMAC and SMACv2 benchmarks demonstrate the effectiveness of our approach, which achieves state-of-the-art performance on a number of maps.
comment: Submitted for AAMAS 2026
Benchmarking LLMs' Swarm intelligence
Large Language Models (LLMs) show potential for complex reasoning, yet their capacity for emergent coordination in Multi-Agent Systems (MAS) when operating under strict swarm-like constraints-limited local perception and communication-remains largely unexplored. Existing benchmarks often do not fully capture the unique challenges of decentralized coordination when agents operate with incomplete spatio-temporal information. To bridge this gap, we introduce SwarmBench, a novel benchmark designed to systematically evaluate the swarm intelligence capabilities of LLMs acting as decentralized agents. SwarmBench features five foundational MAS coordination tasks (Pursuit, Synchronization, Foraging, Flocking, Transport) within a configurable 2D grid environment, forcing agents to rely solely on local sensory input ($k\times k$ view) and local communication. We propose metrics for coordination effectiveness and analyze emergent group dynamics. Zero-shot evaluations of leading LLMs (e.g., deepseek-v3, o4-mini) reveal significant task-dependent performance variations. While some rudimentary coordination is observed, our results indicate that current LLMs significantly struggle with robust long-range planning and adaptive strategy formation under the uncertainty inherent in these decentralized scenarios. Assessing LLMs under such swarm-like constraints is crucial for understanding their utility in future decentralized intelligent systems. We release SwarmBench as an open, extensible toolkit-built on a customizable physical system-providing environments, prompts, evaluation scripts, and comprehensive datasets. This aims to foster reproducible research into LLM-based MAS coordination and the theoretical underpinnings of emergent collective behavior under severe informational decentralization. Our code repository is available at https://github.com/x66ccff/swarmbench.
Constant-Memory Strategies in Stochastic Games: Best Responses and Equilibria
Stochastic games have become a prevalent framework for studying long-term multi-agent interactions, especially in the context of multi-agent reinforcement learning. In this work, we comprehensively investigate the concept of constant-memory strategies in stochastic games. We first establish some results on best responses and Nash equilibria for behavioral constant-memory strategies, followed by a discussion on the computational hardness of best responding to mixed constant-memory strategies. Those theoretic insights are later verified on several sequential decision-making testbeds, including the $\textit{Iterated Prisoner's Dilemma}$, the $\textit{Iterated Traveler's Dilemma}$, and the $\textit{Pursuit}$ domain. This work aims to enhance the understanding of theoretical issues in single-agent planning under multi-agent systems, and uncover the connection between decision models in single-agent and multi-agent contexts. The code is available at $\texttt{https://github.com/Fernadoo/Const-Mem.}$
comment: 21 pages. Under review
Robotics
HYPE: Hybrid Planning with Ego Proposal-Conditioned Predictions
Safe and interpretable motion planning in complex urban environments needs to reason about bidirectional multi-agent interactions. This reasoning requires to estimate the costs of potential ego driving maneuvers. Many existing planners generate initial trajectories with sampling-based methods and refine them by optimizing on learned predictions of future environment states, which requires a cost function that encodes the desired vehicle behavior. Designing such a cost function can be very challenging, especially if a wide range of complex urban scenarios has to be considered. We propose HYPE: HYbrid Planning with Ego proposal-conditioned predictions, a planner that integrates multimodal trajectory proposals from a learned proposal model as heuristic priors into a Monte Carlo Tree Search (MCTS) refinement. To model bidirectional interactions, we introduce an ego-conditioned occupancy prediction model, enabling consistent, scene-aware reasoning. Our design significantly simplifies cost function design in refinement by considering proposal-driven guidance, requiring only minimalistic grid-based cost terms. Evaluations on large-scale real-world benchmarks nuPlan and DeepUrban show that HYPE effectively achieves state-of-the-art performance, especially in safety and adaptability.
T(R,O) Grasp: Efficient Graph Diffusion of Robot-Object Spatial Transformation for Cross-Embodiment Dexterous Grasping
Dexterous grasping remains a central challenge in robotics due to the complexity of its high-dimensional state and action space. We introduce T(R,O) Grasp, a diffusion-based framework that efficiently generates accurate and diverse grasps across multiple robotic hands. At its core is the T(R,O) Graph, a unified representation that models spatial transformations between robotic hands and objects while encoding their geometric properties. A graph diffusion model, coupled with an efficient inverse kinematics solver, supports both unconditioned and conditioned grasp synthesis. Extensive experiments on a diverse set of dexterous hands show that T(R,O) Grasp achieves average success rate of 94.83%, inference speed of 0.21s, and throughput of 41 grasps per second on an NVIDIA A100 40GB GPU, substantially outperforming existing baselines. In addition, our approach is robust and generalizable across embodiments while significantly reducing memory consumption. More importantly, the high inference speed enables closed-loop dexterous manipulation, underscoring the potential of T(R,O) Grasp to scale into a foundation model for dexterous grasping.
comment: 12 pages, 14 figures
Residual MPC: Blending Reinforcement Learning with GPU-Parallelized Model Predictive Control
Model Predictive Control (MPC) provides interpretable, tunable locomotion controllers grounded in physical models, but its robustness depends on frequent replanning and is limited by model mismatch and real-time computational constraints. Reinforcement Learning (RL), by contrast, can produce highly robust behaviors through stochastic training but often lacks interpretability, suffers from out-of-distribution failures, and requires intensive reward engineering. This work presents a GPU-parallelized residual architecture that tightly integrates MPC and RL by blending their outputs at the torque-control level. We develop a kinodynamic whole-body MPC formulation evaluated across thousands of agents in parallel at 100 Hz for RL training. The residual policy learns to make targeted corrections to the MPC outputs, combining the interpretability and constraint handling of model-based control with the adaptability of RL. The model-based control prior acts as a strong bias, initializing and guiding the policy towards desirable behavior with a simple set of rewards. Compared to standalone MPC or end-to-end RL, our approach achieves higher sample efficiency, converges to greater asymptotic rewards, expands the range of trackable velocity commands, and enables zero-shot adaptation to unseen gaits and uneven terrain.
comment: TRO submission preprint
Reflection-Based Task Adaptation for Self-Improving VLA
Pre-trained Vision-Language-Action (VLA) models represent a major leap towards general-purpose robots, yet efficiently adapting them to novel, specific tasks in-situ remains a significant hurdle. While reinforcement learning (RL) is a promising avenue for such adaptation, the process often suffers from low efficiency, hindering rapid task mastery. We introduce Reflective Self-Adaptation, a framework for rapid, autonomous task adaptation without human intervention. Our framework establishes a self-improving loop where the agent learns from its own experience to enhance both strategy and execution. The core of our framework is a dual-pathway architecture that addresses the full adaptation lifecycle. First, a Failure-Driven Reflective RL pathway enables rapid learning by using the VLM's causal reasoning to automatically synthesize a targeted, dense reward function from failure analysis. This provides a focused learning signal that significantly accelerates policy exploration. However, optimizing such proxy rewards introduces a potential risk of "reward hacking," where the agent masters the reward function but fails the actual task. To counteract this, our second pathway, Success-Driven Quality-Guided SFT, grounds the policy in holistic success. It identifies and selectively imitates high-quality successful trajectories, ensuring the agent remains aligned with the ultimate task goal. This pathway is strengthened by a conditional curriculum mechanism to aid initial exploration. We conduct experiments in challenging manipulation tasks. The results demonstrate that our framework achieves faster convergence and higher final success rates compared to representative baselines. Our work presents a robust solution for creating self-improving agents that can efficiently and reliably adapt to new environments.
EReLiFM: Evidential Reliability-Aware Residual Flow Meta-Learning for Open-Set Domain Generalization under Noisy Labels
Open-Set Domain Generalization (OSDG) aims to enable deep learning models to recognize unseen categories in new domains, which is crucial for real-world applications. Label noise hinders open-set domain generalization by corrupting source-domain knowledge, making it harder to recognize known classes and reject unseen ones. While existing methods address OSDG under Noisy Labels (OSDG-NL) using hyperbolic prototype-guided meta-learning, they struggle to bridge domain gaps, especially with limited clean labeled data. In this paper, we propose Evidential Reliability-Aware Residual Flow Meta-Learning (EReLiFM). We first introduce an unsupervised two-stage evidential loss clustering method to promote label reliability awareness. Then, we propose a residual flow matching mechanism that models structured domain- and category-conditioned residuals, enabling diverse and uncertainty-aware transfer paths beyond interpolation-based augmentation. During this meta-learning process, the model is optimized such that the update direction on the clean set maximizes the loss decrease on the noisy set, using pseudo labels derived from the most confident predicted class for supervision. Experimental results show that EReLiFM outperforms existing methods on OSDG-NL, achieving state-of-the-art performance. The source code is available at https://github.com/KPeng9510/ERELIFM.
comment: The source code is available at https://github.com/KPeng9510/ERELIFM
Autonomous Legged Mobile Manipulation for Lunar Surface Operations via Constrained Reinforcement Learning
Robotics plays a pivotal role in planetary science and exploration, where autonomous and reliable systems are crucial due to the risks and challenges inherent to space environments. The establishment of permanent lunar bases demands robotic platforms capable of navigating and manipulating in the harsh lunar terrain. While wheeled rovers have been the mainstay for planetary exploration, their limitations in unstructured and steep terrains motivate the adoption of legged robots, which offer superior mobility and adaptability. This paper introduces a constrained reinforcement learning framework designed for autonomous quadrupedal mobile manipulators operating in lunar environments. The proposed framework integrates whole-body locomotion and manipulation capabilities while explicitly addressing critical safety constraints, including collision avoidance, dynamic stability, and power efficiency, in order to ensure robust performance under lunar-specific conditions, such as reduced gravity and irregular terrain. Experimental results demonstrate the framework's effectiveness in achieving precise 6D task-space end-effector pose tracking, achieving an average positional accuracy of 4 cm and orientation accuracy of 8.1 degrees. The system consistently respects both soft and hard constraints, exhibiting adaptive behaviors optimized for lunar gravity conditions. This work effectively bridges adaptive learning with essential mission-critical safety requirements, paving the way for advanced autonomous robotic explorers for future lunar missions.
comment: This is the authors version of the paper accepted for publication in The IEEE International Conference on Space Robotics 2025. The final version link will be added here after conference proceedings are published
Maximal Adaptation, Minimal Guidance: Permissive Reactive Robot Task Planning with Humans in the Loop
We present a novel framework for human-robot \emph{logical} interaction that enables robots to reliably satisfy (infinite horizon) temporal logic tasks while effectively collaborating with humans who pursue independent and unknown tasks. The framework combines two key capabilities: (i) \emph{maximal adaptation} enables the robot to adjust its strategy \emph{online} to exploit human behavior for cooperation whenever possible, and (ii) \emph{minimal tunable feedback} enables the robot to request cooperation by the human online only when necessary to guarantee progress. This balance minimizes human-robot interference, preserves human autonomy, and ensures persistent robot task satisfaction even under conflicting human goals. We validate the approach in a real-world block-manipulation task with a Franka Emika Panda robotic arm and in the Overcooked-AI benchmark, demonstrating that our method produces rich, \emph{emergent} cooperative behaviors beyond the reach of existing approaches, while maintaining strong formal guarantees.
Designing Tools with Control Confidence
Prehistoric humans invented stone tools for specialized tasks by not just maximizing the tool's immediate goal-completion accuracy, but also increasing their confidence in the tool for later use under similar settings. This factor contributed to the increased robustness of the tool, i.e., the least performance deviations under environmental uncertainties. However, the current autonomous tool design frameworks solely rely on performance optimization, without considering the agent's confidence in tool use for repeated use. Here, we take a step towards filling this gap by i) defining an optimization framework for task-conditioned autonomous hand tool design for robots, where ii) we introduce a neuro-inspired control confidence term into the optimization routine that helps the agent to design tools with higher robustness. Through rigorous simulations using a robotic arm, we show that tools designed with control confidence as the objective function are more robust to environmental uncertainties during tool use than a pure accuracy-driven objective. We further show that adding control confidence to the objective function for tool design provides a balance between the robustness and goal accuracy of the designed tools under control perturbations. Finally, we show that our CMAES-based evolutionary optimization strategy for autonomous tool design outperforms other state-of-the-art optimizers by designing the optimal tool within the fewest iterations. Code: https://github.com/ajitham123/Tool_design_control_confidence.
Learning Robust Agile Flight Control with Stability Guarantees
In the evolving landscape of high-speed agile quadrotor flight, achieving precise trajectory tracking at the platform's operational limits is paramount. Controllers must handle actuator constraints, exhibit robustness to disturbances, and remain computationally efficient for safety-critical applications. In this work, we present a novel neural-augmented feedback controller for agile flight control. The controller addresses individual limitations of existing state-of-the-art control paradigms and unifies their strengths. We demonstrate the controller's capabilities, including the accurate tracking of highly aggressive trajectories that surpass the feasibility of the actuators. Notably, the controller provides universal stability guarantees, enhancing its robustness and tracking performance even in exceedingly disturbance-prone settings. Its nonlinear feedback structure is highly efficient enabling fast computation at high update rates. Moreover, the learning process in simulation is both fast and stable, and the controller's inherent robustness allows direct deployment to real-world platforms without the need for training augmentations or fine-tuning.
CoIRL-AD: Collaborative-Competitive Imitation-Reinforcement Learning in Latent World Models for Autonomous Driving
End-to-end autonomous driving models trained solely with imitation learning (IL) often suffer from poor generalization. In contrast, reinforcement learning (RL) promotes exploration through reward maximization but faces challenges such as sample inefficiency and unstable convergence. A natural solution is to combine IL and RL. Moving beyond the conventional two-stage paradigm (IL pretraining followed by RL fine-tuning), we propose CoIRL-AD, a competitive dual-policy framework that enables IL and RL agents to interact during training. CoIRL-AD introduces a competition-based mechanism that facilitates knowledge exchange while preventing gradient conflicts. Experiments on the nuScenes dataset show an 18% reduction in collision rate compared to baselines, along with stronger generalization and improved performance on long-tail scenarios. Code is available at: https://github.com/SEU-zxj/CoIRL-AD.
comment: 18 pages, 17 figures
Two-stream network-driven vision-based tactile sensor for object feature extraction and fusion perception
Tactile perception is crucial for embodied intelligent robots to recognize objects. Vision-based tactile sensors extract object physical attributes multidimensionally using high spatial resolution; however, this process generates abundant redundant information. Furthermore, single-dimensional extraction, lacking effective fusion, fails to fully characterize object attributes. These challenges hinder the improvement of recognition accuracy. To address this issue, this study introduces a two-stream network feature extraction and fusion perception strategy for vision-based tactile systems. This strategy employs a distributed approach to extract internal and external object features. It obtains depth map information through three-dimensional reconstruction while simultaneously acquiring hardness information by measuring contact force data. After extracting features with a convolutional neural network (CNN), weighted fusion is applied to create a more informative and effective feature representation. In standard tests on objects of varying shapes and hardness, the force prediction error is 0.06 N (within a 12 N range). Hardness recognition accuracy reaches 98.0%, and shape recognition accuracy reaches 93.75%. With fusion algorithms, object recognition accuracy in actual grasping scenarios exceeds 98.5%. Focused on object physical attributes perception, this method enhances the artificial tactile system ability to transition from perception to cognition, enabling its use in embodied perception applications.
Automated Behavior Planning for Fruit Tree Pruning via Redundant Robot Manipulators: Addressing the Behavior Planning Challenge
Pruning is an essential agricultural practice for orchards. Proper pruning can promote healthier growth and optimize fruit production throughout the orchard's lifespan. Robot manipulators have been developed as an automated solution for this repetitive task, which typically requires seasonal labor with specialized skills. While previous research has primarily focused on the challenges of perception, the complexities of manipulation are often overlooked. These challenges involve planning and control in both joint and Cartesian spaces to guide the end-effector through intricate, obstructive branches. Our work addresses the behavior planning challenge for a robotic pruning system, which entails a multi-level planning problem in environments with complex collisions. In this paper, we formulate the planning problem for a high-dimensional robotic arm in a pruning scenario, investigate the system's intrinsic redundancies, and propose a comprehensive pruning workflow that integrates perception, modeling, and holistic planning. In our experiments, we demonstrate that more comprehensive planning methods can significantly enhance the performance of the robotic manipulator. Finally, we implement the proposed workflow on a real-world robot. As a result, this work complements previous efforts on robotic pruning and motivates future research and development in planning for pruning applications.
Fast Visuomotor Policy for Robotic Manipulation
We present a fast and effective policy framework for robotic manipulation, named Energy Policy, designed for high-frequency robotic tasks and resource-constrained systems. Unlike existing robotic policies, Energy Policy natively predicts multimodal actions in a single forward pass, enabling high-precision manipulation at high speed. The framework is built upon two core components. First, we adopt the energy score as the learning objective to facilitate multimodal action modeling. Second, we introduce an energy MLP to implement the proposed objective while keeping the architecture simple and efficient. We conduct comprehensive experiments in both simulated environments and real-world robotic tasks to evaluate the effectiveness of Energy Policy. The results show that Energy Policy matches or surpasses the performance of state-of-the-art manipulation methods while significantly reducing computational overhead. Notably, on the MimicGen benchmark, Energy Policy achieves superior performance with at a faster inference compared to existing approaches.
A Task-Efficient Reinforcement Learning Task-Motion Planner for Safe Human-Robot Cooperation
In a Human-Robot Cooperation (HRC) environment, safety and efficiency are the two core properties to evaluate robot performance. However, safety mechanisms usually hinder task efficiency since human intervention will cause backup motions and goal failures of the robot. Frequent motion replanning will increase the computational load and the chance of failure. In this paper, we present a hybrid Reinforcement Learning (RL) planning framework which is comprised of an interactive motion planner and a RL task planner. The RL task planner attempts to choose statistically safe and efficient task sequences based on the feedback from the motion planner, while the motion planner keeps the task execution process collision-free by detecting human arm motions and deploying new paths when the previous path is not valid anymore. Intuitively, the RL agent will learn to avoid dangerous tasks, while the motion planner ensures that the chosen tasks are safe. The proposed framework is validated on the cobot in both simulation and the real world, we compare the planner with hard-coded task motion planning methods. The results show that our planning framework can 1) react to uncertain human motions at both joint and task levels; 2) reduce the times of repeating failed goal commands; 3) reduce the total number of replanning requests.
M3D-skin: Multi-material 3D-printed Tactile Sensor with Hierarchical Infill Structures for Pressure Sensing IROS2025
Tactile sensors have a wide range of applications, from utilization in robotic grippers to human motion measurement. If tactile sensors could be fabricated and integrated more easily, their applicability would further expand. In this study, we propose a tactile sensor-M3D-skin-that can be easily fabricated with high versatility by leveraging the infill patterns of a multi-material fused deposition modeling (FDM) 3D printer as the sensing principle. This method employs conductive and non-conductive flexible filaments to create a hierarchical structure with a specific infill pattern. The flexible hierarchical structure deforms under pressure, leading to a change in electrical resistance, enabling the acquisition of tactile information. We measure the changes in characteristics of the proposed tactile sensor caused by modifications to the hierarchical structure. Additionally, we demonstrate the fabrication and use of a multi-tile sensor. Furthermore, as applications, we implement motion pattern measurement on the sole of a foot, integration with a robotic hand, and tactile-based robotic operations. Through these experiments, we validate the effectiveness of the proposed tactile sensor.
comment: Accepted to IROS2025, Website: https://ssk-yoshimura.github.io/M3D-skin/
Robot Learning: A Tutorial
Robot learning is at an inflection point, driven by rapid advancements in machine learning and the growing availability of large-scale robotics data. This shift from classical, model-based methods to data-driven, learning-based paradigms is unlocking unprecedented capabilities in autonomous systems. This tutorial navigates the landscape of modern robot learning, charting a course from the foundational principles of Reinforcement Learning and Behavioral Cloning to generalist, language-conditioned models capable of operating across diverse tasks and even robot embodiments. This work is intended as a guide for researchers and practitioners, and our goal is to equip the reader with the conceptual understanding and practical tools necessary to contribute to developments in robot learning, with ready-to-use examples implemented in $\texttt{lerobot}$.
comment: Tutorial on Robot Learning using LeRobot, the end-to-end robot learning library developed by Hugging Face
Improving Generative Behavior Cloning via Self-Guidance and Adaptive Chunking NeurIPS25
Generative Behavior Cloning (GBC) is a simple yet effective framework for robot learning, particularly in multi-task settings. Recent GBC methods often employ diffusion policies with open-loop (OL) control, where actions are generated via a diffusion process and executed in multi-step chunks without replanning. While this approach has demonstrated strong success rates and generalization, its inherent stochasticity can result in erroneous action sampling, occasionally leading to unexpected task failures. Moreover, OL control suffers from delayed responses, which can degrade performance in noisy or dynamic environments. To address these limitations, we propose two novel techniques to enhance the consistency and reactivity of diffusion policies: (1) self-guidance, which improves action fidelity by leveraging past observations and implicitly promoting future-aware behavior; and (2) adaptive chunking, which selectively updates action sequences when the benefits of reactivity outweigh the need for temporal consistency. Extensive experiments show that our approach substantially improves GBC performance across a wide range of simulated and real-world robotic manipulation tasks. Our code is available at https://github.com/junhyukso/SGAC
comment: Accepted at NeurIPS25
Controlling Intent Expressiveness in Robot Motion with Diffusion Models
Legibility of robot motion is critical in human-robot interaction, as it allows humans to quickly infer a robot's intended goal. Although traditional trajectory generation methods typically prioritize efficiency, they often fail to make the robot's intentions clear to humans. Meanwhile, existing approaches to legible motion usually produce only a single "most legible" trajectory, overlooking the need to modulate intent expressiveness in different contexts. In this work, we propose a novel motion generation framework that enables controllable legibility across the full spectrum, from highly legible to highly ambiguous motions. We introduce a modeling approach based on an Information Potential Field to assign continuous legibility scores to trajectories, and build upon it with a two-stage diffusion framework that first generates paths at specified legibility levels and then translates them into executable robot actions. Experiments in both 2D and 3D reaching tasks demonstrate that our approach produces diverse and controllable motions with varying degrees of legibility, while achieving performance comparable to SOTA. Code and project page: https://legibility-modulator.github.io.
comment: Using diffusion models trained on quality diversity datasets for generating robot motions with adjustable legibility levels
Pretraining in Actor-Critic Reinforcement Learning for Robot Motion Control ICLR 2026
The pretraining-finetuning paradigm has facilitated numerous transformative advancements in artificial intelligence research in recent years. However, in the domain of reinforcement learning (RL) for robot motion control, individual skills are often learned from scratch despite the high likelihood that some generalizable knowledge is shared across all task-specific policies belonging to a single robot embodiment. This work aims to define a paradigm for pretraining neural network models that encapsulate such knowledge and can subsequently serve as a basis for warm-starting the RL process in classic actor-critic algorithms, such as Proximal Policy Optimization (PPO). We begin with a task-agnostic exploration-based data collection algorithm to gather diverse, dynamic transition data, which is then used to train a Proprioceptive Inverse Dynamics Model (PIDM) through supervised learning. The pretrained weights are loaded into both the actor and critic networks to warm-start the policy optimization of actual tasks. We systematically validated our proposed method on seven distinct robot motion control tasks, showing significant benefits to this initialization strategy. Our proposed approach on average improves sample efficiency by 40.1% and task performance by 7.5%, compared to random initialization. We further present key ablation studies and empirical analyses that shed light on the mechanisms behind the effectiveness of our method.
comment: Submitted to ICLR 2026
A Unidirectionally Connected FAS Approach for 6-DOF Quadrotor Control
This paper proposes a unidirectionally connected fully actuated system (UC-FAS) approach for the sub-stabilization and tracking control of 6-DOF quadrotors, tackling limitations both in state-space and FAS framework to some extent. The framework systematically converts underactuated quadrotor dynamics into a UC-FAS model, unifying the existing different FAS transformation ways. By eliminating estimation of the high-order derivatives of control inputs, a drawback of current methods, the UC-FAS model simplifies controller design and enables direct eigenstructure assignment for closed-loop dynamics. Simulations demonstrate precise 6-DOF tracking performance. This work bridges theoretical FAS approach advancements with practical implementation needs, offering a standardized paradigm for nonlinear quadrotor control.
comment: This paper has been submitted to 2026 IFAC World Congress. Corresponding author: Guang-Ren Duan
PolygMap: A Perceptive Locomotion Framework for Humanoid Robot Stair Climbing
Recently, biped robot walking technology has been significantly developed, mainly in the context of a bland walking scheme. To emulate human walking, robots need to step on the positions they see in unknown spaces accurately. In this paper, we present PolyMap, a perception-based locomotion planning framework for humanoid robots to climb stairs. Our core idea is to build a real-time polygonal staircase plane semantic map, followed by a footstep planar using these polygonal plane segments. These plane segmentation and visual odometry are done by multi-sensor fusion(LiDAR, RGB-D camera and IMUs). The proposed framework is deployed on a NVIDIA Orin, which performs 20-30 Hz whole-body motion planning output. Both indoor and outdoor real-scene experiments indicate that our method is efficient and robust for humanoid robot stair climbing.
Achieving Meaningful Collaboration: Worker-centered Design of a Physical Human-Robot Collaborative Blending Task ICRA
The use of robots in industrial settings continues to grow, driven by the need to address complex societal challenges such as labor shortages, aging populations, and ever-increasing production demands. In this abstract, we advocate for (and demonstrate) a transdisciplinary approach when considering robotics in the workplace. Transdisciplinarity emphasizes the integration of academic research with pragmatic expertise and embodied experiential knowledge, that prioritize values such as worker wellbeing and job attractiveness. In the following, we describe an ongoing multi-pronged effort to explore the potential of collaborative robots in the context of airplane engine repair and maintenance operations.
comment: 3 pages, 1 figure, ICRA@40 (Extended abstract)
Shape-Aware Whole-Body Control for Continuum Robots with Application in Endoluminal Surgical Robotics
This paper presents a shape-aware whole-body control framework for tendon-driven continuum robots with direct application to endoluminal surgical navigation. Endoluminal procedures, such as bronchoscopy, demand precise and safe navigation through tortuous, patient-specific anatomy where conventional tip-only control often leads to wall contact, tissue trauma, or failure to reach distal targets. To address these challenges, our approach combines a physics-informed backbone model with residual learning through an Augmented Neural ODE, enabling accurate shape estimation and efficient Jacobian computation. A sampling-based Model Predictive Path Integral (MPPI) controller leverages this representation to jointly optimize tip tracking, backbone conformance, and obstacle avoidance under actuation constraints. A task manager further enhances adaptability by allowing real-time adjustment of objectives, such as wall clearance or direct advancement, during tele-operation. Extensive simulation studies demonstrate millimeter-level accuracy across diverse scenarios, including trajectory tracking, dynamic obstacle avoidance, and shape-constrained reaching. Real-robot experiments on a bronchoscopy phantom validate the framework, showing improved lumen-following accuracy, reduced wall contacts, and enhanced adaptability compared to joystick-only navigation and existing baselines. These results highlight the potential of the proposed framework to increase safety, reliability, and operator efficiency in minimally invasive endoluminal surgery, with broader applicability to other confined and safety-critical environments.
Spatial Forcing: Implicit Spatial Representation Alignment for Vision-language-action Model
Vision-language-action (VLA) models have recently shown strong potential in enabling robots to follow language instructions and execute precise actions. However, most VLAs are built upon vision-language models pretrained solely on 2D data, which lack accurate spatial awareness and hinder their ability to operate in the 3D physical world. Existing solutions attempt to incorporate explicit 3D sensor inputs such as depth maps or point clouds, but these approaches face challenges due to sensor noise, hardware heterogeneity, and incomplete depth coverage in existing datasets. Alternative methods that estimate 3D cues from 2D images also suffer from the limited performance of depth estimators.We propose Spatial Forcing (SF), a simple yet effective alignment strategy that implicitly forces VLA models to develop spatial comprehension capabilities without relying on explicit 3D inputs or depth estimators. SF aligns intermediate visual embeddings of VLAs with geometric representations produced by pretrained 3D foundation models. By enforcing alignment at intermediate layers, SF guides VLAs to encode richer spatial representations that enhance action precision.Extensive experiments in simulation and real-world environments demonstrate that SF achieves state-of-the-art results, surpassing both 2D- and 3D-based VLAs. SF further accelerates training by up to 3.8x and improves data efficiency across diverse robotic tasks. Project page is at https://spatial-forcing.github.io/
Learning Social Navigation from Positive and Negative Demonstrations and Rule-Based Specifications
Mobile robot navigation in dynamic human environments requires policies that balance adaptability to diverse behaviors with compliance to safety constraints. We hypothesize that integrating data-driven rewards with rule-based objectives enables navigation policies to achieve a more effective balance of adaptability and safety. To this end, we develop a framework that learns a density-based reward from positive and negative demonstrations and augments it with rule-based objectives for obstacle avoidance and goal reaching. A sampling-based lookahead controller produces supervisory actions that are both safe and adaptive, which are subsequently distilled into a compact student policy suitable for real-time operation with uncertainty estimates. Experiments in synthetic and elevator co-boarding simulations show consistent gains in success rate and time efficiency over baselines, and real-world demonstrations with human participants confirm the practicality of deployment. A video illustrating this work can be found on our project page https://chanwookim971024.github.io/PioneeR/.
comment: For more videos, see https://chanwookim971024.github.io/PioneeR/
Controllable Collision Scenario Generation via Collision Pattern Prediction ICRA
Evaluating the safety of autonomous vehicles (AVs) requires diverse, safety-critical scenarios, with collisions being especially important yet rare and unsafe to collect in the real world. Therefore, the community has been focusing on generating safety-critical scenarios in simulation. However, controlling attributes such as collision type and time-to-accident (TTA) remains challenging. We introduce a new task called controllable collision scenario generation, where the goal is to produce trajectories that realize a user-specified collision type and TTA, to investigate the feasibility of automatically generating desired collision scenarios. To support this task, we present COLLIDE, a large-scale collision scenario dataset constructed by transforming real-world driving logs into diverse collisions, balanced across five representative collision types and different TTA intervals. We propose a framework that predicts Collision Pattern, a compact and interpretable representation that captures the spatial configuration of the ego and the adversarial vehicles at impact, before rolling out full adversarial trajectories. Experiments show that our approach outperforms strong baselines in both collision rate and controllability. Furthermore, generated scenarios consistently induce higher planner failure rates, revealing limitations of existing planners. We demonstrate that these scenarios fine-tune planners for robustness improvements, contributing to safer AV deployment in different collision scenarios.
comment: 8 pages, 3 figures. Submitted to IEEE International Conference on Robotics and Automation (ICRA) 2026
UniGS: Unified Geometry-Aware Gaussian Splatting for Multimodal Rendering
In this paper, we propose UniGS, a unified map representation and differentiable framework for high-fidelity multimodal 3D reconstruction based on 3D Gaussian Splatting. Our framework integrates a CUDA-accelerated rasterization pipeline capable of rendering photo-realistic RGB images, geometrically accurate depth maps, consistent surface normals, and semantic logits simultaneously. We redesign the rasterization to render depth via differentiable ray-ellipsoid intersection rather than using Gaussian centers, enabling effective optimization of rotation and scale attribute through analytic depth gradients. Furthermore, we derive the analytic gradient formulation for surface normal rendering, ensuring geometric consistency among reconstructed 3D scenes. To improve computational and storage efficiency, we introduce a learnable attribute that enables differentiable pruning of Gaussians with minimal contribution during training. Quantitative and qualitative experiments demonstrate state-of-the-art reconstruction accuracy across all modalities, validating the efficacy of our geometry-aware paradigm. Source code and multimodal viewer will be available on GitHub.
Hybrid Terrain-Aware Path Planning: Integrating VD--RRT\(^{*}\) Exploration and VD--D\(^{*}\) Lite Repair
Autonomous ground vehicles operating off-road must plan curvature-feasible paths while accounting for spatially varying soil strength and slope hazards in real time. We present a continuous state--cost metric that combines a Bekker pressure--sinkage model with elevation-derived slope and attitude penalties. The resulting terrain cost field is analytic, bounded, and monotonic in soil modulus and slope, ensuring well-posed discretization and stable updates under sensor noise. This metric is evaluated on a lattice with exact steering primitives: Dubins and Reeds--Shepp motions for differential drive and time-parameterized bicycle arcs for Ackermann steering. Global exploration is performed using Vehicle-Dynamics RRT\(^{*}\), while local repair is managed by Vehicle-Dynamics D\(^{*}\) Lite, enabling millisecond-scale replanning without heuristic smoothing. By separating the terrain--vehicle model from the planner, the framework provides a reusable basis for deterministic, sampling-based, or learning-driven planning in deformable terrain. Hardware trials on an off-road platform demonstrate real-time navigation across soft soil and slope transitions, supporting reliable autonomy in unstructured environments.
Gaussian Semantic Field for One-shot LiDAR Global Localization
We present a one-shot LiDAR global localization algorithm featuring semantic disambiguation ability based on a lightweight tri-layered scene graph. While landmark semantic registration-based methods have shown promising performance improvements in global localization compared with geometric-only methods, landmarks can be repetitive and misleading for correspondence establishment. We propose to mitigate this problem by modeling semantic distributions with continuous functions learned from a population of Gaussian processes. Compared with discrete semantic labels, the continuous functions capture finer-grained geo-semantic information and also provide more detailed metric information for correspondence establishment. We insert this continuous function as the middle layer between the object layer and the metric-semantic layer, forming a tri-layered 3D scene graph, serving as a light-weight yet performant backend for one-shot localization. We term our global localization pipeline Outram-GSF (Gaussian semantic field) and conduct a wide range of experiments on publicly available data sets, validating the superior performance against the current state-of-the-art.
Translating Milli/Microrobots with A Value-Centered Readiness Framework
Untethered mobile milli/microrobots hold transformative potential for interventional medicine by enabling more precise and entirely non-invasive diagnosis and therapy. Realizing this promise requires bridging the gap between groundbreaking laboratory demonstrations and successful clinical integration. Despite remarkable technical progress over the past two decades, most millirobots and microrobots remain confined to laboratory proof-of-concept demonstrations, with limited real-world feasibility. In this Review, we identify key factors that slow translation from bench to bedside, focusing on the disconnect between technical innovation and real-world application. We argue that the long-term impact and sustainability of the field depend on aligning development with unmet medical needs, ensuring applied feasibility, and integrating seamlessly into existing clinical workflows, which are essential pillars for delivering meaningful patient outcomes. To support this shift, we introduce a strategic milli/microrobot Technology Readiness Level framework (mTRL), which maps system development from initial conceptualization to clinical adoption through clearly defined milestones and their associated stepwise activities. The mTRL model provides a structured gauge of technological maturity, a common language for cross-disciplinary collaboration and actionable guidance to accelerate translational development toward new, safer and more efficient interventions.
EmboMatrix: A Scalable Training-Ground for Embodied Decision-Making
Embodied decision-making enables agents to translate high-level goals into executable actions through continuous interactions within the physical world, forming a cornerstone of general-purpose embodied intelligence. Large language models (LLMs), with their general decision-making capabilities, offer a promising path to realize this potential; however, LLMs trained solely on language lack exposure to physical environments, limiting their true embodied understanding. To bridge this gap, we propose the concept of a training ground: a comprehensive infrastructure that provides task and scene simulation, embodied interaction, and feedback signals, offering a one-stop solution for LLM acquire genuine embodied decision-making skills. In this work, we present EmboMatrix, the first training ground of its kind, providing massive and diverse tasks with efficient simulation and precise rewards. EmboMatrix incorporates a series of novel techniques: a multi-agent data engine for large-scale task and scene generation, a distributed heterogeneous-hardware system for scalable simulation, and a multi-level reward architecture for precise supervision. Leveraging EmboMatrix, we cultivate EmboBrain, an LLM whose embodied decision-making abilities emerge from extensive embodied interactions. Experiments show that EmboBrain-7B surpasses the 671B DeepSeek-R1 baseline by 9.5\% on two challenging embodied decision-making benchmarks, demonstrating the power of interactive, environment-grounded learning for building truly intelligent embodied agents.
comment: 10 pages 8 figures
Kinematic Kitbashing for Modeling Functional Articulated Objects
We introduce Kinematic Kitbashing, an automatic framework that synthesizes functionality-aware articulated objects by reusing parts from existing models. Given a kinematic graph with a small collection of articulated parts, our optimizer jointly solves for the spatial placement of every part so that (i) attachments remain geometrically sound over the entire range of motion and (ii) the assembled object satisfies user-specified functional goals such as collision-free actuation, reachability, or trajectory following. At its core is a kinematics-aware attachment energy that aligns vector distance function features sampled across multiple articulation snapshots. We embed this attachment term within an annealed Riemannian Langevin dynamics sampler that treats functionality objectives as additional energies, enabling robust global exploration while accommodating non-differentiable functionality objectives and constraints. Our framework produces a wide spectrum of assembled articulated shapes, from trash-can wheels grafted onto car bodies to multi-segment lamps, gear-driven paddlers, and reconfigurable furniture, and delivers strong quantitative improvements over state-of-the-art baselines across geometric, kinematic, and functional metrics. By tightly coupling articulation-aware geometry matching with functionality-driven optimization, Kinematic Kitbashing bridges part-based shape modeling and functional assembly design, empowering rapid creation of interactive articulated assets.
Development of a Linear Guide-Rail Testbed for Physically Emulating ISAM Operations
In-Space Servicing, Assembly, and Manufacturing (ISAM) is a set of emerging operations that provides several benefits to improve the longevity, capacity, mo- bility, and expandability of existing and future space assets. Serial robotic ma- nipulators are particularly vital in accomplishing ISAM operations, however, the complex perturbation forces and motions associated with movement of a robotic arm on a free-flying satellite presents a complex controls problem requiring addi- tional study. While many dynamical models are developed, experimentally test- ing and validating these models is challenging given that the models operate in space, where satellites have six-degrees-of-freedom (6-DOF). This paper attempts to resolve those challenges by presenting the design and development of a new hardware-in-the-loop (HIL) experimental testbed utilized to emulate ISAM. This emulation will be accomplished by means of a 6-DOF UR3e robotic arm attached to a satellite bus. This satellite bus is mounted to a 1-DOF guide-rail system, en- abling the satellite bus and robotic arm to move freely in one linear direction. This experimental ISAM emulation system will explore and validate models for space motion, serial robot manipulation, and contact mechanics.
comment: 12 pages, 4 figures, AAS/AIAA Space Flight Mechanics
Comparison of Forced and Unforced Rendezvous, Proximity Operations, and Docking Under Model Mismatch
This paper compares the required fuel usage for forced and unforced motion of a chaser satellite engaged in Rendezvous, Proximity Operations, and Docking (RPOD) maneuvers. Improved RPOD models are vital, particularly as the space industry expands and demands for improved fuel efficiency, cost effectiveness, and mission life span increase. This paper specifically examines the Clohessy- Wiltshire (CW) Equations and the extent of model mismatch by comparing pre- dicted trajectories from this model with a more computationally complex, higher fidelity RPOD model. This paper assesses several test cases of similar mission parameters, in each case comparing natural motion circumnavigation (NMC) with comparable forced motion circumnavigation. The Guidance, Navigation, and Con- trol (GNC) impulse maneuvers required to maintain the supposedly zero fuel CW trajectories is representative of the extent of CW model mismatch. This paper demonstrates that unforced motions are not inherently more fuel efficient than forced motions, thus permitting extended orbital operations given the higher fuel efficiency.
comment: 12 pages, 4 figures, AAS/AIAA Space Flight Mechanics
UNCAP: Uncertainty-Guided Planning Using Natural Language Communication for Cooperative Autonomous Vehicles
Safe large-scale coordination of multiple cooperative connected autonomous vehicles (CAVs) hinges on communication that is both efficient and interpretable. Existing approaches either rely on transmitting high-bandwidth raw sensor data streams or neglect perception and planning uncertainties inherent in shared data, resulting in systems that are neither scalable nor safe. To address these limitations, we propose Uncertainty-Guided Natural Language Cooperative Autonomous Planning (UNCAP), a vision-language model-based planning approach that enables CAVs to communicate via lightweight natural language messages while explicitly accounting for perception uncertainty in decision-making. UNCAP features a two-stage communication protocol: (i) an ego CAV first identifies the subset of vehicles most relevant for information exchange, and (ii) the selected CAVs then transmit messages that quantitatively express their perception uncertainty. By selectively fusing messages that maximize mutual information, this strategy allows the ego vehicle to integrate only the most relevant signals into its decision-making, improving both the scalability and reliability of cooperative planning. Experiments across diverse driving scenarios show a 63% reduction in communication bandwidth with a 31% increase in driving safety score, a 61% reduction in decision uncertainty, and a four-fold increase in collision distance margin during near-miss events. Project website: https://uncap-project.github.io/
Actron3D: Learning Actionable Neural Functions from Videos for Transferable Robotic Manipulation
We present Actron3D, a framework that enables robots to acquire transferable 6-DoF manipulation skills from just a few monocular, uncalibrated, RGB-only human videos. At its core lies the Neural Affordance Function, a compact object-centric representation that distills actionable cues from diverse uncalibrated videos-geometry, visual appearance, and affordance-into a lightweight neural network, forming a memory bank of manipulation skills. During deployment, we adopt a pipeline that retrieves relevant affordance functions and transfers precise 6-DoF manipulation policies via coarse-to-fine optimization, enabled by continuous queries to the multimodal features encoded in the neural functions. Experiments in both simulation and the real world demonstrate that Actron3D significantly outperforms prior methods, achieving a 14.9 percentage point improvement in average success rate across 13 tasks while requiring only 2-3 demonstration videos per task.
comment: 8 pages, 5 figures
The Omega Turn: A General Turning Template for Elongate Robots
Elongate limbless robots have the potential to locomote through tightly packed spaces for applications such as search-and-rescue and industrial inspections. The capability to effectively and robustly maneuver elongate limbless robots is crucial to realize such potential. However, there has been limited research on turning strategies for such systems. To achieve effective and robust turning performance in cluttered spaces, we take inspiration from a microscopic nematode, C. elegans, which exhibits remarkable maneuverability in rheologically complex environments partially because of its ability to perform omega turns. Despite recent efforts to analyze omega turn kinematics, it remains unknown if there exists a wave equation sufficient to prescribe an omega turn, let alone its reconstruction on robot platforms. Here, using a comparative theory-biology approach, we prescribe the omega turn as a superposition of two traveling waves. With wave equations as a guideline, we design a controller for limbless robots enabling robust and effective turning behaviors in lab and cluttered field environments. Finally, we show that such omega turn controllers can also generalize to elongate multi-legged robots, demonstrating an alternative effective body-driven turning strategy for elongate robots, with and without limbs.
Enhancing Sampling-based Planning with a Library of Paths
Path planning for 3D solid objects is a challenging problem, requiring a search in a six-dimensional configuration space, which is, nevertheless, essential in many robotic applications such as bin-picking and assembly. The commonly used sampling-based planners, such as Rapidly-exploring Random Trees, struggle with narrow passages where the sampling probability is low, increasing the time needed to find a solution. In scenarios like robotic bin-picking, various objects must be transported through the same environment. However, traditional planners start from scratch each time, losing valuable information gained during the planning process. We address this by using a library of past solutions, allowing the reuse of previous experiences even when planning for a new, previously unseen object. Paths for a set of objects are stored, and when planning for a new object, we find the most similar one in the library and use its paths as approximate solutions, adjusting for possible mutual transformations. The configuration space is then sampled along the approximate paths. Our method is tested in various narrow passage scenarios and compared with state-of-the-art methods from the OMPL library. Results show significant speed improvements (up to 85% decrease in the required time) of our method, often finding a solution in cases where the other planners fail. Our implementation of the proposed method is released as an open-source package.
Geometric Model Predictive Path Integral for Agile UAV Control with Online Collision Avoidance
In this letter, we introduce Geometric Model Predictive Path Integral (GMPPI), a sampling-based controller capable of tracking agile trajectories while avoiding obstacles. In each iteration, GMPPI generates a large number of candidate rollout trajectories and then averages them to create a nominal control to be followed by the Unmanned Aerial Vehicle (UAV). We propose using geometric SE(3) control to generate part of the rollout trajectories, significantly increasing precision in agile flight. Furthermore, we introduce varying rollout simulation time step length and dynamic cost and noise parameters, vastly improving tracking performance of smooth and low-speed trajectories over an existing Model Predictive Path Integral (MPPI) implementation. Finally, we propose an integration of GMPPI with a stereo depth camera, enabling online obstacle avoidance at high speeds, a crucial step towards autonomous UAV flights in complex environments. The proposed controller can track simulated agile reference trajectories with position error similar to the geometric SE(3) controller. However, the same configuration of the proposed controller can avoid obstacles in a simulated forest environment at speeds of up to 13m/s, surpassing the performance of a state-of-the-art obstacle-aware planner. In real-world experiments, GMPPI retains the capability to track agile trajectories and avoids obstacles at speeds of up to 10m/s.
comment: This work has been submitted to the IEEE for possible publication
Gaussian Process Implicit Surfaces as Control Barrier Functions for Safe Robot Navigation
Level set methods underpin modern safety techniques such as control barrier functions (CBFs), while also serving as implicit surface representations for geometric shapes via distance fields. Inspired by these two paradigms, we propose a unified framework where the implicit surface itself acts as a CBF. We leverage Gaussian process (GP) implicit surface (GPIS) to represent the safety boundaries, using safety samples which are derived from sensor measurements to condition the GP. The GP posterior mean defines the implicit safety surface (safety belief), while the posterior variance provides a robust safety margin. Although GPs have favorable properties such as uncertainty estimation and analytical tractability, they scale cubically with data. To alleviate this issue, we develop a sparse solution called sparse Gaussian CBFs. To the best of our knowledge, GPIS have not been explicitly used to synthesize CBFs. We validate the approach on collision avoidance tasks in two settings: a simulated 7-DOF manipulator operating around the Stanford bunny, and a quadrotor navigating in 3D around a physical chair. In both cases, Gaussian CBFs (with and without sparsity) enable safe interaction and collision-free execution of trajectories that would otherwise intersect the objects.
comment: 8 pages, 7 figures, under review
SimULi: Real-Time LiDAR and Camera Simulation with Unscented Transforms
Rigorous testing of autonomous robots, such as self-driving vehicles, is essential to ensure their safety in real-world deployments. This requires building high-fidelity simulators to test scenarios beyond those that can be safely or exhaustively collected in the real-world. Existing neural rendering methods based on NeRF and 3DGS hold promise but suffer from low rendering speeds or can only render pinhole camera models, hindering their suitability to applications that commonly require high-distortion lenses and LiDAR data. Multi-sensor simulation poses additional challenges as existing methods handle cross-sensor inconsistencies by favoring the quality of one modality at the expense of others. To overcome these limitations, we propose SimULi, the first method capable of rendering arbitrary camera models and LiDAR data in real-time. Our method extends 3DGUT, which natively supports complex camera models, with LiDAR support, via an automated tiling strategy for arbitrary spinning LiDAR models and ray-based culling. To address cross-sensor inconsistencies, we design a factorized 3D Gaussian representation and anchoring strategy that reduces mean camera and depth error by up to 40% compared to existing methods. SimULi renders 10-20x faster than ray tracing approaches and 1.5-10x faster than prior rasterization-based work (and handles a wider range of camera models). When evaluated on two widely benchmarked autonomous driving datasets, SimULi matches or exceeds the fidelity of existing state-of-the-art methods across numerous camera and LiDAR metrics.
comment: Project page: https://research.nvidia.com/labs/sil/projects/simuli
Learning to Grasp Anything by Playing with Random Toys
Robotic manipulation policies often struggle to generalize to novel objects, limiting their real-world utility. In contrast, cognitive science suggests that children develop generalizable dexterous manipulation skills by mastering a small set of simple toys and then applying that knowledge to more complex items. Inspired by this, we study if similar generalization capabilities can also be achieved by robots. Our results indicate robots can learn generalizable grasping using randomly assembled objects that are composed from just four shape primitives: spheres, cuboids, cylinders, and rings. We show that training on these "toys" enables robust generalization to real-world objects, yielding strong zero-shot performance. Crucially, we find the key to this generalization is an object-centric visual representation induced by our proposed detection pooling mechanism. Evaluated in both simulation and on physical robots, our model achieves a 67% real-world grasping success rate on the YCB dataset, outperforming state-of-the-art approaches that rely on substantially more in-domain data. We further study how zero-shot generalization performance scales by varying the number and diversity of training toys and the demonstrations per toy. We believe this work offers a promising path to scalable and generalizable learning in robotic manipulation. Demonstration videos, code, checkpoints and our dataset are available on our project page: https://lego-grasp.github.io/ .
VLURes: Benchmarking VLM Visual and Linguistic Understanding in Low-Resource Languages
Vision Language Models (VLMs) are pivotal for advancing perception in intelligent agents. Yet, evaluation of VLMs remains limited to predominantly English-centric benchmarks in which the image-text pairs comprise short texts. To evaluate VLM fine-grained abilities, in four languages under long-text settings, we introduce a novel multilingual benchmark VLURes featuring eight vision-and-language tasks, and a pioneering unrelatedness task, to probe the fine-grained Visual and Linguistic Understanding capabilities of VLMs across English, Japanese, and low-resource languages, Swahili, and Urdu. Our datasets, curated from web resources in the target language, encompass ten diverse image categories and rich textual context, introducing valuable vision-language resources for Swahili and Urdu. By prompting VLMs to generate responses and rationales, evaluated automatically and by native speakers, we uncover performance disparities across languages and tasks critical to intelligent agents, such as object recognition, scene understanding, and relationship understanding. We conducted evaluations of ten VLMs with VLURes. The best performing model, GPT-4o, achieves an overall accuracy of 90.8% and lags human performance by 6.7%, though the gap is larger for open-source models. The gap highlights VLURes' critical role in developing intelligent agents to tackle multi-modal visual reasoning.
ManiAgent: An Agentic Framework for General Robotic Manipulation
While Vision-Language-Action (VLA) models have demonstrated impressive capabilities in robotic manipulation, their performance in complex reasoning and long-horizon task planning is limited by data scarcity and model capacity. To address this, we introduce ManiAgent, an agentic architecture for general manipulation tasks that achieves end-to-end output from task descriptions and environmental inputs to robotic manipulation actions. In this framework, multiple agents involve inter-agent communication to perform environmental perception, sub-task decomposition and action generation, enabling efficient handling of complex manipulation scenarios. Evaluations show ManiAgent achieves an 86.8% success rate on the SimplerEnv benchmark and 95.8% on real-world pick-and-place tasks, enabling efficient data collection that yields VLA models with performance comparable to those trained on human-annotated datasets. The project webpage is available at https://yi-yang929.github.io/ManiAgent/.
comment: 8 pages, 6 figures, conference
REACT3D: Recovering Articulations for Interactive Physical 3D Scenes
Interactive 3D scenes are increasingly vital for embodied intelligence, yet existing datasets remain limited due to the labor-intensive process of annotating part segmentation, kinematic types, and motion trajectories. We present REACT3D, a scalable zero-shot framework that converts static 3D scenes into simulation-ready interactive replicas with consistent geometry, enabling direct use in diverse downstream tasks. Our contributions include: (i) openable-object detection and segmentation to extract candidate movable parts from static scenes, (ii) articulation estimation that infers joint types and motion parameters, (iii) hidden-geometry completion followed by interactive object assembly, and (iv) interactive scene integration in widely supported formats to ensure compatibility with standard simulation platforms. We achieve state-of-the-art performance on detection/segmentation and articulation metrics across diverse indoor scenes, demonstrating the effectiveness of our framework and providing a practical foundation for scalable interactive scene generation, thereby lowering the barrier to large-scale research on articulated scene understanding. Our project page is https://react3d.github.io/
comment: 8 pages
RoVer: Robot Reward Model as Test-Time Verifier for Vision-Language-Action Model
Vision-Language-Action (VLA) models have become a prominent paradigm for embodied intelligence, yet further performance improvements typically rely on scaling up training data and model size -- an approach that is prohibitively expensive for robotics and fundamentally limited by data collection costs. We address this limitation with $\mathbf{RoVer}$, an embodied test-time scaling framework that uses a $\mathbf{Ro}$bot Process Reward Model (PRM) as a Test-Time $\mathbf{Ver}$ifier to enhance the capabilities of existing VLA models without modifying their architectures or weights. Specifically, RoVer (i) assigns scalar-based process rewards to evaluate the reliability of candidate actions, and (ii) predicts an action-space direction for candidate expansion/refinement. During inference, RoVer generates multiple candidate actions concurrently from the base policy, expands them along PRM-predicted directions, and then scores all candidates with PRM to select the optimal action for execution. Notably, by caching shared perception features, it can amortize perception cost and evaluate more candidates under the same test-time computational budget. Essentially, our approach effectively transforms available computing resources into better action decision-making, realizing the benefits of test-time scaling without extra training overhead. Our contributions are threefold: (1) a general, plug-and-play test-time scaling framework for VLAs; (2) a PRM that jointly provides scalar process rewards and an action-space direction to guide exploration; and (3) an efficient direction-guided sampling strategy that leverages a shared perception cache to enable scalable candidate generation and selection during inference.
Towards Safe Maneuvering of Double-Ackermann-Steering Robots with a Soft Actor-Critic Framework IROS 2025
We present a deep reinforcement learning framework based on Soft Actor-Critic (SAC) for safe and precise maneuvering of double-Ackermann-steering mobile robots (DASMRs). Unlike holonomic or simpler non-holonomic robots such as differential-drive robots, DASMRs face strong kinematic constraints that make classical planners brittle in cluttered environments. Our framework leverages the Hindsight Experience Replay (HER) and the CrossQ overlay to encourage maneuvering efficiency while avoiding obstacles. Simulation results with a heavy four-wheel-steering rover show that the learned policy can robustly reach up to 97% of target positions while avoiding obstacles. Our framework does not rely on handcrafted trajectories or expert demonstrations.
comment: 4 pages, 3 figures, 2 tables, Accepted for Safety of Intelligent and Autonomous Vehicles: Formal Methods vs. Machine Learning approaches for reliable navigation (SIAV-FM2L) an IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2025) workshop
Integration of the TIAGo Robot into Isaac Sim with Mecanum Drive Modeling and Learned S-Curve Velocity Profiles
Efficient physics simulation has significantly accelerated research progress in robotics applications such as grasping and assembly. The advent of GPU-accelerated simulation frameworks like Isaac Sim has particularly empowered learning-based methods, enabling them to tackle increasingly complex tasks. The PAL Robotics TIAGo++ Omni is a versatile mobile manipulator equipped with a mecanum-wheeled base, allowing omnidirectional movement and a wide range of task capabilities. However, until now, no model of the robot has been available in Isaac Sim. In this paper, we introduce such a model, calibrated to approximate the behavior of the real robot, with a focus on its omnidirectional drive dynamics. We present two control models for the omnidirectional drive: a physically accurate model that replicates real-world wheel dynamics and a lightweight velocity-based model optimized for learning-based applications. With these models, we introduce a learning-based calibration approach to approximate the real robot's S-shaped velocity profile using minimal trajectory data recordings. This simulation should allow researchers to experiment with the robot and perform efficient learning-based control in diverse environments. We provide the integration publicly at https://github.com/AIS-Bonn/tiago_isaac.
comment: In Proceedings of IEEE 21st International Conference on Automation Science and Engineering (CASE), Los Angeles, USA, August 2025
Dynamics-aware Diffusion Models for Planning and Control
This paper addresses the problem of generating dynamically admissible trajectories for control tasks using diffusion models, particularly in scenarios where the environment is complex and system dynamics are crucial for practical application. We propose a novel framework that integrates system dynamics directly into the diffusion model's denoising process through a sequential prediction and projection mechanism. This mechanism, aligned with the diffusion model's noising schedule, ensures generated trajectories are both consistent with expert demonstrations and adhere to underlying physical constraints. Notably, our approach can generate maximum likelihood trajectories and accurately recover trajectories generated by linear feedback controllers, even when explicit dynamics knowledge is unavailable. We validate the effectiveness of our method through experiments on standard control tasks and a complex non-convex optimal control problem involving waypoint tracking and collision avoidance, demonstrating its potential for efficient trajectory generation in practical applications. Our code repository is available at www.github.com/darshangm/dynamics-aware-diffusion.
comment: 8 pages, 3 figures
PSN Game: Game-theoretic Prediction and Planning via a Player Selection Network
While game-theoretic planning frameworks are effective at modeling multi-agent interactions, they require solving large optimization problems where the number of variables increases with the number of agents, resulting in long computation times that limit their use in large-scale, real-time systems. To address this issue, we propose 1) PSN Game: a learning-based, game-theoretic prediction and planning framework that reduces runtime by learning a Player Selection Network (PSN); and 2) a Goal Inference Network (GIN) that makes it possible to use the PSN in incomplete information games where agents' intentions are unknown. A PSN outputs a player selection mask that distinguishes influential players from less relevant ones, enabling the ego player to solve a smaller, masked game involving only selected players. By reducing the number of players in the game, and therefore reducing the number of variables in the corresponding optimization problem, PSN directly lowers computation time. The PSN Game framework is more flexible than existing player selection methods as it 1) relies solely on observations of players' past trajectories, without requiring full state, action, or other game-specific information; and 2) requires no online parameter tuning. Experiments in both simulated scenarios and human trajectory datasets demonstrate that PSNs outperform baseline selection methods in 1) prediction accuracy; and 2) planning safety. PSNs also generalize effectively to real-world scenarios in which agents' objectives are unknown without fine-tuning. By selecting only the most relevant players for decision-making, PSN Game offers a general mechanism for reducing planning complexity that can be seamlessly integrated into existing multi-agent planning frameworks.
Image Quality Assessment for Embodied AI
Embodied AI has developed rapidly in recent years, but it is still mainly deployed in laboratories, with various distortions in the Real-world limiting its application. Traditionally, Image Quality Assessment (IQA) methods are applied to predict human preferences for distorted images; however, there is no IQA method to assess the usability of an image in embodied tasks, namely, the perceptual quality for robots. To provide accurate and reliable quality indicators for future embodied scenarios, we first propose the topic: IQA for Embodied AI. Specifically, we (1) based on the Mertonian system and meta-cognitive theory, constructed a perception-cognition-decision-execution pipeline and defined a comprehensive subjective score collection process; (2) established the Embodied-IQA database, containing over 36k reference/distorted image pairs, with more than 5m fine-grained annotations provided by Vision Language Models/Vision Language Action-models/Real-world robots; (3) trained and validated the performance of mainstream IQA methods on Embodied-IQA, demonstrating the need to develop more accurate quality indicators for Embodied AI. We sincerely hope that through evaluation, we can promote the application of Embodied AI under complex distortions in the Real-world. Project page: https://github.com/lcysyzxdxc/EmbodiedIQA
Product-oriented Product-Process-Resource Asset Network and its Representation in AutomationML for Asset Administration Shell
Current products, especially in the automotive sector, pose complex technical systems having a multi-disciplinary mechatronic nature. Industrial standards supporting system engineering and production typically (i) address the production phase only, but do not cover the complete product life cycle, and (ii) focus on production processes and resources rather than the products themselves. The presented approach is motivated by incorporating the impacts of the end-of-life phase of the product life cycle into the engineering phase. This paper proposes a modeling approach coming up from the Product-Process-Resource (PPR) modeling paradigm. It combines requirements on (i) respecting the product structure as a basis for the model, and (ii) incorporates repairing, remanufacturing, or upcycling within cyber-physical production systems. The proposed model called PoPAN should accompany the product during the entire life cycle as a digital shadow encapsulated within the Asset Administration Shell of a product. To facilitate the adoption of the proposed paradigm, the paper also proposes serialization of the model in the AutomationML data format. The model is demonstrated on a use-case for disassembling electric vehicle batteries to support their remanufacturing for stationary battery applications.
comment: \copyright 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
OpenLex3D: A Tiered Evaluation Benchmark for Open-Vocabulary 3D Scene Representations NeurIPS 2025
3D scene understanding has been transformed by open-vocabulary language models that enable interaction via natural language. However, at present the evaluation of these representations is limited to datasets with closed-set semantics that do not capture the richness of language. This work presents OpenLex3D, a dedicated benchmark for evaluating 3D open-vocabulary scene representations. OpenLex3D provides entirely new label annotations for scenes from Replica, ScanNet++, and HM3D, which capture real-world linguistic variability by introducing synonymical object categories and additional nuanced descriptions. Our label sets provide 13 times more labels per scene than the original datasets. By introducing an open-set 3D semantic segmentation task and an object retrieval task, we evaluate various existing 3D open-vocabulary methods on OpenLex3D, showcasing failure cases, and avenues for improvement. Our experiments provide insights on feature precision, segmentation, and downstream capabilities. The benchmark is publicly available at: https://openlex3d.github.io/.
comment: NeurIPS 2025
BridgeVLA: Input-Output Alignment for Efficient 3D Manipulation Learning with Vision-Language Models NeurIPS 2025
Recently, leveraging pre-trained vision-language models (VLMs) for building vision-language-action (VLA) models has emerged as a promising approach to effective robot manipulation learning. However, only few methods incorporate 3D signals into VLMs for action prediction, and they do not fully leverage the spatial structure inherent in 3D data, leading to low sample efficiency. In this paper, we introduce BridgeVLA, a novel 3D VLA model that (1) projects 3D inputs to multiple 2D images, ensuring input alignment with the VLM backbone, and (2) utilizes 2D heatmaps for action prediction, unifying the input and output spaces within a consistent 2D image space. In addition, we propose a scalable pre-training method that equips the VLM backbone with the capability to predict 2D heatmaps before downstream policy learning. Extensive experiments show the proposed method is able to learn 3D manipulation efficiently and effectively. BridgeVLA outperforms state-of-the-art baseline methods across three simulation benchmarks. In RLBench, it improves the average success rate from 81.4% to 88.2%. In COLOSSEUM, it demonstrates significantly better performance in challenging generalization settings, boosting the average success rate from 56.7% to 64.0%. In GemBench, it surpasses all the comparing baseline methods in terms of average success rate. In real-robot experiments, BridgeVLA outperforms a state-of-the-art baseline method by 32% on average. It generalizes robustly in multiple out-of-distribution settings, including visual disturbances and unseen instructions. Remarkably, it is able to achieve a success rate of 96.8% on 10+ tasks with only 3 trajectories per task, highlighting its extraordinary sample efficiency. Project Website:https://bridgevla.github.io/
comment: NeurIPS 2025
Aux-Think: Exploring Reasoning Strategies for Data-Efficient Vision-Language Navigation
Vision-Language Navigation (VLN) is a critical task for developing embodied agents that can follow natural language instructions to navigate in complex real-world environments. Recent advances in VLN by large pretrained models have significantly improved generalization and instruction grounding compared to traditional approaches. However, the role of reasoning strategies in navigation-an action-centric, long-horizon task-remains underexplored, despite Chain-of-Thought (CoT) reasoning's demonstrated success in static tasks like visual question answering. To address this gap, we conduct the first systematic evaluation of reasoning strategies for VLN, including No-Think (direct action prediction), Pre-Think (reason before action), and Post-Think (reason after action). Surprisingly, our findings reveal the Inference-time Reasoning Collapse issue, where inference-time reasoning degrades navigation accuracy, highlighting the challenges of integrating reasoning into VLN. Based on this insight, we propose Aux-Think, a framework that trains models to internalize structured reasoning patterns through CoT supervision, while inferring action directly without reasoning in online prediction. To support this framework, we release R2R-CoT-320k, the first Chain-of-Thought annotated dataset for VLN. Extensive experiments show that Aux-Think reduces training effort greatly and achieves the best performance under the same data scale.
SIG-Chat: Spatial Intent-Guided Conversational Gesture Generation Involving How, When and Where
The accompanying actions and gestures in dialogue are often closely linked to interactions with the environment, such as looking toward the interlocutor or using gestures to point to the described target at appropriate moments. Speech and semantics guide the production of gestures by determining their timing (WHEN) and style (HOW), while the spatial locations of interactive objects dictate their directional execution (WHERE). Existing approaches either rely solely on descriptive language to generate motions or utilize audio to produce non-interactive gestures, thereby lacking the characterization of interactive timing and spatial intent. This significantly limits the applicability of conversational gesture generation, whether in robotics or in the fields of game and animation production. To address this gap, we present a full-stack solution. We first established a unique data collection method to simultaneously capture high-precision human motion and spatial intent. We then developed a generation model driven by audio, language, and spatial data, alongside dedicated metrics for evaluating interaction timing and spatial accuracy. Finally, we deployed the solution on a humanoid robot, enabling rich, context-aware physical interactions.
MTIL: Encoding Full History with Mamba for Temporal Imitation Learning
Standard imitation learning (IL) methods have achieved considerable success in robotics, yet often rely on the Markov assumption, which falters in long-horizon tasks where history is crucial for resolving perceptual ambiguity. This limitation stems not only from a conceptual gap but also from a fundamental computational barrier: prevailing architectures like Transformers are often constrained by quadratic complexity, rendering the processing of long, high-dimensional observation sequences infeasible. To overcome this dual challenge, we introduce Mamba Temporal Imitation Learning (MTIL). Our approach represents a new paradigm for robotic learning, which we frame as a practical synthesis of World Model and Dynamical System concepts. By leveraging the linear-time recurrent dynamics of State Space Models (SSMs), MTIL learns an implicit, action-oriented world model that efficiently encodes the entire trajectory history into a compressed, evolving state. This allows the policy to be conditioned on a comprehensive temporal context, transcending the confines of Markovian approaches. Through extensive experiments on simulated benchmarks (ACT, Robomimic, LIBERO) and on challenging real-world tasks, MTIL demonstrates superior performance against SOTA methods like ACT and Diffusion Policy, particularly in resolving long-term temporal ambiguities. Our findings not only affirm the necessity of full temporal context but also validate MTIL as a powerful and a computationally feasible approach for learning long-horizon, non-Markovian behaviors from high-dimensional observations.
comment: 8 pages,5 figures.Published in IEEE Robotics and Automation Letters (RA-L), 2025
No Plan but Everything Under Control: Robustly Solving Sequential Tasks with Dynamically Composed Gradient Descent ICRA25
We introduce a novel gradient-based approach for solving sequential tasks by dynamically adjusting the underlying myopic potential field in response to feedback and the world's regularities. This adjustment implicitly considers subgoals encoded in these regularities, enabling the solution of long sequential tasks, as demonstrated by solving the traditional planning domain of Blocks World - without any planning. Unlike conventional planning methods, our feedback-driven approach adapts to uncertain and dynamic environments, as demonstrated by one hundred real-world trials involving drawer manipulation. These experiments highlight the robustness of our method compared to planning and show how interactive perception and error recovery naturally emerge from gradient descent without explicitly implementing them. This offers a computationally efficient alternative to planning for a variety of sequential tasks, while aligning with observations on biological problem-solving strategies.
comment: Accepted at ICRA25; Supplementary Material under https://www.tu.berlin/robotics/papers/noplan ; 7 pages + 6 figures;
GPA-RAM: Grasp-Pretraining Augmented Robotic Attention Mamba for Spatial Task Learning
Most existing robot manipulation methods prioritize task learning by enhancing perception through complex deep network architectures. However, they face challenges in real-time collision-free planning. Hence, Robotic Attention Mamba (RAM) is designed for refined planning. Specifically, by integrating Mamba and parallel single-view attention, RAM aligns multi-view vision and task-related language features, ensuring efficient fine-grained task planning with linear complexity and robust real-time performance. Nevertheless, it has the potential for further improvement in high-precision grasping and manipulation. Thus, Grasp-Pretraining Augmentation (GPA) is devised, with a grasp pose feature extractor pretrained utilizing object grasp poses directly inherited from whole-task demonstrations. Subsequently, the extracted grasp features are fused with the spatially aligned planning features from RAM through attention-based Pre-trained Location Fusion, preserving high-resolution grasping cues overshadowed by an overemphasis on global planning. To summarize, we propose Grasp-Pretraining Augmented Robotic Attention Mamba (GPA-RAM), dividing spatial task learning into RAM for planning skill learning and GPA for grasping skill learning. GPA-RAM demonstrates superior performance across three robot systems with distinct camera configurations in simulation and the real world. Compared with previous state-of-the-art methods, it improves the absolute success rate by 8.2% (from 79.3% to 87.5%) on the RLBench multi-task benchmark and 40% (from 16% to 56%), 12% (from 86% to 98%) on the ALOHA bimanual manipulation tasks, while delivering notably faster inference. Furthermore, experimental results demonstrate that both RAM and GPA enhance task learning, with GPA proving robust to different architectures of pretrained grasp pose feature extractors. The project is https://logssim.github.io/GPA_RAM_website/
SPiDR: A Simple Approach for Zero-Shot Safety in Sim-to-Real Transfer
Deploying reinforcement learning (RL) safely in the real world is challenging, as policies trained in simulators must face the inevitable sim-to-real gap. Robust safe RL techniques are provably safe, however difficult to scale, while domain randomization is more practical yet prone to unsafe behaviors. We address this gap by proposing SPiDR, short for Sim-to-real via Pessimistic Domain Randomization -- a scalable algorithm with provable guarantees for safe sim-to-real transfer. SPiDR uses domain randomization to incorporate the uncertainty about the sim-to-real gap into the safety constraints, making it versatile and highly compatible with existing training pipelines. Through extensive experiments on sim-to-sim benchmarks and two distinct real-world robotic platforms, we demonstrate that SPiDR effectively ensures safety despite the sim-to-real gap while maintaining strong performance.
mmWave Radar-Based Non-Line-of-Sight Pedestrian Localization at T-Junctions Utilizing Road Layout Extraction via Camera
Pedestrians Localization in Non-Line-of-Sight (NLoS) regions within urban environments poses a significant challenge for autonomous driving systems. While mmWave radar has demonstrated potential for detecting objects in such scenarios, the 2D radar point cloud (PCD) data is susceptible to distortions caused by multipath reflections, making accurate spatial inference difficult. Additionally, although camera images provide high-resolution visual information, they lack depth perception and cannot directly observe objects in NLoS regions. In this paper, we propose a novel framework that interprets radar PCD through road layout inferred from camera for localization of NLoS pedestrians. The proposed method leverages visual information from the camera to interpret 2D radar PCD, enabling spatial scene reconstruction. The effectiveness of the proposed approach is validated through experiments conducted using a radar-camera system mounted on a real vehicle. The localization performance is evaluated using a dataset collected in outdoor NLoS driving environments, demonstrating the practical applicability of the method.
PolySim: Bridging the Sim-to-Real Gap for Humanoid Control via Multi-Simulator Dynamics Randomization
Humanoid whole-body control (WBC) policies trained in simulation often suffer from the sim-to-real gap, which fundamentally arises from simulator inductive bias, the inherent assumptions and limitations of any single simulator. These biases lead to nontrivial discrepancies both across simulators and between simulation and the real world. To mitigate the effect of simulator inductive bias, the key idea is to train policies jointly across multiple simulators, encouraging the learned controller to capture dynamics that generalize beyond any single simulator's assumptions. We thus introduce PolySim, a WBC training platform that integrates multiple heterogeneous simulators. PolySim can launch parallel environments from different engines simultaneously within a single training run, thereby realizing dynamics-level domain randomization. Theoretically, we show that PolySim yields a tighter upper bound on simulator inductive bias than single-simulator training. In experiments, PolySim substantially reduces motion-tracking error in sim-to-sim evaluations; for example, on MuJoCo, it improves execution success by 52.8 over an IsaacSim baseline. PolySim further enables zero-shot deployment on a real Unitree G1 without additional fine-tuning, showing effective transfer from simulation to the real world. We will release the PolySim code upon acceptance of this work.
comment: 8 pages, 5 figures
HEAL: An Empirical Study on Hallucinations in Embodied Agents Driven by Large Language Models EMNLP 2025
Large language models (LLMs) are increasingly being adopted as the cognitive core of embodied agents. However, inherited hallucinations, which stem from failures to ground user instructions in the observed physical environment, can lead to navigation errors, such as searching for a refrigerator that does not exist. In this paper, we present the first systematic study of hallucinations in LLM-based embodied agents performing long-horizon tasks under scene-task inconsistencies. Our goal is to understand to what extent hallucinations occur, what types of inconsistencies trigger them, and how current models respond. To achieve these goals, we construct a hallucination probing set by building on an existing benchmark, capable of inducing hallucination rates up to 40x higher than base prompts. Evaluating 12 models across two simulation environments, we find that while models exhibit reasoning, they fail to resolve scene-task inconsistencies-highlighting fundamental limitations in handling infeasible tasks. We also provide actionable insights on ideal model behavior for each scenario, offering guidance for developing more robust and reliable planning strategies.
comment: Accepted by EMNLP 2025 Findings
InternScenes: A Large-scale Simulatable Indoor Scene Dataset with Realistic Layouts
The advancement of Embodied AI heavily relies on large-scale, simulatable 3D scene datasets characterized by scene diversity and realistic layouts. However, existing datasets typically suffer from limitations in data scale or diversity, sanitized layouts lacking small items, and severe object collisions. To address these shortcomings, we introduce \textbf{InternScenes}, a novel large-scale simulatable indoor scene dataset comprising approximately 40,000 diverse scenes by integrating three disparate scene sources, real-world scans, procedurally generated scenes, and designer-created scenes, including 1.96M 3D objects and covering 15 common scene types and 288 object classes. We particularly preserve massive small items in the scenes, resulting in realistic and complex layouts with an average of 41.5 objects per region. Our comprehensive data processing pipeline ensures simulatability by creating real-to-sim replicas for real-world scans, enhances interactivity by incorporating interactive objects into these scenes, and resolves object collisions by physical simulations. We demonstrate the value of InternScenes with two benchmark applications: scene layout generation and point-goal navigation. Both show the new challenges posed by the complex and realistic layouts. More importantly, InternScenes paves the way for scaling up the model training for both tasks, making the generation and navigation in such complex scenes possible. We commit to open-sourcing the data, models, and benchmarks to benefit the whole community.
How Vulnerable Is My Learned Policy? Universal Adversarial Perturbation Attacks On Modern Behavior Cloning Policies
Learning from Demonstration (LfD) algorithms have shown promising results in robotic manipulation tasks, but their vulnerability to offline universal perturbation attacks remains underexplored. This paper presents a comprehensive study of adversarial attacks on both classic and recently proposed algorithms, including Behavior Cloning (BC), LSTM-GMM, Implicit Behavior Cloning (IBC), Diffusion Policy (DP), and Vector-Quantizied Behavior Transformer (VQ-BET). We study the vulnerability of these methods to universal adversarial perturbations. Our experiments on several simulated robotic manipulation tasks reveal that most of the current methods are highly vulnerable to adversarial perturbations. We also show that these attacks are often transferable across algorithms, architectures, and tasks, raising concerning security vulnerabilities to black-box attacks. To the best of our knowledge, we are the first to present a systematic study of the vulnerabilities of different LfD algorithms to both white-box and black-box attacks. Our findings highlight the vulnerabilities of modern BC algorithms, paving the way for future work in addressing such limitations.
DSM: Constructing a Diverse Semantic Map for 3D Visual Grounding
Effective scene representation is critical for the visual grounding ability of representations, yet existing methods for 3D Visual Grounding are often constrained. They either only focus on geometric and visual cues, or, like traditional 3D scene graphs, lack the multi-dimensional attributes needed for complex reasoning. To bridge this gap, we introduce the Diverse Semantic Map (DSM) framework, a novel scene representation framework that enriches robust geometric models with a spectrum of VLM-derived semantics, including appearance, physical properties, and affordances. The DSM is first constructed online by fusing multi-view observations within a temporal sliding window, creating a persistent and comprehensive world model. Building on this foundation, we propose DSM-Grounding, a new paradigm that shifts grounding from free-form VLM queries to a structured reasoning process over the semantic-rich map, markedly improving accuracy and interpretability. Extensive evaluations validate our approach's superiority. On the ScanRefer benchmark, DSM-Grounding achieves a state-of-the-art 59.06% overall accuracy of IoU@0.5, surpassing others by 10%. In semantic segmentation, our DSM attains a 67.93% F-mIoU, outperforming all baselines, including privileged ones. Furthermore, successful deployment on physical robots for complex navigation and grasping tasks confirms the framework's practical utility in real-world scenarios.
comment: 8 pages, 6 figures, Project Page: https://binicey.github.io/DSM
Behavior Trees and State Machines in Robotics Applications
Autonomous robots combine skills to form increasingly complex behaviors, called missions. While skills are often programmed at a relatively low abstraction level, their coordination is architecturally separated and often expressed in higher-level languages or frameworks. State machines have been the go-to language to model behavior for decades, but recently, behavior trees have gained attention among roboticists. Although several implementations of behavior trees are in use, little is known about their usage and scope in the real world.How do concepts offered by behavior trees relate to traditional languages, such as state machines? How are concepts in behavior trees and state machines used in actual applications? This paper is a study of the key language concepts in behavior trees as realized in domain-specific languages (DSLs), internal and external DSLs offered as libraries, and their use in open-source robotic applications supported by the Robot Operating System (ROS). We analyze behavior-tree DSLs and compare them to the standard language for behavior models in robotics:state machines. We identify DSLs for both behavior-modeling languages, and we analyze five in-depth.We mine open-source repositories for robotic applications that use the analyzed DSLs and analyze their usage. We identify similarities between behavior trees and state machines in terms of language design and the concepts offered to accommodate the needs of the robotics domain. We observed that the usage of behavior-tree DSLs in open-source projects is increasing rapidly. We observed similar usage patterns at model structure and at code reuse in the behavior-tree and state-machine models within the mined open-source projects. We contribute all extracted models as a dataset, hoping to inspire the community to use and further develop behavior trees, associated tools, and analysis techniques.
comment: Published at IEEE TSE as a journal extension of a preceding SLE paper (available as arXiv:2010.06256). arXiv admin note: text overlap with arXiv:2010.06256
Multiagent Systems
Ax-Prover: A Deep Reasoning Agentic Framework for Theorem Proving in Mathematics and Quantum Physics
We present Ax-Prover, a multi-agent system for automated theorem proving in Lean that can solve problems across diverse scientific domains and operate either autonomously or collaboratively with human experts. To achieve this, Ax-Prover approaches scientific problem solving through formal proof generation, a process that demands both creative reasoning and strict syntactic rigor. Ax-Prover meets this challenge by equipping Large Language Models (LLMs), which provide knowledge and reasoning, with Lean tools via the Model Context Protocol (MCP), which ensure formal correctness. To evaluate its performance as an autonomous prover, we benchmark our approach against frontier LLMs and specialized prover models on two public math benchmarks and on two Lean benchmarks we introduce in the fields of abstract algebra and quantum theory. On public datasets, Ax-Prover is competitive with state-of-the-art provers, while it largely outperform them on the new benchmarks. This shows that, unlike specialized systems that struggle to generalize, our tool-based agentic theorem prover approach offers a generalizable methodology for formal verification across diverse scientific domains. Furthermore, we demonstrate Ax-Prover's assistant capabilities in a practical use case, showing how it enabled an expert mathematician to formalize the proof of a complex cryptography theorem.
Characterizing Agent-Based Model Dynamics via $ε$-Machines and Kolmogorov-Style Complexity
We propose a two-level information-theoretic framework for characterizing the informational organization of Agent-Based Model (ABM) dynamics within the broader paradigm of Complex Adaptive Systems (CAS). At the macro level, a pooled $\epsilon$-machine is reconstructed as a reference model that summarizes the system-wide informational regime. At the micro level, $\epsilon$-machines are reconstructed for each caregiver-elder dyad and variable, and are complemented with algorithm-agnostic Kolmogorov-style measures, including normalized LZ78 complexity and bits per symbol from lossless compression. The resulting feature set $\{h_{\mu}, C_{\mu}, E, \mathrm{LZ78}, \mathrm{bps}\}$ enables distributional analysis, stratified comparisons, and unsupervised clustering across agents and scenarios. This dual-scale design preserves agent heterogeneity while providing an interpretable macro-level baseline, aligning ABM practice with CAS principles of emergence, feedback, and adaptation. A case study on caregiver-elder interactions illustrates the framework's implementation; the results and discussion will be completed following final simulation runs.
comment: 5 pages, methodological preprint
Runtime Composition in Dynamic System of Systems: A Systematic Review of Challenges, Solutions, Tools, and Evaluation Methods
Context: Modern Systems of Systems (SoSs) increasingly operate in dynamic environments (e.g., smart cities, autonomous vehicles) where runtime composition -- the on-the-fly discovery, integration, and coordination of constituent systems (CSs)--is crucial for adaptability. Despite growing interest, the literature lacks a cohesive synthesis of runtime composition in dynamic SoSs. Objective: This study synthesizes research on runtime composition in dynamic SoSs and identifies core challenges, solution strategies, supporting tools, and evaluation methods. Methods: We conducted a Systematic Literature Review (SLR), screening 1,774 studies published between 2019 and 2024 and selecting 80 primary studies for thematic analysis (TA). Results: Challenges fall into four categories: modeling and analysis, resilient operations, system orchestration, and heterogeneity of CSs. Solutions span seven areas: co-simulation and digital twins, semantic ontologies, integration frameworks, adaptive architectures, middleware, formal methods, and AI-driven resilience. Service-oriented frameworks for composition and integration dominate tooling, while simulation platforms support evaluation. Interoperability across tools, limited cross-toolchain workflows, and the absence of standardized benchmarks remain key gaps. Evaluation approaches include simulation-based, implementation-driven, and human-centered studies, which have been applied in domains such as smart cities, healthcare, defense, and industrial automation. Conclusions: The synthesis reveals tensions, including autonomy versus coordination, the modeling-reality gap, and socio-technical integration. It calls for standardized evaluation metrics, scalable decentralized architectures, and cross-domain frameworks. The analysis aims to guide researchers and practitioners in developing and implementing dynamically composable SoSs.
Inclusive Fitness as a Key Step Towards More Advanced Social Behaviors in Multi-Agent Reinforcement Learning Settings AAMAS 2022
The competitive and cooperative forces of natural selection have driven the evolution of intelligence for millions of years, culminating in nature's vast biodiversity and the complexity of human minds. Inspired by this process, we propose a novel multi-agent reinforcement learning framework where each agent is assigned a genotype and where reward functions are modelled after the concept of inclusive fitness. An agent's genetic material may be shared with other agents, and our inclusive reward function naturally accounts for this. We study the resulting social dynamics in two types of network games with prisoner's dilemmas and find that our results align with well-established principles from biology, such as Hamilton's rule. Furthermore, we outline how this framework can extend to more open-ended environments with spatial and temporal structure, finite resources, and evolving populations. We hypothesize the emergence of an arms race of strategies, where each new strategy is a gradual improvement over earlier adaptations of other agents, effectively producing a multi-agent autocurriculum analogous to biological evolution. In contrast to the binary team-based structures prevalent in earlier research, our gene-based reward structure introduces a spectrum of cooperation ranging from full adversity to full cooperativeness based on genetic similarity, enabling unique non team-based social dynamics. For example, one agent having a mutual cooperative relationship with two other agents, while the two other agents behave adversarially towards each other. We argue that incorporating inclusive fitness in agents provides a foundation for the emergence of more strategically advanced and socially intelligent agents.
comment: This version is a slightly updated version (e.g., added an important reference) compared to the peer-reviewed versions at 'Adapative Learning Agents' at AAMAS 2022 or 'From Cells to Societies' at ICLR 2022
Heterogeneous RBCs via deep multi-agent reinforcement learning
Current macroeconomic models with agent heterogeneity can be broadly divided into two main groups. Heterogeneous-agent general equilibrium (GE) models, such as those based on Heterogeneous Agents New Keynesian (HANK) or Krusell-Smith (KS) approaches, rely on GE and 'rational expectations', somewhat unrealistic assumptions that make the models very computationally cumbersome, which in turn limits the amount of heterogeneity that can be modelled. In contrast, agent-based models (ABMs) can flexibly encompass a large number of arbitrarily heterogeneous agents, but typically require the specification of explicit behavioural rules, which can lead to a lengthy trial-and-error model-development process. To address these limitations, we introduce MARL-BC, a framework that integrates deep multi-agent reinforcement learning (MARL) with Real Business Cycle (RBC) models. We demonstrate that MARL-BC can: (1) recover textbook RBC results when using a single agent; (2) recover the results of the mean-field KS model using a large number of identical agents; and (3) effectively simulate rich heterogeneity among agents, a hard task for traditional GE approaches. Our framework can be thought of as an ABM if used with a variety of heterogeneous interacting agents, and can reproduce GE results in limit cases. As such, it is a step towards a synthesis of these often opposed modelling paradigms.
comment: 13 pages, 9 figures
UNCAP: Uncertainty-Guided Planning Using Natural Language Communication for Cooperative Autonomous Vehicles
Safe large-scale coordination of multiple cooperative connected autonomous vehicles (CAVs) hinges on communication that is both efficient and interpretable. Existing approaches either rely on transmitting high-bandwidth raw sensor data streams or neglect perception and planning uncertainties inherent in shared data, resulting in systems that are neither scalable nor safe. To address these limitations, we propose Uncertainty-Guided Natural Language Cooperative Autonomous Planning (UNCAP), a vision-language model-based planning approach that enables CAVs to communicate via lightweight natural language messages while explicitly accounting for perception uncertainty in decision-making. UNCAP features a two-stage communication protocol: (i) an ego CAV first identifies the subset of vehicles most relevant for information exchange, and (ii) the selected CAVs then transmit messages that quantitatively express their perception uncertainty. By selectively fusing messages that maximize mutual information, this strategy allows the ego vehicle to integrate only the most relevant signals into its decision-making, improving both the scalability and reliability of cooperative planning. Experiments across diverse driving scenarios show a 63% reduction in communication bandwidth with a 31% increase in driving safety score, a 61% reduction in decision uncertainty, and a four-fold increase in collision distance margin during near-miss events. Project website: https://uncap-project.github.io/
KVCOMM: Online Cross-context KV-cache Communication for Efficient LLM-based Multi-agent Systems NeurIPS2025
Multi-agent large language model (LLM) systems are increasingly adopted for complex language processing tasks that require communication and coordination among agents. However, these systems often suffer substantial overhead from repeated reprocessing of overlapping contexts across agents. In typical pipelines, once an agent receives a message from its predecessor, the full context-including prior turns-must be reprocessed from scratch, leading to inefficient processing. While key-value (KV) caching is an effective solution for avoiding redundant computation in single-agent settings where prefixes remain unchanged, it cannot be directly reused in multi-agent scenarios due to diverging prefixes introduced by agent-specific context extensions. We identify that the core challenge lies in the offset variance of KV-caches across agents. To address this, we propose KVCOMM, a training-free framework that enables efficient prefilling in multi-agent inference by reusing KV-caches and aligning cache offsets of overlapping contexts under diverse prefix contexts. KVCOMM estimates and adjusts KV-caches for shared content by referencing a pool of cached examples-termed anchors-that store observed cache deviations under varying prefixes. The anchor pool is maintained and updated online, allowing dynamic adaptation to distinct user requests and context structures. KVCOMM achieves over 70% reuse rate across diverse multi-agent workloads, including retrieval-augmented generation, math reasoning, and collaborative coding tasks, all without quality degradation. Particularly, when each fully-connected agent receives 1K input tokens with 512 prefix tokens and 512 output tokens under a five-agent setting, KVCOMM achieves up to 7.8x speedup compared to the standard prefill pipeline, reducing TTFT from ~430 ms to ~55 ms.
comment: Accepted for publication in NeurIPS2025. Code is available at \url{https://github.com/HankYe/KVCOMM}
Equilibria in routing games with connected autonomous vehicles will not be strong, as exclusive clubs may form
User Equilibrium is the standard representation of the so-called routing game in which drivers adjust their route choices to arrive at their destinations as fast as possible. Asking whether this Equilibrium is strong or not was meaningless for human drivers who did not form coalitions due to technical and behavioral constraints. This is no longer the case for connected autonomous vehicles (CAVs), which will be able to communicate and collaborate to jointly form routing coalitions. We demonstrate this for the first time on a carefully designed toy-network example, where a `club` of three autonomous vehicles jointly decides to deviate from the user equilibrium and benefit (arrive faster). The formation of such a club has negative consequences for other users, who are not invited to join it and now travel longer, and for the system, making it suboptimal and disequilibrated, which triggers adaptation dynamics. This discovery has profound implications for the future of our cities. We demonstrate that, if not prevented, CAV operators may intentionally disequilibrate traffic systems from their classic Nash equilibria, benefiting their own users and imposing costs on others. These findings suggest the possible emergence of an exclusive CAV elite, from which human-driven vehicles and non-coalition members may be excluded, potentially leading to systematically longer travel times for those outside the coalition, which would be harmful for the equity of public road networks.
Benefits and Limitations of Communication in Multi-Agent Reasoning
Chain-of-thought prompting has popularized step-by-step reasoning in large language models, yet model performance still degrades as problem complexity and context length grow. By decomposing difficult tasks with long contexts into shorter, manageable ones, recent multi-agent paradigms offer a promising near-term solution to this problem. However, the fundamental capacities of such systems are poorly understood. In this work, we propose a theoretical framework to analyze the expressivity of multi-agent systems. We apply our framework to three algorithmic families: state tracking, recall, and $k$-hop reasoning. We derive bounds on (i) the number of agents required to solve the task exactly, (ii) the quantity and structure of inter-agent communication, and (iii) the achievable speedups as problem size and context scale. Our results identify regimes where communication is provably beneficial, delineate tradeoffs between agent count and bandwidth, and expose intrinsic limitations when either resource is constrained. We complement our theoretical analysis with a set of experiments on pretrained LLMs using controlled synthetic benchmarks. Empirical outcomes confirm the tradeoffs between key quantities predicted by our theory. Collectively, our analysis offers principled guidance for designing scalable multi-agent reasoning systems.
comment: 34 pages, 14 figures
GenCellAgent: Generalizable, Training-Free Cellular Image Segmentation via Large Language Model Agents
Cellular image segmentation is essential for quantitative biology yet remains difficult due to heterogeneous modalities, morphological variability, and limited annotations. We present GenCellAgent, a training-free multi-agent framework that orchestrates specialist segmenters and generalist vision-language models via a planner-executor-evaluator loop (choose tool $\rightarrow$ run $\rightarrow$ quality-check) with long-term memory. The system (i) automatically routes images to the best tool, (ii) adapts on the fly using a few reference images when imaging conditions differ from what a tool expects, (iii) supports text-guided segmentation of organelles not covered by existing models, and (iv) commits expert edits to memory, enabling self-evolution and personalized workflows. Across four cell-segmentation benchmarks, this routing yields a 15.7\% mean accuracy gain over state-of-the-art baselines. On endoplasmic reticulum and mitochondria from new datasets, GenCellAgent improves average IoU by 37.6\% over specialist models. It also segments novel objects such as the Golgi apparatus via iterative text-guided refinement, with light human correction further boosting performance. Together, these capabilities provide a practical path to robust, adaptable cellular image segmentation without retraining, while reducing annotation burden and matching user preferences.
comment: 43 pages
Large Language Model Agents Enable Autonomous Design and Image Analysis of Microwell Microfluidics
Microwell microfluidics has been utilized for single-cell analysis to reveal heterogeneity in gene expression, signaling pathways, and phenotypic responses for identifying rare cell types, understanding disease progression, and developing more precise therapeutic strategies. However, designing microwell microfluidics is a considerably complex task, requiring knowledge, experience, and CAD software, as well as manual intervention, which often fails initial designs, demanding multiple costly and time-consuming iterations. In this study, we establish an autonomous large language model (LLM)-driven microwell design framework to generate code-based computer-aided design (CAD) scripts, that enables the rapid and reproducible creation of microwells with diverse geometries and imaging-based analysis. We propose a multimodal large language model (MLLM)-logistic regression framework based on integrating high-level semantic descriptions generated by MLLMs with image embeddings for image classification tasks, aiming to identify microwell occupancy and microwell shape. The fused multimodal representation is input to a logistic regression model, which is both interpretable and computationally efficient. We achieved significant improvements, exceeding 0.92 for occupancy classification and 0.99 for shape classification, across all evaluated MLLMs, compared with 0.50 and 0.55, respectively, when relying solely on direct classification. The MLLM-logistic regression framework is a scalable, efficient solution for high-throughput microwell image analysis. Our study demonstrates an autonomous design microwell platform by translating natural language prompts into optimized device geometries, CAD scripts and image analysis, facilitating the development of next-generation digital discovery by integration of literature mining, autonomous design and experimental data analysis.
Stronger Together: On-Policy Reinforcement Learning for Collaborative LLMs
Multi-agent systems (MAS) and reinforcement learning (RL) are widely used to enhance the agentic capabilities of large language models (LLMs). MAS improves task performance through role-based orchestration, while RL uses environmental rewards to learn stronger policies, such as GRPO-style optimization. However, applying on-policy RL to MAS remains underexplored and presents unique challenges. Algorithmically, standard GRPO grouping assumptions break down because prompts vary by role and by turn. System-wise, the training stack must support MAS-workflow rollouts and on-policy updates for both single-policy and multi-policy models. We propose AT-GRPO, which includes (i) an agent- and turn-wise grouped RL algorithm tailored to MAS and (ii) a training system that supports both single- and multi-policy regimes. Across game, planning, coding, and math tasks, AT-GRPO delivers substantial gains. On long-horizon planning, it increases accuracy from a 14.0 to 47.0 percent single-agent RL baseline to 96.0 to 99.5 percent. It also improves reasoning performance, with average gains of 3.87 to 7.62 percent on coding tasks and 9.0 to 17.93 percent on math. Code and environments are available at: https://github.com/pettingllms-ai/PettingLLMs.
Autonomous vehicles need social awareness to find optima in multi-agent reinforcement learning routing games
Previous work has shown that when multiple selfish Autonomous Vehicles (AVs) are introduced to future cities and start learning optimal routing strategies using Multi-Agent Reinforcement Learning (MARL), they may destabilize traffic systems, as they would require a significant amount of time to converge to the optimal solution, equivalent to years of real-world commuting. We demonstrate that moving beyond the selfish component in the reward significantly relieves this issue. If each AV, apart from minimizing its own travel time, aims to reduce its impact on the system, this will be beneficial not only for the system-wide performance but also for each individual player in this routing game. By introducing an intrinsic reward signal based on the marginal cost matrix, we significantly reduce training time and achieve convergence more reliably. Marginal cost quantifies the impact of each individual action (route-choice) on the system (total travel time). Including it as one of the components of the reward can reduce the degree of non-stationarity by aligning agents' objectives. Notably, the proposed counterfactual formulation preserves the system's equilibria and avoids oscillations. Our experiments show that training MARL algorithms with our novel reward formulation enables the agents to converge to the optimal solution, whereas the baseline algorithms fail to do so. We show these effects in both a toy network and the real-world network of Saint-Arnoult. Our results optimistically indicate that social awareness (i.e., including marginal costs in routing decisions) improves both the system-wide and individual performance of future urban systems with AVs.
Optimistic Multi-Agent Policy Gradient ICML 2024
*Relative overgeneralization* (RO) occurs in cooperative multi-agent learning tasks when agents converge towards a suboptimal joint policy due to overfitting to suboptimal behavior of other agents. No methods have been proposed for addressing RO in multi-agent policy gradient (MAPG) methods although these methods produce state-of-the-art results. To address this gap, we propose a general, yet simple, framework to enable optimistic updates in MAPG methods that alleviate the RO problem. Our approach involves clipping the advantage to eliminate negative values, thereby facilitating optimistic updates in MAPG. The optimism prevents individual agents from quickly converging to a local optimum. Additionally, we provide a formal analysis to show that the proposed method retains optimality at a fixed point. In extensive evaluations on a diverse set of tasks including the *Multi-agent MuJoCo* and *Overcooked* benchmarks, our method outperforms strong baselines on 13 out of 19 tested tasks and matches the performance on the rest.
comment: Updated MMDP notation. Published at ICML 2024, 17 pages, 10 figures
Nash Equilibria in Games with Playerwise Concave Coupling Constraints: Existence and Computation
We study the existence and computation of Nash equilibria in continuous static games where the players' admissible strategies are subject to shared coupling constraints, i.e., constraints that depend on their \emph{joint} strategies. Specifically, we focus on a class of games characterized by playerwise concave utilities and playerwise concave constraints. Prior results on the existence of Nash equilibria are not applicable to this class, as they rely on strong assumptions such as joint convexity of the feasible set. By leveraging topological fixed point theory and novel structural insights into the contractibility of feasible sets under playerwise concave constraints, we give an existence proof for Nash equilibria under weaker conditions. Having established existence, we then focus on the computation of Nash equilibria via independent gradient methods under the additional assumption that the utilities admit a potential function. To account for the possibly nonconvex feasible region, we employ a log barrier regularized gradient ascent with adaptive stepsizes. Starting from an initial feasible strategy profile and under exact gradient feedback, the proposed method converges to an $\epsilon$-approximate constrained Nash equilibrium within $\mathcal{O}(\epsilon^{-3})$ iterations.
Offline Fictitious Self-Play for Competitive Games
Offline Reinforcement Learning (RL) enables policy improvement from fixed datasets without online interactions, making it highly suitable for real-world applications lacking efficient simulators. Despite its success in the single-agent setting, offline multi-agent RL remains a challenge, especially in competitive games. Firstly, unaware of the game structure, it is impossible to interact with the opponents and conduct a major learning paradigm, self-play, for competitive games. Secondly, real-world datasets cannot cover all the state and action space in the game, resulting in barriers to identifying Nash equilibrium (NE). To address these issues, this paper introduces OFF-FSP, the first practical model-free offline RL algorithm for competitive games. We start by simulating interactions with various opponents by adjusting the weights of the fixed dataset with importance sampling. This technique allows us to learn the best responses to different opponents and employ the Offline Self-Play learning framework. To overcome the challenge of partial coverage, we combine the single-agent offline RL method with Fictitious Self-Play (FSP) to approximate NE by constraining the approximate best responses away from out-of-distribution actions. Experiments on matrix games, extensive-form poker, and board games demonstrate that OFF-FSP achieves significantly lower exploitability than state-of-the-art baselines. Finally, we validate OFF-FSP on a real-world human-robot competitive task, demonstrating its potential for solving complex, hard-to-simulate real-world problems.
A Hybrid ABM-PDE Framework for Real-World Infectious Disease Simulations
This paper presents a hybrid modeling approach that couples an Agent-Based Model (ABM) with a partial differential equation (PDE) model in an epidemic setting to simulate the spatial spread of infectious diseases using a compartmental structure with seven health states. The goal is to reduce the computational complexity of a full-ABM by introducing a coupled ABM-PDE model that offers significantly faster simulations while maintaining comparable accuracy. Our results demonstrate that the hybrid model not only reduces the overall simulation runtime (defined as the number of runs required for stable results multiplied by the duration of a single run) but also achieves smaller errors across both 25% and 100% population samples. The coupling mechanism ensures consistency at the model interface: agents crossing from the ABM into the PDE domain are removed and represented as density contributions, while surplus density in the PDE domain is used to generate agents with plausible trajectories derived from mobile phone data. We evaluate the hybrid model using real-world mobility and infection data for the Berlin-Brandenburg region in Germany, showing that it captures the core epidemiological dynamics while enabling efficient large-scale simulations.
Scaling Multi-Agent Epistemic Planning through GNN-Derived Heuristics
Multi-agent Epistemic Planning (MEP) is an autonomous planning framework for reasoning about both the physical world and the beliefs of agents, with applications in domains where information flow and awareness among agents are critical. The richness of MEP requires states to be represented as Kripke structures, i.e., directed labeled graphs. This representation limits the applicability of existing heuristics, hindering the scalability of epistemic solvers, which must explore an exponential search space without guidance, resulting often in intractability. To address this, we exploit Graph Neural Networks (GNNs) to learn patterns and relational structures within epistemic states, to guide the planning process. GNNs, which naturally capture the graph-like nature of Kripke models, allow us to derive meaningful estimates of state quality -- e.g., the distance from the nearest goal -- by generalizing knowledge obtained from previously solved planning instances. We integrate these predictive heuristics into an epistemic planning pipeline and evaluate them against standard baselines, showing improvements in the scalability of multi-agent epistemic planning.
Abmax: A JAX-based Agent-based Modeling Framework
Agent-based modeling (ABM) is a principal approach for studying complex systems. By decomposing a system into simpler, interacting agents, agent-based modeling (ABM) allows researchers to observe the emergence of complex phenomena. High-performance array computing libraries like JAX can help scale such computational models to a large number of agents by using automatic vectorization and just-in-time (JIT) compilation. One of the caveats of using JAX to achieve such scaling is that the shapes of arrays used in the computational model should remain immutable throughout the simulation. In the context of agent-based modeling (ABM), this can pose constraints on certain agent manipulation operations that require flexible data structures. A subset of which is represented by the ability to update a dynamically selected number of agents by applying distinct changes to them during a simulation. To this effect, we introduce Abmax, an ABM framework based on JAX that implements multiple just-in-time (JIT) compilable algorithms to provide this functionality. On the canonical predation model benchmark, Abmax achieves runtime performance comparable to state-of-the-art implementations. Further, we show that this functionality can also be vectorized, making it possible to run many similar agent-based models in parallel. We also present two examples in the form of a traffic-flow model and a financial market model to show the use case of Abmax.
comment: 8 pages, 7 figures, 4 tables, 2 algorithms
Multi-Agent Autonomous Driving Systems with Large Language Models: A Survey of Recent Advances
Autonomous Driving Systems (ADSs) are revolutionizing transportation by reducing human intervention, improving operational efficiency, and enhancing safety. Large Language Models (LLMs) have been integrated into ADSs to support high-level decision-making through their powerful reasoning, instruction-following, and communication abilities. However, LLM-based single-agent ADSs face three major challenges: limited perception, insufficient collaboration, and high computational demands. To address these issues, recent advances in LLM-based multi-agent ADSs leverage language-driven communication and coordination to enhance inter-agent collaboration. This paper provides a frontier survey of this emerging intersection between NLP and multi-agent ADSs. We begin with a background introduction to related concepts, followed by a categorization of existing LLM-based methods based on different agent interaction modes. We then discuss agent-human interactions in scenarios where LLM-based agents engage with humans. Finally, we summarize key applications, datasets, and challenges to support future research.
Time Travel is Cheating: Going Live with DeepFund for Real-Time Fund Investment Benchmarking NeurIPS 2025
Large Language Models (LLMs) have demonstrated notable capabilities across financial tasks, including financial report summarization, earnings call transcript analysis, and asset classification. However, their real-world effectiveness in managing complex fund investment remains inadequately assessed. A fundamental limitation of existing benchmarks for evaluating LLM-driven trading strategies is their reliance on historical back-testing, inadvertently enabling LLMs to "time travel"-leveraging future information embedded in their training corpora, thus resulting in possible information leakage and overly optimistic performance estimates. To address this issue, we introduce DeepFund, a live fund benchmark tool designed to rigorously evaluate LLM in real-time market conditions. Utilizing a multi-agent architecture, DeepFund connects directly with real-time stock market data-specifically data published after each model pretraining cutoff-to ensure fair and leakage-free evaluations. Empirical tests on nine flagship LLMs from leading global institutions across multiple investment dimensions-including ticker-level analysis, investment decision-making, portfolio management, and risk control-reveal significant practical challenges. Notably, even cutting-edge models such as DeepSeek-V3 and Claude-3.7-Sonnet incur net trading losses within DeepFund real-time evaluation environment, underscoring the present limitations of LLMs for active fund management. Our code is available at https://github.com/HKUSTDial/DeepFund.
comment: NeurIPS 2025 Datasets and Benchmarks Track
Coordination Requires Simplification: Thermodynamic Bounds on Multi-Objective Compromise in Natural and Artificial Intelligence
Information-processing systems that coordinate multiple agents and objectives face fundamental thermodynamic constraints. We show that solutions with maximum utility to act as coordination focal points have a much higher selection pressure for being findable across agents rather than accuracy. We derive that the information-theoretic minimum description length of coordination protocols to precision $\varepsilon$ scales as $L(P)\geq NK\log_2 K+N^2d^2\log (1/\varepsilon)$ for $N$ agents with $d$ potentially conflicting objectives and internal model complexity $K$. This scaling forces progressive simplification, with coordination dynamics changing the environment itself and shifting optimization across hierarchical levels. Moving from established focal points requires re-coordination, creating persistent metastable states and hysteresis until significant environmental shifts trigger phase transitions through spontaneous symmetry breaking. We operationally define coordination temperature to predict critical phenomena and estimate coordination work costs, identifying measurable signatures across systems from neural networks to restaurant bills to bureaucracies. Extending the topological version of Arrow's theorem on the impossibility of consistent preference aggregation, we find it recursively binds whenever preferences are combined. This potentially explains the indefinite cycling in multi-objective gradient descent and alignment faking in Large Language Models trained with reinforcement learning with human feedback. We term this framework Thermodynamic Coordination Theory (TCT), which demonstrates that coordination requires radical information loss.
comment: 15 pages, 1 figure, 9 pages supplementary material, submitted to Journal of Physics: Complexity
OrbitZoo: Real Orbital Systems Challenges for Reinforcement Learning NeurIPS 2025
The increasing number of satellites and orbital debris has made space congestion a critical issue, threatening satellite safety and sustainability. Challenges such as collision avoidance, station-keeping, and orbital maneuvering require advanced techniques to handle dynamic uncertainties and multi-agent interactions. Reinforcement learning (RL) has shown promise in this domain, enabling adaptive, autonomous policies for space operations; however, many existing RL frameworks rely on custom-built environments developed from scratch, which often use simplified models and require significant time to implement and validate the orbital dynamics, limiting their ability to fully capture real-world complexities. To address this, we introduce OrbitZoo, a versatile multi-agent RL environment built on a high-fidelity industry standard library, that enables realistic data generation, supports scenarios like collision avoidance and cooperative maneuvers, and ensures robust and accurate orbital dynamics. The environment is validated against a real satellite constellation, Starlink, achieving a Mean Absolute Percentage Error (MAPE) of 0.16% compared to real-world data. This validation ensures reliability for generating high-fidelity simulations and enabling autonomous and independent satellite operations.
comment: Accepted at the 39th Conference on Neural Information Processing Systems (NeurIPS 2025)
Evolution of AI Agent Registry Solutions: Centralized, Enterprise, and Distributed Approaches
Autonomous AI agents now operate across cloud, enterprise, and decentralized domains, creating demand for registry infrastructures that enable trustworthy discovery, capability negotiation, and identity assurance. We analyze five prominent approaches: (1) MCP Registry (centralized publication of mcp.json descriptors), (2) A2A Agent Cards (decentralized self-describing JSON capability manifests), (3) AGNTCY Agent Directory Service (IPFS Kademlia DHT content routing extended for semantic taxonomy-based content discovery, OCI artifact storage, and Sigstore-backed integrity), (4) Microsoft Entra Agent ID (enterprise SaaS directory with policy and zero-trust integration), and (5) NANDA Index AgentFacts (cryptographically verifiable, privacy-preserving fact model with credentialed assertions). Using four evaluation dimensions: security, authentication, scalability, and maintainability, we surface architectural trade-offs between centralized control, enterprise governance, and distributed resilience. We conclude with design recommendations for an emerging Internet of AI Agents requiring verifiable identity, adaptive discovery flows, and interoperable capability semantics.
Foragax: An Agent-Based Modelling Framework Based on JAX
Foraging for resources is a ubiquitous activity conducted by living organisms in a shared environment to maintain their homeostasis. Modelling multi-agent foraging in-silico allows us to study both individual and collective emergent behaviour in a tractable manner. Agent-based modelling has proven to be effective in simulating such tasks, though scaling the simulations to accommodate large numbers of agents with complex dynamics remains challenging. In this work, we present Foragax, a general-purpose, scalable, hardware-accelerated, multi-agent foraging toolkit. Leveraging the JAX library, our toolkit can simulate thousands of agents foraging in a common environment, in an end-to-end vectorized and differentiable manner. The toolkit provides agent-based modelling tools to model various foraging tasks, including options to design custom spatial and temporal agent dynamics, control policies, sensor models, and boundary conditions. Further, the number of agents during such simulations can be increased or decreased based on custom rules. While applied to foraging, the toolkit can also be used to model and simulate a wide range of other multi-agent scenarios.
comment: A new version of the framework with more complete features is available in the form of ABMax (arXiv:2508.16508)
Decentralizing Multi-Agent Reinforcement Learning with Temporal Causal Information
Reinforcement learning (RL) algorithms can find an optimal policy for a single agent to accomplish a particular task. However, many real-world problems require multiple agents to collaborate in order to achieve a common goal. For example, a robot executing a task in a warehouse may require the assistance of a drone to retrieve items from high shelves. In Decentralized Multi-Agent RL (DMARL), agents learn independently and then combine their policies at execution time, but often must satisfy constraints on compatibility of local policies to ensure that they can achieve the global task when combined. In this paper, we study how providing high-level symbolic knowledge to agents can help address unique challenges of this setting, such as privacy constraints, communication limitations, and performance concerns. In particular, we extend the formal tools used to check the compatibility of local policies with the team task, making decentralized training with theoretical guarantees usable in more scenarios. Furthermore, we empirically demonstrate that symbolic knowledge about the temporal evolution of events in the environment can significantly expedite the learning process in DMARL.
comment: Code available at https://github.com/corazza/tcdmarl
Systems and Control (CS)
Autonomous Legged Mobile Manipulation for Lunar Surface Operations via Constrained Reinforcement Learning
Robotics plays a pivotal role in planetary science and exploration, where autonomous and reliable systems are crucial due to the risks and challenges inherent to space environments. The establishment of permanent lunar bases demands robotic platforms capable of navigating and manipulating in the harsh lunar terrain. While wheeled rovers have been the mainstay for planetary exploration, their limitations in unstructured and steep terrains motivate the adoption of legged robots, which offer superior mobility and adaptability. This paper introduces a constrained reinforcement learning framework designed for autonomous quadrupedal mobile manipulators operating in lunar environments. The proposed framework integrates whole-body locomotion and manipulation capabilities while explicitly addressing critical safety constraints, including collision avoidance, dynamic stability, and power efficiency, in order to ensure robust performance under lunar-specific conditions, such as reduced gravity and irregular terrain. Experimental results demonstrate the framework's effectiveness in achieving precise 6D task-space end-effector pose tracking, achieving an average positional accuracy of 4 cm and orientation accuracy of 8.1 degrees. The system consistently respects both soft and hard constraints, exhibiting adaptive behaviors optimized for lunar gravity conditions. This work effectively bridges adaptive learning with essential mission-critical safety requirements, paving the way for advanced autonomous robotic explorers for future lunar missions.
comment: This is the authors version of the paper accepted for publication in The IEEE International Conference on Space Robotics 2025. The final version link will be added here after conference proceedings are published
Variational Quantum Eigensolver Models of Molecular Quantum Dot Cellular Automata
Molecular quantum-dot Cellular Automata (QCA) may provide low-power, high-speed computational hardware for processing classical information. Simulation and modeling play an important role in the design of QCA circuits because fully-coherent models of QCA scale exponentially with the number of devices, and such models are severely limited in size. For larger circuits, approximations become necessary. In the era of fault-tolerant quantum computation, however, it may become possible to model large QCA circuits without such limitations. Presently, this work explores the use of the noisy-intermediate scale quantum (NISQ) variational quantum eigensolver (VQE) method for estimating the ground state of QCA circuits. This is relevant because the computational result of a QCA calculation is encoded in the circuit's ground state. In this study, VQE is used to model logic circuits, including binary wires, inverters, and majority gates. VQE models are performed ideal simulators, noisy simulators, and actual quantum hardware. This study demonstrates that VQE may indeed be used to model molecular QCA circuits. It is observed that using modern NISQ hardware, results are still quite sensitive to noise, so measures should be taken to minimize noise. These include simplifying the ansatz circuit whenever possible, and using low-noise hardware.
comment: 18 pages, 26 figures, submitted to the Journal of Applied Physics
Learning Robust Agile Flight Control with Stability Guarantees
In the evolving landscape of high-speed agile quadrotor flight, achieving precise trajectory tracking at the platform's operational limits is paramount. Controllers must handle actuator constraints, exhibit robustness to disturbances, and remain computationally efficient for safety-critical applications. In this work, we present a novel neural-augmented feedback controller for agile flight control. The controller addresses individual limitations of existing state-of-the-art control paradigms and unifies their strengths. We demonstrate the controller's capabilities, including the accurate tracking of highly aggressive trajectories that surpass the feasibility of the actuators. Notably, the controller provides universal stability guarantees, enhancing its robustness and tracking performance even in exceedingly disturbance-prone settings. Its nonlinear feedback structure is highly efficient enabling fast computation at high update rates. Moreover, the learning process in simulation is both fast and stable, and the controller's inherent robustness allows direct deployment to real-world platforms without the need for training augmentations or fine-tuning.
Enhancing Robust Multi-Market Participation of Renewable-Based VPPs through Flexible Resources
In the transition toward a sustainable power system, renewable-based Virtual Power Plants (RVPPs) have emerged as a promising solution to the challenges of integrating renewable energy sources into electricity markets. Their viability, however, depends on effective market participation strategies and the ability to manage uncertainties while leveraging flexible resources. This paper analyzes the impact of different flexible resources - such as concentrated solar power plants, hydro plants, biomass plants, and flexible demand - on the participation of RVPPs in energy and reserve markets. Multiple sources of uncertainty in generation, consumption, and electricity prices are addressed using a two-stage robust optimization approach. The contribution of different technologies to RVPP profitability is evaluated through a marginal contribution method, ensuring fair allocation of profits among them according to their actual role in energy and reserve provision across markets. Simulations for an RVPP in southern Spain demonstrate how strategic decisions and the availability of flexible resources influence viability, market participation, and unit scheduling.
Privacy-Preserving Distributed Estimation with Limited Data Rate
This paper focuses on the privacy-preserving distributed estimation problem with a limited data rate, where the observations are the sensitive information. Specifically, a binary-valued quantizer-based privacy-preserving distributed estimation algorithm is developed, which improves the algorithm's privacy-preserving capability and simultaneously reduces the communication costs. The algorithm's privacy-preserving capability, measured by the Fisher information matrix, is dynamically enhanced over time. Notably, the Fisher information matrix of the output signals with respect to the sensitive information converges to zero at a polynomial rate, and the improvement in privacy brought by the quantizers is quantitatively characterized as a multiplicative effect. Regarding the communication costs, each sensor transmits only 1 bit of information to its neighbours at each time step. Additionally, the assumption on the negligible quantization error for real-valued messages is not required. While achieving the requirements of privacy preservation and reducing communication costs, the algorithm ensures that its estimates converge almost surely to the true value of the unknown parameter by establishing a co-design guideline for the time-varying privacy noises and step-sizes. A polynomial almost sure convergence rate is obtained, and then the trade-off between privacy and convergence rate is established. Numerical examples demonstrate the main results.
Optimising Communication Control Factors for Energy Consumption in Rural LOS V2X
Connected braking can reduce fatal collisions in connected and autonomous vehicles (CAVs) by using reliable, low-latency 5G New Radio (NR) links, especially NR Sidelink Vehicle-to-Everything (V2X). In rural areas, road side units are sparse and power-constrained or off-grid, so energy efficiency must be considered alongside safety. This paper studies how three communication control factors including subcarrier spacing ($\mathrm{SCS}$), modulation and coding scheme ($\mathrm{MCS}$), and transmit power ($P_{\mathrm{t}}$) should be configured to balance safety and energy consumption in rural line-of-sight (LOS) scenarios in light and heavy traffic scenarios. Safety is quantified by the packet receive ratio ($\mathrm{PRR}$) against the minimum communication distance $D_{\mathrm{comm}}$, defined as the distance that the vehicle travels during the transmission of the safety message. Results show that, under heavy traffic, increasing $P_{\mathrm{t}}$ and selecting a low-rate $\mathrm{MCS}$ at $\mathrm{SCS} = 30$ kHz sustains high $\mathrm{PRR}$ at $D_{\mathrm{comm}}$, albeit with higher energy cost. In light traffic, maintaining lower $P_\mathrm{t}$ with low $\mathrm{MCS}$ levels achieves a favorable reliability-energy trade-off while preserving acceptable $\mathrm{PRR}$ at $D_{\mathrm{comm}}$. These findings demonstrate the necessity of adaptive, energy-aware strategy to guarantee both safety and energy efficiency in rural V2X systems.
Temporal Variabilities Limit Convergence Rates in Gradient-Based Online Optimization
This paper investigates the fundamental performance limits of gradient-based algorithms for time-varying optimization. Leveraging the internal model principle and root locus techniques, we show that temporal variabilities impose intrinsic limits on the achievable rate of convergence. For a problem with condition ratio $\kappa$ and time variation whose model has degree $n$, we show that the worst-case convergence rate of any minimal-order gradient-based algorithm is $\rho_\text{TV} = (\frac{\kappa-1}{\kappa+1})^{1/n}$. This bound reveals a fundamental tradeoff between problem conditioning, temporal complexity, and rate of convergence. We further construct explicit controllers that attain the bound for low-degree models of time variation.
DarTwin made precise by SysMLv2 -- An Experiment
The new SysMLv2 adds mechanisms for the built-in specification of domain-specific concepts and language extensions. This feature promises to facilitate the creation of Domain-Specific Languages (DSLs) and interfacing with existing system descriptions and technical designs. In this paper, we review these features and evaluate SysMLv2's capabilities using concrete use cases. We develop DarTwin DSL, a DSL that formalizes the existing DarTwin notation for Digital Twin (DT) evolution, through SysMLv2, thereby supposedly enabling the wide application of DarTwin's evolution templates using any SysMLv2 tool. We demonstrate DarTwin DSL, but also point out limitations in the currently available tooling of SysMLv2 in terms of graphical notation capabilities. This work contributes to the growing field of Model-Driven Engineering (MDE) for DTs and combines it with the release of SysMLv2, thus integrating a systematic approach with DT evolution management in systems engineering.
Micro-Macro Backstepping Control of Large-Scale Hyperbolic Systems (Extended Version)
We introduce a control design and analysis framework for micro-macro, boundary control of large-scale, $n+m$ hyperbolic PDE systems. Specifically, we develop feedback laws for stabilization of hyperbolic systems at the micro level (i.e., of the large-scale system) that employ a) measurements obtained from the $n+m$ system (i.e., at micro level) and kernels constructed based on an $\infty+\infty$ continuum system counterpart (i.e., at macro level), or b) kernels and measurements both stemming from a continuum counterpart, or c) averaged-continuum kernels/measurements. We also address (d)) stabilization of the continuum (macro) system, employing continuum kernels and measurements. Towards addressing d) we derive in a constructive manner an $\infty+\infty$ continuum approximation of $n+m$ hyperbolic systems and establish that its solutions approximate, for large $n$ and $m$, the solutions of the $n+m$ system. We then construct a feedback law for stabilization of the $\infty+\infty$ system via introduction of a continuum-PDE backstepping transformation. We establish well-posedness of the resulting 4-D kernel equations and prove closed-loop stability via construction of a novel Lyapunov functional. Furthermore, under control configuration a) we establish that the closed-loop system is exponentially stable provided that $n$ and $m$ are large, by proving that the exact, stabilizing $n+m$ control kernels can be accurately approximated by the continuum kernels. While under control configurations b) and c), we establish closed-loop stability capitalizing on the established solutions' and kernels' approximation properties via employment of infinite-dimensional ISS arguments. We provide two numerical simulation examples to illustrate the effectiveness and potential limitations of our design approach.
comment: 22 pages, 5 figures
The value of storage in electricity distribution: The role of storage
Electricity distribution companies deploy battery storage to defer grid upgrades by reducing peak demand. In deregulated jurisdictions, such storage often sits idle because regulatory constraints bar participation in electricity markets. Here, we develop an optimization framework that, to our knowledge, provides the first formal model of market participation constraints within storage investment and operation planning. Applying the framework to a Massachusetts case study, we find that market participation could deliver similar savings as peak demand reduction. Under current conditions, market participation does not increase storage investment, but at very low storage costs, could incentivize deployment beyond local distribution needs. This might run contrary to the separation of distribution from generation in deregulated markets. Our framework can identify investment levels appropriate for local distribution needs.
High-Parallel FPGA-Based Discrete Simulated Bifurcation for Large-Scale Optimization
Combinatorial Optimization (CO) problems exhibit exponential complexity, making their resolution challenging. Simulated Adiabatic Bifurcation (aSB) is a quantum-inspired algorithm to obtain approximate solutions to largescale CO problems written in the Ising form. It explores the solution space by emulating the adiabatic evolution of a network of Kerr-nonlinear parametric oscillators (KPOs), where each oscillator represents a variable in the problem. The optimal solution corresponds to the ground state of this system. A key advantage of this approach is the possibility of updating multiple variables simultaneously, making it particularly suited for hardware implementation. To enhance solution quality and convergence speed, variations of the algorithm have been proposed in the literature, including ballistic (bSB), discrete (dSB), and thermal (HbSB) versions. In this work, we have comprehensively analyzed dSB, bSB, and HbSB using dedicated software models, evaluating the feasibility of using a fixed-point representation for hardware implementation. We then present an opensource hardware architecture implementing the dSB algorithm for Field-Programmable Gate Arrays (FPGAs). The design allows users to adjust the degree of algorithmic parallelization based on their specific requirements. A proof-of-concept implementation that solves 256-variable problems was achieved on an AMD Kria KV260 SoM, a low-tier FPGA, validated using well-known max-cut and knapsack problems.
Pooling Probabilistic Forecasts for Cooperative Wind Power Offering SC
Wind power producers can benefit from forming coalitions to participate cooperatively in electricity markets. To support such collaboration, various profit allocation rules rooted in cooperative game theory have been proposed. However, existing approaches overlook the lack of coherence among producers regarding forecast information, which may lead to ambiguity in offering and allocations. In this paper, we introduce a ``reconcile-then-optimize'' framework for cooperative market offerings. This framework first aligns the individual forecasts into a coherent joint forecast before determining market offers. With such forecasts, we formulate and solve a two-stage stochastic programming problem to derive both the aggregate offer and the corresponding scenario-based dual values for each trading hour. Based on these dual values, we construct a profit allocation rule that is budget-balanced and stable. Finally, we validate the proposed method through empirical case studies, demonstrating its practical effectiveness and theoretical soundness.
comment: submission to PSCC 2026, 7 pages
A Unidirectionally Connected FAS Approach for 6-DOF Quadrotor Control
This paper proposes a unidirectionally connected fully actuated system (UC-FAS) approach for the sub-stabilization and tracking control of 6-DOF quadrotors, tackling limitations both in state-space and FAS framework to some extent. The framework systematically converts underactuated quadrotor dynamics into a UC-FAS model, unifying the existing different FAS transformation ways. By eliminating estimation of the high-order derivatives of control inputs, a drawback of current methods, the UC-FAS model simplifies controller design and enables direct eigenstructure assignment for closed-loop dynamics. Simulations demonstrate precise 6-DOF tracking performance. This work bridges theoretical FAS approach advancements with practical implementation needs, offering a standardized paradigm for nonlinear quadrotor control.
comment: This paper has been submitted to 2026 IFAC World Congress. Corresponding author: Guang-Ren Duan
Ultrafast Grid Impedance Identification in $dq$-Asymmetric Three-Phase Power Systems
We propose a non-parametric frequency-domain method to identify small-signal $dq$-asymmetric grid impedances, over a wide frequency band, using grid-connected converters. Existing identification methods are faced with significant trade-offs: e.g., passive approaches rely on ambient harmonics and rare grid events and thus can only provide estimates at a few frequencies, while many active approaches that intentionally perturb grid operation require long time series measurement and specialized equipment. Although active time-domain methods reduce the measurement time, they either make crude simplifying assumptions or require laborious model order tuning. Our approach effectively addresses these challenges: it does not require specialized excitation signals or hardware and achieves ultrafast ($<1$ s) identification, drastically reducing measurement time. Being non-parametric, our approach also makes no assumptions on the grid structure. A detailed electromagnetic transient simulation is used to validate the method and demonstrate its clear superiority over existing alternatives.
Physics-Informed Reinforcement Learning for Large-Scale EV Smart Charging Considering Distribution Network Voltage Constraints
Electric Vehicles (EVs) offer substantial flexibility for grid services, yet large-scale, uncoordinated charging can threaten voltage stability in distribution networks. Existing Reinforcement Learning (RL) approaches for smart charging often disregard physical grid constraints or have limited performance for complex large-scale tasks, limiting their scalability and real-world applicability. This paper introduces a physics-informed (PI) RL algorithm that integrates a differentiable power flow model and voltage-based reward design into the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm, enabling EVs to deliver real-time voltage support while meeting user demands. The resulting PI-TD3 algorithm achieves faster convergence, improved sample efficiency, and reliable voltage magnitude regulation under uncertain and overloaded conditions. Benchmarks on the IEEE 34-bus and 123-bus networks show that the proposed PI-TD3 outperforms both model-free RL and optimization-based baselines in grid constraint management, user satisfaction, and economic metrics, even as the system scales to hundreds of EVs. These advances enable robust, scalable, and practical EV charging strategies that enhance grid resilience and support distribution networks operation.
Empowering Prosumers: Incentive Design for Local Electricity Markets Under Generalized Uncertainty and Grid Constraints
Since the 1990s, widespread introduction of central (wholesale) electricity markets has been seen across multiple continents, driven by the search for efficient operation of the power grid through competition. The increase of renewables has made significant impacts both on central electricity markets and distribution-level grids as renewable power generation is often connected to the latter. These stochastic renewable technologies have both advantages and disadvantages. On one hand they offer very low marginal cost and carbon emissions, while on the other hand, their output is uncertain, requiring flexible backup power with high marginal cost. Flexibility from end-prosumers or smaller market participants is therefore seen as a key enabler of large-scale integration of renewables. However, current central electricity markets do not directly include uncertainty into the market clearing and do not account for physical constraints of distribution grids. In this paper we propose a local electricity market framework based on probabilistic locational marginal pricing, effectively accounting for uncertainties in production, consumption and grid variables. The model includes a representation of the grid using the lindistflow equations and accounts for the propagation of uncertainty using general Polynomial Chaos (gPC). A two-stage convex model is proposed; in the day-ahead stage, probability distributions of prices are calculated for every timestep, where the expected values represent the day-ahead (spot) prices. In the real-time stage, uncertainties are realized (measured) and a trivial calculation reveals the real-time price. Through four instructive case-studies we highlight the effectiveness of the method to incentivize end-prosumers' participation in the market, while ensuring that their behavior does not have an adverse impact on the operation of the grid.
Human-in-the-Loop Bandwidth Estimation for Quality of Experience Optimization in Real-Time Video Communication AAAI
The quality of experience (QoE) delivered by video conferencing systems is significantly influenced by accurately estimating the time-varying available bandwidth between the sender and receiver. Bandwidth estimation for real-time communications remains an open challenge due to rapidly evolving network architectures, increasingly complex protocol stacks, and the difficulty of defining QoE metrics that reliably improve user experience. In this work, we propose a deployed, human-in-the-loop, data-driven framework for bandwidth estimation to address these challenges. Our approach begins with training objective QoE reward models derived from subjective user evaluations to measure audio and video quality in real-time video conferencing systems. Subsequently, we collect roughly $1$M network traces with objective QoE rewards from real-world Microsoft Teams calls to curate a bandwidth estimation training dataset. We then introduce a novel distributional offline reinforcement learning (RL) algorithm to train a neural-network-based bandwidth estimator aimed at improving QoE for users. Our real-world A/B test demonstrates that the proposed approach reduces the subjective poor call ratio by $11.41\%$ compared to the baseline bandwidth estimator. Furthermore, the proposed offline RL algorithm is benchmarked on D4RL tasks to demonstrate its generalization beyond bandwidth estimation.
comment: Accepted for publication in the proceedings of the AAAI Conference on Artificial Intelligence 2026 (IAAI Technical Track on Deployed Highly Innovative Applications of AI)
Sleepy Chauffeur Detection and Alert Techniques for Road Safety
The most startling of the contemporary problems is the sleepiness of chauffeur which causes lots of car accidents. Prevention of those impending accidents by detecting and alerting the sleepy chauffeur is vital, otherwise that would lead to loss of lives and various traumas along with severe injuries. The slumber or sleep may be caused by huge stress, pressure, relentless work load or alcoholism, for which sleep deprivation occurs and the chauffeur while driving gets drowsy. So far, considerable amount of systems has been developed to detect drowsiness of drivers, most of which mainly depend on image processing algorithms using cameras. Some of them also incorporate artificial intelligence and machine learning based algorithms. This paper presents a review of the existing systems and also proposes an easy and cheap system using sensors and Arduino, capable of detecting sleepiness and generates siren alarm and send alert message to take precautionary measures.
comment: 8 pages, 5 figures, International Journal on Recent Innovation in Microelectronics and Microcontrollers Applications Vol. 1, Issue 1 - 2018
Hybrid Terrain-Aware Path Planning: Integrating VD--RRT\(^{*}\) Exploration and VD--D\(^{*}\) Lite Repair
Autonomous ground vehicles operating off-road must plan curvature-feasible paths while accounting for spatially varying soil strength and slope hazards in real time. We present a continuous state--cost metric that combines a Bekker pressure--sinkage model with elevation-derived slope and attitude penalties. The resulting terrain cost field is analytic, bounded, and monotonic in soil modulus and slope, ensuring well-posed discretization and stable updates under sensor noise. This metric is evaluated on a lattice with exact steering primitives: Dubins and Reeds--Shepp motions for differential drive and time-parameterized bicycle arcs for Ackermann steering. Global exploration is performed using Vehicle-Dynamics RRT\(^{*}\), while local repair is managed by Vehicle-Dynamics D\(^{*}\) Lite, enabling millisecond-scale replanning without heuristic smoothing. By separating the terrain--vehicle model from the planner, the framework provides a reusable basis for deterministic, sampling-based, or learning-driven planning in deformable terrain. Hardware trials on an off-road platform demonstrate real-time navigation across soft soil and slope transitions, supporting reliable autonomy in unstructured environments.
Towards xApp Conflict Evaluation with Explainable Machine Learning and Causal Inference in O-RAN
The Open Radio Access Network (O-RAN) architecture enables a flexible, vendor-neutral deployment of 5G networks by disaggregating base station components and supporting third-party xApps for near real-time RAN control. However, the concurrent operation of multiple xApps can lead to conflicting control actions, which may cause network performance degradation. In this work, we propose a framework for xApp conflict management that combines explainable machine learning and causal inference to evaluate the causal relationships between RAN Control Parameters (RCPs) and Key Performance Indicators (KPIs). We use model explainability tools such as SHAP to identify RCPs that jointly affect the same KPI, signaling potential conflicts, and represent these interactions as a causal Directed Acyclic Graph (DAG). We then estimate the causal impact of each of these RCPs on their associated KPIs using metrics such as Average Treatment Effect (ATE) and Conditional Average Treatment Effect (CATE). This approach offers network operators guided insights into identifying conflicts and quantifying their impacts, enabling more informed and effective conflict resolution strategies across diverse xApp deployments.
Information Shapes Koopman Representation
The Koopman operator provides a powerful framework for modeling dynamical systems and has attracted growing interest from the machine learning community. However, its infinite-dimensional nature makes identifying suitable finite-dimensional subspaces challenging, especially for deep architectures. We argue that these difficulties come from suboptimal representation learning, where latent variables fail to balance expressivity and simplicity. This tension is closely related to the information bottleneck (IB) dilemma: constructing compressed representations that are both compact and predictive. Rethinking Koopman learning through this lens, we demonstrate that latent mutual information promotes simplicity, yet an overemphasis on simplicity may cause latent space to collapse onto a few dominant modes. In contrast, expressiveness is sustained by the von Neumann entropy, which prevents such collapse and encourages mode diversity. This insight leads us to propose an information-theoretic Lagrangian formulation that explicitly balances this tradeoff. Furthermore, we propose a new algorithm based on the Lagrangian formulation that encourages both simplicity and expressiveness, leading to a stable and interpretable Koopman representation. Beyond quantitative evaluations, we further visualize the learned manifolds under our representations, observing empirical results consistent with our theoretical predictions. Finally, we validate our approach across a diverse range of dynamical systems, demonstrating improved performance over existing Koopman learning methods. The implementation is publicly available at https://github.com/Wenxuan52/InformationKoopman.
Data to Certificate: Guaranteed Cost Control with Quantization-Aware System Identification
Cloud-assisted system identification and control have emerged as practical solutions for low-power, resource-constrained control systems such as micro-UAVs. In a typical cloud-assisted setting, state and input data are transmitted from local agents to a central computer over low-bandwidth wireless links, leading to quantization. This paper investigates the impact of state and input data quantization on a linear time invariant (LTI) system identification, derives a worst-case bound on the identification error, and develops a robust controller for guaranteed cost control. We establish a fundamental bound on the model error that depends only on the quantized data and quantization resolution, and develop a linear matrix inequality (LMI) based guaranteed cost robust controller under this error bound.
comment: 8 pages, 3 figures
Comparison of Forced and Unforced Rendezvous, Proximity Operations, and Docking Under Model Mismatch
This paper compares the required fuel usage for forced and unforced motion of a chaser satellite engaged in Rendezvous, Proximity Operations, and Docking (RPOD) maneuvers. Improved RPOD models are vital, particularly as the space industry expands and demands for improved fuel efficiency, cost effectiveness, and mission life span increase. This paper specifically examines the Clohessy- Wiltshire (CW) Equations and the extent of model mismatch by comparing pre- dicted trajectories from this model with a more computationally complex, higher fidelity RPOD model. This paper assesses several test cases of similar mission parameters, in each case comparing natural motion circumnavigation (NMC) with comparable forced motion circumnavigation. The Guidance, Navigation, and Con- trol (GNC) impulse maneuvers required to maintain the supposedly zero fuel CW trajectories is representative of the extent of CW model mismatch. This paper demonstrates that unforced motions are not inherently more fuel efficient than forced motions, thus permitting extended orbital operations given the higher fuel efficiency.
comment: 12 pages, 4 figures, AAS/AIAA Space Flight Mechanics
Identifying Best Candidates for Busbar Splitting
Rising electricity demand and the growing integration of renewables are intensifying congestion in transmission grids. Grid topology optimization through busbar splitting (BuS) and optimal transmission switching can alleviate grid congestion and reduce the generation costs in a power system. However, BuS optimization requires a large number of binary variables, and analyzing all the substations for potential new topological actions is computationally intractable, particularly in large grids. To tackle this issue, we propose a set of metrics to identify and rank promising candidates for BuS, focusing on finding buses where topology optimization can reduce generation costs. To assess the effect of BuS on the identified buses, we use a combined mixed-integer convex-quadratic BuS model to compute the optimal topology and test it with the non-linear non-convex AC optimal power flow (OPF) simulation to show its AC feasibility. By testing and validating the proposed metrics on test cases of different sizes, we show that they are able to identify busbars that reduce the total generation costs when their topology is optimized. Thus, the metrics enable effective selection of busbars for BuS, with no need to test every busbar in the grid, one at a time.
Competitive EV charging station location with queues
Electric vehicle (EV) public charging infrastructure planning faces significant challenges in competitive markets, where multiple service providers affect congestion and user behavior. This work extends existing modeling frameworks by incorporating the presence of competitors' stations and more realistic queueing systems. First, we analyze three finite queueing systems, M/M/1/K, M/M/s/K, and M/Er/s/K, with varying numbers of servers (charging outlets) and service time distributions, deriving analytic expressions for user behavior metrics. Second, we embed the queueing-based user behavior model into a bilevel program, where the upper level locates new charging stations to maximize accessibility (throughput), and the lower level captures users' station choices via a user equilibrium. Third, we apply a reformulation from competitive congested user-choice facility location models to approximately solve the bilevel problem and introduce a surrogate-based heuristic to enhance scalability. Fourth, we showcase our methodology on a real-world case study of an urban area in Montreal (Canada), offering managerial insights into how user-choice behavior assumptions and competition affect throughput and location decisions. The results demonstrate that our model yields (re)location strategies that outperform the existing network. More broadly, this approach provides a tool for incorporating charging service quality-through queueing metrics-and existing competition into station planning.
Model predictive control lowers barriers to adoption of heat-pump water heaters: A field study
Electric heat-pump water heaters (HPWHs) could reduce the energy costs, emissions, and power grid impacts associated with water heating, the second-largest energy use in United States housing. However, most HPWHs today require 240 V circuits to power the backup resistance heating elements they use to maintain comfort during large water draws. Installing a 240 V circuit can increase the up-front cost of a HPWH by half or more. This paper develops and field-tests the first control system that enables a 120 V HPWH to efficiently maintain comfort without resistance heating elements. The novel model predictive control (MPC) system enables pre-heating in anticipation of large water draws, which it forecasts using an ensemble of machine learning predictors. By shifting electrical load over time, MPC also reduces energy costs on average by 23% and 28% under time-of-use pricing and hourly pricing, respectively, relative to a 240 V HPWH with standard controls. Compared to the increasingly common practice in 120 V HPWHs of storing water at a constant, high temperature (60 {\deg}C) to ensure comfort, MPC saves 37% energy on average. In addition to demonstrating MPC's benefits in a real, occupied house, this paper discusses implementation challenges and costs. A simple payback analysis suggests that a 120 V HPWH, operated by the MPC system developed here, would be economically attractive in most installation scenarios.
Enhancing Profit and CO2 Mitigation: Commercial Direct Air Capture Design and Operation with Power Market Volatility
Current decarbonization efforts are falling short of meeting the net-zero greenhouse gas (GHG) emission target, highlighting the need for substantial carbon dioxide removal methods such as direct air capture (DAC). However, integrating DACs poses challenges due to their enormous power consumption. This study assesses the commercial operation of various DAC technologies that earn revenue using monetized carbon incentives while purchasing electricity from wholesale power markets. We model four commercial DAC technologies and examine their operation in three representative locations including California, Texas, and New York. Our findings reveal that commercial DAC operations can take financial advantage of the volatile power market to operate only during low-price periods strategically, offering a pathway to facilitate a cost-efficient decarbonization transition. The ambient operational environment such as temperature and relative humidity has non-trivial impact on abatement capacity. Profit-driven decisions introduce climate-economic trade-offs that might decrease the capacity factor of DAC and reduce total CO2 removal. These implications extend throughout the entire lifecycle of DAC developments and influence power systems and policies related to full-scale DAC implementation. Our study shows that DAC technologies with shorter cycle spans and higher flexibility can better exploit the electricity price volatility, while power markets demonstrate persistent low-price windows that often synergize with low grid emission periods, like during the solar "duck curve" in California. An optimal incentive design exists for profit-driven operations while carbon-tax policy in electricity pricing is counterproductive for DAC systems.
comment: 16 pages, 8 figure, Submitted and under review for Engineering
Non-Gaussian Distribution Steering in Nonlinear Dynamics with Conjugate Unscented Transformation
In highly nonlinear systems such as the ones commonly found in astrodynamics, Gaussian distributions generally evolve into non-Gaussian distributions. This paper introduces a method for effectively controlling non-Gaussian distributions in nonlinear environments using optimized linear feedback control. This paper utilizes Conjugate Unscented Transformation to quantify the higher-order statistical moments of non-Gaussian distributions. The formulation focuses on controlling and constraining the sigma points associated with the uncertainty quantification, which would thereby reflect the control of the entire distribution and constraints on the moments themselves. This paper develops an algorithm to solve this problem with sequential convex programming, and it is demonstrated through a two-body and three-body example. The examples show that individual moments can be directly controlled, and the moments are accurately approximated for non-Gaussian distributions throughout the controller's time horizon in nonlinear dynamics.
Gaussian Process Implicit Surfaces as Control Barrier Functions for Safe Robot Navigation
Level set methods underpin modern safety techniques such as control barrier functions (CBFs), while also serving as implicit surface representations for geometric shapes via distance fields. Inspired by these two paradigms, we propose a unified framework where the implicit surface itself acts as a CBF. We leverage Gaussian process (GP) implicit surface (GPIS) to represent the safety boundaries, using safety samples which are derived from sensor measurements to condition the GP. The GP posterior mean defines the implicit safety surface (safety belief), while the posterior variance provides a robust safety margin. Although GPs have favorable properties such as uncertainty estimation and analytical tractability, they scale cubically with data. To alleviate this issue, we develop a sparse solution called sparse Gaussian CBFs. To the best of our knowledge, GPIS have not been explicitly used to synthesize CBFs. We validate the approach on collision avoidance tasks in two settings: a simulated 7-DOF manipulator operating around the Stanford bunny, and a quadrotor navigating in 3D around a physical chair. In both cases, Gaussian CBFs (with and without sparsity) enable safe interaction and collision-free execution of trajectories that would otherwise intersect the objects.
comment: 8 pages, 7 figures, under review
A Wideband Composite Sequence Impedance Model for Evaluation of Interactions in Unbalanced Power-Electronic-Based Power Systems
This paper proposes a wideband composite sequence impedance model (WCSIM)-based analysis method to evaluate the interactions in power-electronic-based power systems subjected to unbalanced grid faults or with unbalanced loads. The WCSIM-based method intuitively assesses the impact of the small-signal interconnection among the positive-, negative-, and zero-sequence circuits on the interaction stability of unbalanced power systems. The effectiveness of this method is demonstrated using a permanent magnet synchronous generator-based weak grid system under a single-line-to-ground fault (SLGF). Frequency scanning results and controller hardware-in-loop tests validate both the correctness of the WCSIM and the effectiveness of the WCSIM-based analysis method.
comment: This work will be submitted to the IEEE for possible publication
ExaModelsPower.jl: A GPU-Compatible Modeling Library for Nonlinear Power System Optimization
As GPU-accelerated mathematical programming techniques mature, there is growing interest in utilizing them to address the computational challenges of power system optimization. This paper introduces ExaModelsPower.jl, an open-source modeling library for creating GPU-compatible nonlinear AC optimal power flow models. Built on ExaModels.jl, ExaModelsPower.jl provides a high-level interface that automatically generates all necessary callback functions for GPU solvers. The library is designed for large-scale problem instances, which may include multiple time periods and security constraints. Using ExaModelsPower.jl, we benchmark GPU and CPU solvers on open-source test cases. Our results show that GPU solvers can deliver up to two orders of magnitude speedups compared to alternative tools on CPU for problems with more than 20,000 variables and a solution precision of up to $10^{-4}$, while performance for smaller instances or tighter tolerances may vary.
Optimization of High-Order Quarter-Wave Plate for Birefringence Suppression in FOCS
Fiber optic current sensors (FOCS) are widely adopted in modern power grids due to high sensitivity, excellent insulation, and strong immunity to electromagnetic interference. This prominence necessitates precise investigation into their error sources and corresponding optimization. This study examines reflective FOCS based on the Faraday effect. A theoretical model is established to simulate phase error caused by linear birefringence from the quarter-wave plate. Conventional methods using circular birefringence are analyzed, revealing inherent limitations. Innovatively, a compensation strategy employing high-order quarter-wave plates is proposed to effectively eliminate linear birefringence effects. This approach significantly enhances the accuracy and practicality of FOCS in precision metrology.
The Algorithmic Regulator
The regulator theorem states that, under certain conditions, any optimal controller must embody a model of the system it regulates, grounding the idea that controllers embed, explicitly or implicitly, internal models of the controlled. This principle underpins neuroscience and predictive brain theories like the Free-Energy Principle or Kolmogorov/Algorithmic Agent theory. However, the theorem is only proven in limited settings. Here, we treat the deterministic, closed, coupled world-regulator system $(W,R)$ as a single self-delimiting program $p$ via a constant-size wrapper that produces the world output string~$x$ fed to the regulator. We analyze regulation from the viewpoint of the algorithmic complexity of the output, $K(x)$. We define $R$ to be a \emph{good algorithmic regulator} if it \emph{reduces} the algorithmic complexity of the readout relative to a null (unregulated) baseline $\varnothing$, i.e., \[ \Delta = K\big(O_{W,\varnothing}\big) - K\big(O_{W,R}\big) > 0. \] We then prove that the larger $\Delta$ is, the more world-regulator pairs with high mutual algorithmic information are favored. More precisely, a complexity gap $\Delta > 0$ yields \[ \Pr\big((W,R)\mid x\big) \le C\,2^{\,M(W{:}R)}\,2^{-\Delta}, \] making low $M(W{:}R)$ exponentially unlikely as $\Delta$ grows. This is an AIT version of the idea that ``the regulator contains a model of the world.'' The framework is distribution-free, applies to individual sequences, and complements the Internal Model Principle. Beyond this necessity claim, the same coding-theorem calculus singles out a \emph{canonical scalar objective} and implicates a \emph{planner}. On the realized episode, a regulator behaves \emph{as if} it minimized the conditional description length of the readout.
comment: 2 Figures
Integrative, Scalable Modeling of Hydrological Systems with MBSE and HFGT
Worsening global challenges in the Anthropocene demand complex, adaptive solutions grounded in a systems-level understanding of coupled social and environmental dynamics. However, existing modeling approaches often fall short due to disciplinary silos, limited scalability, and the absence of shared ontological frameworks. Model-Based Systems Engineering (MBSE), when integrated with Hetero-functional Graph Theory (HFGT), offers a powerful methodology for modeling systems of systems while preserving subsystem heterogeneity and enabling cross-disciplinary integration. This paper presents the first application of the MBSE-HFGT methodology to environmental systems, using a series of worked examples involving flow through lake and land segments. These examples demonstrate how the approach enables consistent, scalable, and integrative modeling of complex environmental processes.
A Cyber Insurance Policy for Hedging Against Load-Altering Attacks and Extreme Load Variations in Distribution Grids
Uncertainties in renewable energy resources (RES) and load variations can lead to elevated system operational costs. Moreover, the emergence of large-scale distributed threats, such as load-altering attacks (LAAs), can induce substantial load variations, further exacerbating these costs. Although traditional defense measures can reduce the likelihood of such attacks, considerable residual risks remain. Thus, this paper proposes a cyber insurance framework designed to hedge against additional operational costs resulting from LAAs and substantial load variations in renewable-rich grids. The insurance framework determines both the insurance coverage and premium based on the Value at Risk (VaR) and Tail Value at Risk (TVaR). These risk metrics are calculated using the system failure probability and the probability density function (PDF) of the system operation cost. The system failure probability is assessed through a semi-Markov process (SMP), while the cost distribution is estimated through a cost minimization model of a distribution grid combined with a Monte-Carlo simulation to capture load variability. Furthermore, we employ a bi-level optimization scheme that identifies the specific load distribution leading to the maximum system cost, thereby enhancing the accuracy of the operation cost PDF estimation. The effectiveness and scalability of the proposed cyber insurance policy are evaluated considering a modified IEEE-118 test bus system and the IEEE European low-voltage (LV) test feeders model. The case study shows that with a relatively low premium, the network operator can hedge against additional operational costs caused by malicious load manipulations.
Product-oriented Product-Process-Resource Asset Network and its Representation in AutomationML for Asset Administration Shell
Current products, especially in the automotive sector, pose complex technical systems having a multi-disciplinary mechatronic nature. Industrial standards supporting system engineering and production typically (i) address the production phase only, but do not cover the complete product life cycle, and (ii) focus on production processes and resources rather than the products themselves. The presented approach is motivated by incorporating the impacts of the end-of-life phase of the product life cycle into the engineering phase. This paper proposes a modeling approach coming up from the Product-Process-Resource (PPR) modeling paradigm. It combines requirements on (i) respecting the product structure as a basis for the model, and (ii) incorporates repairing, remanufacturing, or upcycling within cyber-physical production systems. The proposed model called PoPAN should accompany the product during the entire life cycle as a digital shadow encapsulated within the Asset Administration Shell of a product. To facilitate the adoption of the proposed paradigm, the paper also proposes serialization of the model in the AutomationML data format. The model is demonstrated on a use-case for disassembling electric vehicle batteries to support their remanufacturing for stationary battery applications.
comment: \copyright 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
RIS-Assisted Millimeter Wave Communications for Indoor Scenarios: Modeling and Coverage Analysis
Millimeter wave (mmWave) communications and reconfigurable intelligent surfaces (RIS) are two critical technologies for next-generation networks, especially in dense indoor environments. However, existing analyses often oversimplify the indoor environment by neglecting some of the key characteristics, such as height variations, boundary effects, blockage effects, and user spatial distributions. In this paper, we develop an improved stochastic geometry-based model for RIS-assisted mmWave communications in indoor scenarios like conference centers, hospitals, and shopping malls. The proposed model incorporates the height factor for all the nodes in the network (e.g., transmitters, users, RISs, and obstacles) and captures the user clustering behavior in these scenarios. In addition, the boundary effect is also being considered for line-of-sight (LOS) probability calculation. Analytical expressions for distance distributions, LOS probabilities, and the coverage probability (CP) are derived. The CP is then validated through Monte Carlo simulations. Our results reveal deployment insights by approximating and simplifying the derived CP expressions, showing how transmitter density, obstacle density, RIS density, and user cluster radius impact network coverage. Notably, we show that RISs significantly improve coverage when transmitters or transmit power are limited but offer marginal benefits when transmitter density is high. These findings provide practical guidelines for the design and deployment of RIS-assisted indoor mmWave networks.
Eco-driving Incentive Mechanisms for Mitigating Emissions in Urban Transportation
This paper develops incentive mechanisms for promoting eco-driving with the overarching goal of minimizing emissions in transportation networks. The system operator provides drivers with energy-efficient driving guidance throughout their trips and measures compliance through vehicle telematics that capture how closely drivers follow this guidance. Drivers optimize their behaviors based on personal trade-offs between travel times and emissions. To design effective incentives, the operator elicits driver preferences regarding trip urgency and willingness to eco-drive, while determining optimal budget allocations and eco-driving recommendations. Two distinct settings based on driver behavior are analyzed. When drivers report their preferences truthfully, an incentive mechanism ensuring obedience (drivers find it optimal to follow recommendations) is designed by implementing eco-driving recommendations as a Nash equilibrium. When drivers may report strategically, the mechanism is extended to be both obedient and truthful (drivers find it optimal to report truthfully). Unlike existing works that focus on congestion or routing decisions in transportation networks, our framework explicitly targets emissions reduction by incentivizing drivers. The proposed mechanism addresses both strategic behavior and network effects arising from driver interactions, without requiring the operator to reveal system parameters to the drivers. Numerical simulations demonstrate the effects of budget constraints, driver types, and strategic misreporting on equilibrium outcomes and emissions reduction.
comment: 12 pages, 6 figures
Globally Stable Discrete Time PID Passivity-based Control of Power Converters: Simulation and Experimental Results
The key idea behind PID Passivity-based Control (PID-PBC) is to leverage the passivity property of PIDs (for all positive gains) and wrap the PID controller around a passive output to ensure global stability in closed-loop. However, the practical applicability of PID-PBC is stymied by two key facts: (i) the vast majority of practical implementations of PIDs is carried-out in discrete time -- discretizing the continuous time dynamical system of the PID; (ii) the well-known problem that passivity is not preserved upon discretization, even with small sampling times. Therefore, two aspects of the PID-PBC must be revisited for its safe practical application. First, we propose a discretization of the PID that ensures its passivity. Second, since the output that is identified as passive for the continuous time system is not necessarily passive for its discrete time version, we construct a new output that ensures the passivity property for the discretization of the system. In this paper, we provide a constructive answer to both issues for the case of power converter models. Instrumental to achieve this objective is the use of the implicit midpoint discretization method -- which is a symplectic integration technique that preserves system invariants. Since the reference value for the output to be regulated in power converters is non-zero, we are henceforth interested in the property of passivity of the incremental model -- currently known as shifted passivity. Therefore, we demonstrate that the resulting discrete-time PID-PBC defines a passive map for the incremental model and establish shifted passivity for the discretized power converter model. Combining these properties, we prove global stability for the feedback interconnection of the power converter with the discretized PID-PBC. The paper also presents simulations and experiments that demonstrate the performance of the proposed discretization.
Padé Approximant Neural Networks for Enhanced Electric Motor Fault Diagnosis Using Vibration and Acoustic Data
Purpose: The primary aim of this study is to enhance fault diagnosis in induction machines by leveraging the Pad\'e Approximant Neuron (PAON) model. While accelerometers and microphones are standard in motor condition monitoring, deep learning models with nonlinear neuron architectures offer promising improvements in diagnostic performance. This research investigates whether Pad\'e Approximant Neural Networks (Pad\'eNets) can outperform conventional Convolutional Neural Networks (CNNs) and Self-Organized Operational Neural Networks (Self-ONNs) in the diagnosis of electrical and mechanical faults from vibration and acoustic data. Methods: We evaluate and compare the diagnostic capabilities of three deep learning architectures: one-dimensional CNNs, Self-ONNs, and Pad\'eNets. These models are tested on the University of Ottawa's publicly available constant-speed induction motor datasets, which include both vibration and acoustic sensor data. The Pad\'eNet model is designed to introduce enhanced nonlinearity and is compatible with unbounded activation functions such as LeakyReLU. Results and Conclusion: Pad\'eNets consistently outperformed the baseline models, achieving diagnostic accuracies of 99.96%, 98.26%, 97.61%, and 98.33% for accelerometers 1, 2, 3, and the acoustic sensor, respectively. The enhanced nonlinearity of Pad\'eNets, together with their compatibility with unbounded activation functions, significantly improves fault diagnosis performance in induction motor condition monitoring.
comment: This version is the author's accepted manuscript. It has been peer-reviewed and accepted for publication in Journal of Vibration Engineering & Technologies. The final published version is available at https://doi.org/10.1007/s42417-025-02129-5
Dissipativity-Based Distributed Control and Communication Topology Co-Design for DC Microgrids with ZIP Loads
This paper presents a novel dissipativity-based distributed droop-free control and communication topology co-design approach for voltage regulation and current sharing in DC microgrids (DC MGs) with generic ``ZIP'' (constant impedance (Z), current (I) and power (P)) loads. While ZIP loads accurately capture the varied nature of the consumer loads, its constant power load (CPL) component is particularly challenging (and destabilizing) due to its non-linear form. Moreover, ensuring simultaneous voltage regulation and current sharing and co-designing controllers and topology are also challenging when designing control solutions for DC MGs. To address these three challenges, we model the DC MG as a networked system comprised of distributed generators (DGs), ZIP loads, and lines interconnected according to a static interconnection matrix. Next, we equip each DG with a local controller and a distributed global controller (over an arbitrary topology) to derive the error dynamic model of the DC MG as a networked ``error'' system, including disturbance inputs and performance outputs. Subsequently, to co-design the controllers and the topology ensuring robust (dissipative) voltage regulation and current sharing performance, we use the dissipativity and sector boundedness properties of the involved subsystems and formulate Linear Matrix Inequality (LMI) problems to be solved locally and globally. To support the feasibility of the global LMI problem, we identify and embed several crucial necessary conditions in the corresponding local LMI problems, thus providing a one-shot approach to solve the LMI problems. Overall, the proposed approach in this paper provides a unified framework for designing DC MGs. The effectiveness of the proposed solution was verified by simulating an islanded DC MG under different scenarios, demonstrating superior performance compared to traditional control approaches.
comment: arXiv admin note: substantial text overlap with arXiv:2503.04908
Dissipativity-Based Distributed Control and Communication Topology Co-Design for Voltage Regulation and Current Sharing in DC Microgrids
This paper presents a novel dissipativity-based distributed droop-free control approach for voltage regulation and current sharing in DC microgrids (MGs) comprised of an interconnected set of distributed generators (DGs), loads, and power lines. First, we describe the closed-loop DC MG as a networked system where the DGs and lines (i.e., subsystems) are interconnected via a static interconnection matrix. This interconnection matrix demonstrates how the inputs, outputs, and disturbances of DGs and lines are connected in a DC MG. Each DG is equipped with a local controller for voltage regulation and a distributed global controller for current sharing, where the local controllers ensure individual voltage tracking while the global controllers coordinate among DGs to achieve proportional current sharing. To design the distributed global controllers, we use the dissipativity properties of the subsystems and formulate a linear matrix inequality (LMI) problem. To support the feasibility of this problem, we identify a set of necessary local and global conditions to enforce in a specifically developed LMI-based local controller design process. In contrast to existing DC MG control solutions, our approach proposes a unified framework for co-designing the distributed controller and communication topology. As the co-design process is LMI-based, it can be efficiently implemented and evaluated using existing convex optimization tools. The effectiveness of the proposed solution is verified by simulating an islanded DC MG in a MATLAB/Simulink environment under different scenarios, such as load changes and topological constraint changes, and then comparing the performance with the droop control algorithm.
The Untapped Potential of Smart Charging: How EV Owners Can Save Money and Reduce Emissions Without Behavioral Change
The transportation sector is the single largest contributor to US emissions and the second largest globally. Electric vehicles (EVs) are expected to represent half of global car sales by 2035, emerging as a pivotal solution to reduce emissions and enhance grid flexibility. The electrification of buildings, manufacturing, and transportation is expected to grow electricity demand substantially over the next decade. Without effectively managed EV charging, EVs could strain energy grid infrastructure and increase electricity costs. Drawing on de-identified 2023 EV telematics data from Rivian Automotive, this study found that 72% of home charging commenced after the customer plugged in their vehicle regardless of utility time of use (TOU) tariffs or managed charging programs. In fewer than 26% of charging sessions in the sample, EV owners actively scheduled charging times to align or participate in utility tariffs or programs. With a majority of drivers concurrently plugged in during optimal charging periods yet not actively charging, the study identified an opportunity to reduce individual EV owner costs and carbon emissions through smarter charging habits without significant behavioral modifications or sacrifice in user preferences. By optimizing home charging schedules within existing plug-in and plug-out windows, the study suggests that EV owners can save an average of $140 annually and reduce the associated carbon emissions of charging their EV by as much as 28%.
A Neural Network-based Multi-timestep Command Governor for Nonlinear Systems with Constraints
The multi-timestep command governor (MCG) is an add-on algorithm that enforces constraints by modifying, at each timestep, the reference command to a pre-stabilized control system. The MCG can be interpreted as a Model-Predictive Control scheme operating on the reference command. The implementation of MCG on nonlinear systems carries a heavy computational burden as it requires solving a nonlinear program with multiple decision variables at each timestep. This paper proposes a less computationally demanding alternative, based on approximating the MCG control law using a neural network (NN) trained on offline data. However, since the NN output may not always be constraint-admissible due to training errors, its output is adjusted using a sensitivity-based method. We thus refer to the resulting control strategy as the neural network-based MCG (NN-MCG). As validation, the proposed controller is applied as a load governor for constraint management in an automotive fuel cell system. It is shown that the proposed strategy is significantly more computationally efficient than the traditional MCG, while achieving nearly identical performance if the NN is well-trained.
comment: Accepted for publication in the 2025 IEEE Conference on Control Technology and Applications (CCTA)
Systems and Control (EESS)
Autonomous Legged Mobile Manipulation for Lunar Surface Operations via Constrained Reinforcement Learning
Robotics plays a pivotal role in planetary science and exploration, where autonomous and reliable systems are crucial due to the risks and challenges inherent to space environments. The establishment of permanent lunar bases demands robotic platforms capable of navigating and manipulating in the harsh lunar terrain. While wheeled rovers have been the mainstay for planetary exploration, their limitations in unstructured and steep terrains motivate the adoption of legged robots, which offer superior mobility and adaptability. This paper introduces a constrained reinforcement learning framework designed for autonomous quadrupedal mobile manipulators operating in lunar environments. The proposed framework integrates whole-body locomotion and manipulation capabilities while explicitly addressing critical safety constraints, including collision avoidance, dynamic stability, and power efficiency, in order to ensure robust performance under lunar-specific conditions, such as reduced gravity and irregular terrain. Experimental results demonstrate the framework's effectiveness in achieving precise 6D task-space end-effector pose tracking, achieving an average positional accuracy of 4 cm and orientation accuracy of 8.1 degrees. The system consistently respects both soft and hard constraints, exhibiting adaptive behaviors optimized for lunar gravity conditions. This work effectively bridges adaptive learning with essential mission-critical safety requirements, paving the way for advanced autonomous robotic explorers for future lunar missions.
comment: This is the authors version of the paper accepted for publication in The IEEE International Conference on Space Robotics 2025. The final version link will be added here after conference proceedings are published
Variational Quantum Eigensolver Models of Molecular Quantum Dot Cellular Automata
Molecular quantum-dot Cellular Automata (QCA) may provide low-power, high-speed computational hardware for processing classical information. Simulation and modeling play an important role in the design of QCA circuits because fully-coherent models of QCA scale exponentially with the number of devices, and such models are severely limited in size. For larger circuits, approximations become necessary. In the era of fault-tolerant quantum computation, however, it may become possible to model large QCA circuits without such limitations. Presently, this work explores the use of the noisy-intermediate scale quantum (NISQ) variational quantum eigensolver (VQE) method for estimating the ground state of QCA circuits. This is relevant because the computational result of a QCA calculation is encoded in the circuit's ground state. In this study, VQE is used to model logic circuits, including binary wires, inverters, and majority gates. VQE models are performed ideal simulators, noisy simulators, and actual quantum hardware. This study demonstrates that VQE may indeed be used to model molecular QCA circuits. It is observed that using modern NISQ hardware, results are still quite sensitive to noise, so measures should be taken to minimize noise. These include simplifying the ansatz circuit whenever possible, and using low-noise hardware.
comment: 18 pages, 26 figures, submitted to the Journal of Applied Physics
Learning Robust Agile Flight Control with Stability Guarantees
In the evolving landscape of high-speed agile quadrotor flight, achieving precise trajectory tracking at the platform's operational limits is paramount. Controllers must handle actuator constraints, exhibit robustness to disturbances, and remain computationally efficient for safety-critical applications. In this work, we present a novel neural-augmented feedback controller for agile flight control. The controller addresses individual limitations of existing state-of-the-art control paradigms and unifies their strengths. We demonstrate the controller's capabilities, including the accurate tracking of highly aggressive trajectories that surpass the feasibility of the actuators. Notably, the controller provides universal stability guarantees, enhancing its robustness and tracking performance even in exceedingly disturbance-prone settings. Its nonlinear feedback structure is highly efficient enabling fast computation at high update rates. Moreover, the learning process in simulation is both fast and stable, and the controller's inherent robustness allows direct deployment to real-world platforms without the need for training augmentations or fine-tuning.
Enhancing Robust Multi-Market Participation of Renewable-Based VPPs through Flexible Resources
In the transition toward a sustainable power system, renewable-based Virtual Power Plants (RVPPs) have emerged as a promising solution to the challenges of integrating renewable energy sources into electricity markets. Their viability, however, depends on effective market participation strategies and the ability to manage uncertainties while leveraging flexible resources. This paper analyzes the impact of different flexible resources - such as concentrated solar power plants, hydro plants, biomass plants, and flexible demand - on the participation of RVPPs in energy and reserve markets. Multiple sources of uncertainty in generation, consumption, and electricity prices are addressed using a two-stage robust optimization approach. The contribution of different technologies to RVPP profitability is evaluated through a marginal contribution method, ensuring fair allocation of profits among them according to their actual role in energy and reserve provision across markets. Simulations for an RVPP in southern Spain demonstrate how strategic decisions and the availability of flexible resources influence viability, market participation, and unit scheduling.
Privacy-Preserving Distributed Estimation with Limited Data Rate
This paper focuses on the privacy-preserving distributed estimation problem with a limited data rate, where the observations are the sensitive information. Specifically, a binary-valued quantizer-based privacy-preserving distributed estimation algorithm is developed, which improves the algorithm's privacy-preserving capability and simultaneously reduces the communication costs. The algorithm's privacy-preserving capability, measured by the Fisher information matrix, is dynamically enhanced over time. Notably, the Fisher information matrix of the output signals with respect to the sensitive information converges to zero at a polynomial rate, and the improvement in privacy brought by the quantizers is quantitatively characterized as a multiplicative effect. Regarding the communication costs, each sensor transmits only 1 bit of information to its neighbours at each time step. Additionally, the assumption on the negligible quantization error for real-valued messages is not required. While achieving the requirements of privacy preservation and reducing communication costs, the algorithm ensures that its estimates converge almost surely to the true value of the unknown parameter by establishing a co-design guideline for the time-varying privacy noises and step-sizes. A polynomial almost sure convergence rate is obtained, and then the trade-off between privacy and convergence rate is established. Numerical examples demonstrate the main results.
Optimising Communication Control Factors for Energy Consumption in Rural LOS V2X
Connected braking can reduce fatal collisions in connected and autonomous vehicles (CAVs) by using reliable, low-latency 5G New Radio (NR) links, especially NR Sidelink Vehicle-to-Everything (V2X). In rural areas, road side units are sparse and power-constrained or off-grid, so energy efficiency must be considered alongside safety. This paper studies how three communication control factors including subcarrier spacing ($\mathrm{SCS}$), modulation and coding scheme ($\mathrm{MCS}$), and transmit power ($P_{\mathrm{t}}$) should be configured to balance safety and energy consumption in rural line-of-sight (LOS) scenarios in light and heavy traffic scenarios. Safety is quantified by the packet receive ratio ($\mathrm{PRR}$) against the minimum communication distance $D_{\mathrm{comm}}$, defined as the distance that the vehicle travels during the transmission of the safety message. Results show that, under heavy traffic, increasing $P_{\mathrm{t}}$ and selecting a low-rate $\mathrm{MCS}$ at $\mathrm{SCS} = 30$ kHz sustains high $\mathrm{PRR}$ at $D_{\mathrm{comm}}$, albeit with higher energy cost. In light traffic, maintaining lower $P_\mathrm{t}$ with low $\mathrm{MCS}$ levels achieves a favorable reliability-energy trade-off while preserving acceptable $\mathrm{PRR}$ at $D_{\mathrm{comm}}$. These findings demonstrate the necessity of adaptive, energy-aware strategy to guarantee both safety and energy efficiency in rural V2X systems.
Temporal Variabilities Limit Convergence Rates in Gradient-Based Online Optimization
This paper investigates the fundamental performance limits of gradient-based algorithms for time-varying optimization. Leveraging the internal model principle and root locus techniques, we show that temporal variabilities impose intrinsic limits on the achievable rate of convergence. For a problem with condition ratio $\kappa$ and time variation whose model has degree $n$, we show that the worst-case convergence rate of any minimal-order gradient-based algorithm is $\rho_\text{TV} = (\frac{\kappa-1}{\kappa+1})^{1/n}$. This bound reveals a fundamental tradeoff between problem conditioning, temporal complexity, and rate of convergence. We further construct explicit controllers that attain the bound for low-degree models of time variation.
DarTwin made precise by SysMLv2 -- An Experiment
The new SysMLv2 adds mechanisms for the built-in specification of domain-specific concepts and language extensions. This feature promises to facilitate the creation of Domain-Specific Languages (DSLs) and interfacing with existing system descriptions and technical designs. In this paper, we review these features and evaluate SysMLv2's capabilities using concrete use cases. We develop DarTwin DSL, a DSL that formalizes the existing DarTwin notation for Digital Twin (DT) evolution, through SysMLv2, thereby supposedly enabling the wide application of DarTwin's evolution templates using any SysMLv2 tool. We demonstrate DarTwin DSL, but also point out limitations in the currently available tooling of SysMLv2 in terms of graphical notation capabilities. This work contributes to the growing field of Model-Driven Engineering (MDE) for DTs and combines it with the release of SysMLv2, thus integrating a systematic approach with DT evolution management in systems engineering.
Micro-Macro Backstepping Control of Large-Scale Hyperbolic Systems (Extended Version)
We introduce a control design and analysis framework for micro-macro, boundary control of large-scale, $n+m$ hyperbolic PDE systems. Specifically, we develop feedback laws for stabilization of hyperbolic systems at the micro level (i.e., of the large-scale system) that employ a) measurements obtained from the $n+m$ system (i.e., at micro level) and kernels constructed based on an $\infty+\infty$ continuum system counterpart (i.e., at macro level), or b) kernels and measurements both stemming from a continuum counterpart, or c) averaged-continuum kernels/measurements. We also address (d)) stabilization of the continuum (macro) system, employing continuum kernels and measurements. Towards addressing d) we derive in a constructive manner an $\infty+\infty$ continuum approximation of $n+m$ hyperbolic systems and establish that its solutions approximate, for large $n$ and $m$, the solutions of the $n+m$ system. We then construct a feedback law for stabilization of the $\infty+\infty$ system via introduction of a continuum-PDE backstepping transformation. We establish well-posedness of the resulting 4-D kernel equations and prove closed-loop stability via construction of a novel Lyapunov functional. Furthermore, under control configuration a) we establish that the closed-loop system is exponentially stable provided that $n$ and $m$ are large, by proving that the exact, stabilizing $n+m$ control kernels can be accurately approximated by the continuum kernels. While under control configurations b) and c), we establish closed-loop stability capitalizing on the established solutions' and kernels' approximation properties via employment of infinite-dimensional ISS arguments. We provide two numerical simulation examples to illustrate the effectiveness and potential limitations of our design approach.
comment: 22 pages, 5 figures
The value of storage in electricity distribution: The role of storage
Electricity distribution companies deploy battery storage to defer grid upgrades by reducing peak demand. In deregulated jurisdictions, such storage often sits idle because regulatory constraints bar participation in electricity markets. Here, we develop an optimization framework that, to our knowledge, provides the first formal model of market participation constraints within storage investment and operation planning. Applying the framework to a Massachusetts case study, we find that market participation could deliver similar savings as peak demand reduction. Under current conditions, market participation does not increase storage investment, but at very low storage costs, could incentivize deployment beyond local distribution needs. This might run contrary to the separation of distribution from generation in deregulated markets. Our framework can identify investment levels appropriate for local distribution needs.
High-Parallel FPGA-Based Discrete Simulated Bifurcation for Large-Scale Optimization
Combinatorial Optimization (CO) problems exhibit exponential complexity, making their resolution challenging. Simulated Adiabatic Bifurcation (aSB) is a quantum-inspired algorithm to obtain approximate solutions to largescale CO problems written in the Ising form. It explores the solution space by emulating the adiabatic evolution of a network of Kerr-nonlinear parametric oscillators (KPOs), where each oscillator represents a variable in the problem. The optimal solution corresponds to the ground state of this system. A key advantage of this approach is the possibility of updating multiple variables simultaneously, making it particularly suited for hardware implementation. To enhance solution quality and convergence speed, variations of the algorithm have been proposed in the literature, including ballistic (bSB), discrete (dSB), and thermal (HbSB) versions. In this work, we have comprehensively analyzed dSB, bSB, and HbSB using dedicated software models, evaluating the feasibility of using a fixed-point representation for hardware implementation. We then present an opensource hardware architecture implementing the dSB algorithm for Field-Programmable Gate Arrays (FPGAs). The design allows users to adjust the degree of algorithmic parallelization based on their specific requirements. A proof-of-concept implementation that solves 256-variable problems was achieved on an AMD Kria KV260 SoM, a low-tier FPGA, validated using well-known max-cut and knapsack problems.
Pooling Probabilistic Forecasts for Cooperative Wind Power Offering SC
Wind power producers can benefit from forming coalitions to participate cooperatively in electricity markets. To support such collaboration, various profit allocation rules rooted in cooperative game theory have been proposed. However, existing approaches overlook the lack of coherence among producers regarding forecast information, which may lead to ambiguity in offering and allocations. In this paper, we introduce a ``reconcile-then-optimize'' framework for cooperative market offerings. This framework first aligns the individual forecasts into a coherent joint forecast before determining market offers. With such forecasts, we formulate and solve a two-stage stochastic programming problem to derive both the aggregate offer and the corresponding scenario-based dual values for each trading hour. Based on these dual values, we construct a profit allocation rule that is budget-balanced and stable. Finally, we validate the proposed method through empirical case studies, demonstrating its practical effectiveness and theoretical soundness.
comment: submission to PSCC 2026, 7 pages
A Unidirectionally Connected FAS Approach for 6-DOF Quadrotor Control
This paper proposes a unidirectionally connected fully actuated system (UC-FAS) approach for the sub-stabilization and tracking control of 6-DOF quadrotors, tackling limitations both in state-space and FAS framework to some extent. The framework systematically converts underactuated quadrotor dynamics into a UC-FAS model, unifying the existing different FAS transformation ways. By eliminating estimation of the high-order derivatives of control inputs, a drawback of current methods, the UC-FAS model simplifies controller design and enables direct eigenstructure assignment for closed-loop dynamics. Simulations demonstrate precise 6-DOF tracking performance. This work bridges theoretical FAS approach advancements with practical implementation needs, offering a standardized paradigm for nonlinear quadrotor control.
comment: This paper has been submitted to 2026 IFAC World Congress. Corresponding author: Guang-Ren Duan
Ultrafast Grid Impedance Identification in $dq$-Asymmetric Three-Phase Power Systems
We propose a non-parametric frequency-domain method to identify small-signal $dq$-asymmetric grid impedances, over a wide frequency band, using grid-connected converters. Existing identification methods are faced with significant trade-offs: e.g., passive approaches rely on ambient harmonics and rare grid events and thus can only provide estimates at a few frequencies, while many active approaches that intentionally perturb grid operation require long time series measurement and specialized equipment. Although active time-domain methods reduce the measurement time, they either make crude simplifying assumptions or require laborious model order tuning. Our approach effectively addresses these challenges: it does not require specialized excitation signals or hardware and achieves ultrafast ($<1$ s) identification, drastically reducing measurement time. Being non-parametric, our approach also makes no assumptions on the grid structure. A detailed electromagnetic transient simulation is used to validate the method and demonstrate its clear superiority over existing alternatives.
Physics-Informed Reinforcement Learning for Large-Scale EV Smart Charging Considering Distribution Network Voltage Constraints
Electric Vehicles (EVs) offer substantial flexibility for grid services, yet large-scale, uncoordinated charging can threaten voltage stability in distribution networks. Existing Reinforcement Learning (RL) approaches for smart charging often disregard physical grid constraints or have limited performance for complex large-scale tasks, limiting their scalability and real-world applicability. This paper introduces a physics-informed (PI) RL algorithm that integrates a differentiable power flow model and voltage-based reward design into the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm, enabling EVs to deliver real-time voltage support while meeting user demands. The resulting PI-TD3 algorithm achieves faster convergence, improved sample efficiency, and reliable voltage magnitude regulation under uncertain and overloaded conditions. Benchmarks on the IEEE 34-bus and 123-bus networks show that the proposed PI-TD3 outperforms both model-free RL and optimization-based baselines in grid constraint management, user satisfaction, and economic metrics, even as the system scales to hundreds of EVs. These advances enable robust, scalable, and practical EV charging strategies that enhance grid resilience and support distribution networks operation.
Empowering Prosumers: Incentive Design for Local Electricity Markets Under Generalized Uncertainty and Grid Constraints
Since the 1990s, widespread introduction of central (wholesale) electricity markets has been seen across multiple continents, driven by the search for efficient operation of the power grid through competition. The increase of renewables has made significant impacts both on central electricity markets and distribution-level grids as renewable power generation is often connected to the latter. These stochastic renewable technologies have both advantages and disadvantages. On one hand they offer very low marginal cost and carbon emissions, while on the other hand, their output is uncertain, requiring flexible backup power with high marginal cost. Flexibility from end-prosumers or smaller market participants is therefore seen as a key enabler of large-scale integration of renewables. However, current central electricity markets do not directly include uncertainty into the market clearing and do not account for physical constraints of distribution grids. In this paper we propose a local electricity market framework based on probabilistic locational marginal pricing, effectively accounting for uncertainties in production, consumption and grid variables. The model includes a representation of the grid using the lindistflow equations and accounts for the propagation of uncertainty using general Polynomial Chaos (gPC). A two-stage convex model is proposed; in the day-ahead stage, probability distributions of prices are calculated for every timestep, where the expected values represent the day-ahead (spot) prices. In the real-time stage, uncertainties are realized (measured) and a trivial calculation reveals the real-time price. Through four instructive case-studies we highlight the effectiveness of the method to incentivize end-prosumers' participation in the market, while ensuring that their behavior does not have an adverse impact on the operation of the grid.
Human-in-the-Loop Bandwidth Estimation for Quality of Experience Optimization in Real-Time Video Communication AAAI
The quality of experience (QoE) delivered by video conferencing systems is significantly influenced by accurately estimating the time-varying available bandwidth between the sender and receiver. Bandwidth estimation for real-time communications remains an open challenge due to rapidly evolving network architectures, increasingly complex protocol stacks, and the difficulty of defining QoE metrics that reliably improve user experience. In this work, we propose a deployed, human-in-the-loop, data-driven framework for bandwidth estimation to address these challenges. Our approach begins with training objective QoE reward models derived from subjective user evaluations to measure audio and video quality in real-time video conferencing systems. Subsequently, we collect roughly $1$M network traces with objective QoE rewards from real-world Microsoft Teams calls to curate a bandwidth estimation training dataset. We then introduce a novel distributional offline reinforcement learning (RL) algorithm to train a neural-network-based bandwidth estimator aimed at improving QoE for users. Our real-world A/B test demonstrates that the proposed approach reduces the subjective poor call ratio by $11.41\%$ compared to the baseline bandwidth estimator. Furthermore, the proposed offline RL algorithm is benchmarked on D4RL tasks to demonstrate its generalization beyond bandwidth estimation.
comment: Accepted for publication in the proceedings of the AAAI Conference on Artificial Intelligence 2026 (IAAI Technical Track on Deployed Highly Innovative Applications of AI)
Sleepy Chauffeur Detection and Alert Techniques for Road Safety
The most startling of the contemporary problems is the sleepiness of chauffeur which causes lots of car accidents. Prevention of those impending accidents by detecting and alerting the sleepy chauffeur is vital, otherwise that would lead to loss of lives and various traumas along with severe injuries. The slumber or sleep may be caused by huge stress, pressure, relentless work load or alcoholism, for which sleep deprivation occurs and the chauffeur while driving gets drowsy. So far, considerable amount of systems has been developed to detect drowsiness of drivers, most of which mainly depend on image processing algorithms using cameras. Some of them also incorporate artificial intelligence and machine learning based algorithms. This paper presents a review of the existing systems and also proposes an easy and cheap system using sensors and Arduino, capable of detecting sleepiness and generates siren alarm and send alert message to take precautionary measures.
comment: 8 pages, 5 figures, International Journal on Recent Innovation in Microelectronics and Microcontrollers Applications Vol. 1, Issue 1 - 2018
Hybrid Terrain-Aware Path Planning: Integrating VD--RRT\(^{*}\) Exploration and VD--D\(^{*}\) Lite Repair
Autonomous ground vehicles operating off-road must plan curvature-feasible paths while accounting for spatially varying soil strength and slope hazards in real time. We present a continuous state--cost metric that combines a Bekker pressure--sinkage model with elevation-derived slope and attitude penalties. The resulting terrain cost field is analytic, bounded, and monotonic in soil modulus and slope, ensuring well-posed discretization and stable updates under sensor noise. This metric is evaluated on a lattice with exact steering primitives: Dubins and Reeds--Shepp motions for differential drive and time-parameterized bicycle arcs for Ackermann steering. Global exploration is performed using Vehicle-Dynamics RRT\(^{*}\), while local repair is managed by Vehicle-Dynamics D\(^{*}\) Lite, enabling millisecond-scale replanning without heuristic smoothing. By separating the terrain--vehicle model from the planner, the framework provides a reusable basis for deterministic, sampling-based, or learning-driven planning in deformable terrain. Hardware trials on an off-road platform demonstrate real-time navigation across soft soil and slope transitions, supporting reliable autonomy in unstructured environments.
Towards xApp Conflict Evaluation with Explainable Machine Learning and Causal Inference in O-RAN
The Open Radio Access Network (O-RAN) architecture enables a flexible, vendor-neutral deployment of 5G networks by disaggregating base station components and supporting third-party xApps for near real-time RAN control. However, the concurrent operation of multiple xApps can lead to conflicting control actions, which may cause network performance degradation. In this work, we propose a framework for xApp conflict management that combines explainable machine learning and causal inference to evaluate the causal relationships between RAN Control Parameters (RCPs) and Key Performance Indicators (KPIs). We use model explainability tools such as SHAP to identify RCPs that jointly affect the same KPI, signaling potential conflicts, and represent these interactions as a causal Directed Acyclic Graph (DAG). We then estimate the causal impact of each of these RCPs on their associated KPIs using metrics such as Average Treatment Effect (ATE) and Conditional Average Treatment Effect (CATE). This approach offers network operators guided insights into identifying conflicts and quantifying their impacts, enabling more informed and effective conflict resolution strategies across diverse xApp deployments.
Information Shapes Koopman Representation
The Koopman operator provides a powerful framework for modeling dynamical systems and has attracted growing interest from the machine learning community. However, its infinite-dimensional nature makes identifying suitable finite-dimensional subspaces challenging, especially for deep architectures. We argue that these difficulties come from suboptimal representation learning, where latent variables fail to balance expressivity and simplicity. This tension is closely related to the information bottleneck (IB) dilemma: constructing compressed representations that are both compact and predictive. Rethinking Koopman learning through this lens, we demonstrate that latent mutual information promotes simplicity, yet an overemphasis on simplicity may cause latent space to collapse onto a few dominant modes. In contrast, expressiveness is sustained by the von Neumann entropy, which prevents such collapse and encourages mode diversity. This insight leads us to propose an information-theoretic Lagrangian formulation that explicitly balances this tradeoff. Furthermore, we propose a new algorithm based on the Lagrangian formulation that encourages both simplicity and expressiveness, leading to a stable and interpretable Koopman representation. Beyond quantitative evaluations, we further visualize the learned manifolds under our representations, observing empirical results consistent with our theoretical predictions. Finally, we validate our approach across a diverse range of dynamical systems, demonstrating improved performance over existing Koopman learning methods. The implementation is publicly available at https://github.com/Wenxuan52/InformationKoopman.
Data to Certificate: Guaranteed Cost Control with Quantization-Aware System Identification
Cloud-assisted system identification and control have emerged as practical solutions for low-power, resource-constrained control systems such as micro-UAVs. In a typical cloud-assisted setting, state and input data are transmitted from local agents to a central computer over low-bandwidth wireless links, leading to quantization. This paper investigates the impact of state and input data quantization on a linear time invariant (LTI) system identification, derives a worst-case bound on the identification error, and develops a robust controller for guaranteed cost control. We establish a fundamental bound on the model error that depends only on the quantized data and quantization resolution, and develop a linear matrix inequality (LMI) based guaranteed cost robust controller under this error bound.
comment: 8 pages, 3 figures
Comparison of Forced and Unforced Rendezvous, Proximity Operations, and Docking Under Model Mismatch
This paper compares the required fuel usage for forced and unforced motion of a chaser satellite engaged in Rendezvous, Proximity Operations, and Docking (RPOD) maneuvers. Improved RPOD models are vital, particularly as the space industry expands and demands for improved fuel efficiency, cost effectiveness, and mission life span increase. This paper specifically examines the Clohessy- Wiltshire (CW) Equations and the extent of model mismatch by comparing pre- dicted trajectories from this model with a more computationally complex, higher fidelity RPOD model. This paper assesses several test cases of similar mission parameters, in each case comparing natural motion circumnavigation (NMC) with comparable forced motion circumnavigation. The Guidance, Navigation, and Con- trol (GNC) impulse maneuvers required to maintain the supposedly zero fuel CW trajectories is representative of the extent of CW model mismatch. This paper demonstrates that unforced motions are not inherently more fuel efficient than forced motions, thus permitting extended orbital operations given the higher fuel efficiency.
comment: 12 pages, 4 figures, AAS/AIAA Space Flight Mechanics
Identifying Best Candidates for Busbar Splitting
Rising electricity demand and the growing integration of renewables are intensifying congestion in transmission grids. Grid topology optimization through busbar splitting (BuS) and optimal transmission switching can alleviate grid congestion and reduce the generation costs in a power system. However, BuS optimization requires a large number of binary variables, and analyzing all the substations for potential new topological actions is computationally intractable, particularly in large grids. To tackle this issue, we propose a set of metrics to identify and rank promising candidates for BuS, focusing on finding buses where topology optimization can reduce generation costs. To assess the effect of BuS on the identified buses, we use a combined mixed-integer convex-quadratic BuS model to compute the optimal topology and test it with the non-linear non-convex AC optimal power flow (OPF) simulation to show its AC feasibility. By testing and validating the proposed metrics on test cases of different sizes, we show that they are able to identify busbars that reduce the total generation costs when their topology is optimized. Thus, the metrics enable effective selection of busbars for BuS, with no need to test every busbar in the grid, one at a time.
Competitive EV charging station location with queues
Electric vehicle (EV) public charging infrastructure planning faces significant challenges in competitive markets, where multiple service providers affect congestion and user behavior. This work extends existing modeling frameworks by incorporating the presence of competitors' stations and more realistic queueing systems. First, we analyze three finite queueing systems, M/M/1/K, M/M/s/K, and M/Er/s/K, with varying numbers of servers (charging outlets) and service time distributions, deriving analytic expressions for user behavior metrics. Second, we embed the queueing-based user behavior model into a bilevel program, where the upper level locates new charging stations to maximize accessibility (throughput), and the lower level captures users' station choices via a user equilibrium. Third, we apply a reformulation from competitive congested user-choice facility location models to approximately solve the bilevel problem and introduce a surrogate-based heuristic to enhance scalability. Fourth, we showcase our methodology on a real-world case study of an urban area in Montreal (Canada), offering managerial insights into how user-choice behavior assumptions and competition affect throughput and location decisions. The results demonstrate that our model yields (re)location strategies that outperform the existing network. More broadly, this approach provides a tool for incorporating charging service quality-through queueing metrics-and existing competition into station planning.
Model predictive control lowers barriers to adoption of heat-pump water heaters: A field study
Electric heat-pump water heaters (HPWHs) could reduce the energy costs, emissions, and power grid impacts associated with water heating, the second-largest energy use in United States housing. However, most HPWHs today require 240 V circuits to power the backup resistance heating elements they use to maintain comfort during large water draws. Installing a 240 V circuit can increase the up-front cost of a HPWH by half or more. This paper develops and field-tests the first control system that enables a 120 V HPWH to efficiently maintain comfort without resistance heating elements. The novel model predictive control (MPC) system enables pre-heating in anticipation of large water draws, which it forecasts using an ensemble of machine learning predictors. By shifting electrical load over time, MPC also reduces energy costs on average by 23% and 28% under time-of-use pricing and hourly pricing, respectively, relative to a 240 V HPWH with standard controls. Compared to the increasingly common practice in 120 V HPWHs of storing water at a constant, high temperature (60 {\deg}C) to ensure comfort, MPC saves 37% energy on average. In addition to demonstrating MPC's benefits in a real, occupied house, this paper discusses implementation challenges and costs. A simple payback analysis suggests that a 120 V HPWH, operated by the MPC system developed here, would be economically attractive in most installation scenarios.
Enhancing Profit and CO2 Mitigation: Commercial Direct Air Capture Design and Operation with Power Market Volatility
Current decarbonization efforts are falling short of meeting the net-zero greenhouse gas (GHG) emission target, highlighting the need for substantial carbon dioxide removal methods such as direct air capture (DAC). However, integrating DACs poses challenges due to their enormous power consumption. This study assesses the commercial operation of various DAC technologies that earn revenue using monetized carbon incentives while purchasing electricity from wholesale power markets. We model four commercial DAC technologies and examine their operation in three representative locations including California, Texas, and New York. Our findings reveal that commercial DAC operations can take financial advantage of the volatile power market to operate only during low-price periods strategically, offering a pathway to facilitate a cost-efficient decarbonization transition. The ambient operational environment such as temperature and relative humidity has non-trivial impact on abatement capacity. Profit-driven decisions introduce climate-economic trade-offs that might decrease the capacity factor of DAC and reduce total CO2 removal. These implications extend throughout the entire lifecycle of DAC developments and influence power systems and policies related to full-scale DAC implementation. Our study shows that DAC technologies with shorter cycle spans and higher flexibility can better exploit the electricity price volatility, while power markets demonstrate persistent low-price windows that often synergize with low grid emission periods, like during the solar "duck curve" in California. An optimal incentive design exists for profit-driven operations while carbon-tax policy in electricity pricing is counterproductive for DAC systems.
comment: 16 pages, 8 figure, Submitted and under review for Engineering
Non-Gaussian Distribution Steering in Nonlinear Dynamics with Conjugate Unscented Transformation
In highly nonlinear systems such as the ones commonly found in astrodynamics, Gaussian distributions generally evolve into non-Gaussian distributions. This paper introduces a method for effectively controlling non-Gaussian distributions in nonlinear environments using optimized linear feedback control. This paper utilizes Conjugate Unscented Transformation to quantify the higher-order statistical moments of non-Gaussian distributions. The formulation focuses on controlling and constraining the sigma points associated with the uncertainty quantification, which would thereby reflect the control of the entire distribution and constraints on the moments themselves. This paper develops an algorithm to solve this problem with sequential convex programming, and it is demonstrated through a two-body and three-body example. The examples show that individual moments can be directly controlled, and the moments are accurately approximated for non-Gaussian distributions throughout the controller's time horizon in nonlinear dynamics.
Gaussian Process Implicit Surfaces as Control Barrier Functions for Safe Robot Navigation
Level set methods underpin modern safety techniques such as control barrier functions (CBFs), while also serving as implicit surface representations for geometric shapes via distance fields. Inspired by these two paradigms, we propose a unified framework where the implicit surface itself acts as a CBF. We leverage Gaussian process (GP) implicit surface (GPIS) to represent the safety boundaries, using safety samples which are derived from sensor measurements to condition the GP. The GP posterior mean defines the implicit safety surface (safety belief), while the posterior variance provides a robust safety margin. Although GPs have favorable properties such as uncertainty estimation and analytical tractability, they scale cubically with data. To alleviate this issue, we develop a sparse solution called sparse Gaussian CBFs. To the best of our knowledge, GPIS have not been explicitly used to synthesize CBFs. We validate the approach on collision avoidance tasks in two settings: a simulated 7-DOF manipulator operating around the Stanford bunny, and a quadrotor navigating in 3D around a physical chair. In both cases, Gaussian CBFs (with and without sparsity) enable safe interaction and collision-free execution of trajectories that would otherwise intersect the objects.
comment: 8 pages, 7 figures, under review
A Wideband Composite Sequence Impedance Model for Evaluation of Interactions in Unbalanced Power-Electronic-Based Power Systems
This paper proposes a wideband composite sequence impedance model (WCSIM)-based analysis method to evaluate the interactions in power-electronic-based power systems subjected to unbalanced grid faults or with unbalanced loads. The WCSIM-based method intuitively assesses the impact of the small-signal interconnection among the positive-, negative-, and zero-sequence circuits on the interaction stability of unbalanced power systems. The effectiveness of this method is demonstrated using a permanent magnet synchronous generator-based weak grid system under a single-line-to-ground fault (SLGF). Frequency scanning results and controller hardware-in-loop tests validate both the correctness of the WCSIM and the effectiveness of the WCSIM-based analysis method.
comment: This work will be submitted to the IEEE for possible publication
ExaModelsPower.jl: A GPU-Compatible Modeling Library for Nonlinear Power System Optimization
As GPU-accelerated mathematical programming techniques mature, there is growing interest in utilizing them to address the computational challenges of power system optimization. This paper introduces ExaModelsPower.jl, an open-source modeling library for creating GPU-compatible nonlinear AC optimal power flow models. Built on ExaModels.jl, ExaModelsPower.jl provides a high-level interface that automatically generates all necessary callback functions for GPU solvers. The library is designed for large-scale problem instances, which may include multiple time periods and security constraints. Using ExaModelsPower.jl, we benchmark GPU and CPU solvers on open-source test cases. Our results show that GPU solvers can deliver up to two orders of magnitude speedups compared to alternative tools on CPU for problems with more than 20,000 variables and a solution precision of up to $10^{-4}$, while performance for smaller instances or tighter tolerances may vary.
Optimization of High-Order Quarter-Wave Plate for Birefringence Suppression in FOCS
Fiber optic current sensors (FOCS) are widely adopted in modern power grids due to high sensitivity, excellent insulation, and strong immunity to electromagnetic interference. This prominence necessitates precise investigation into their error sources and corresponding optimization. This study examines reflective FOCS based on the Faraday effect. A theoretical model is established to simulate phase error caused by linear birefringence from the quarter-wave plate. Conventional methods using circular birefringence are analyzed, revealing inherent limitations. Innovatively, a compensation strategy employing high-order quarter-wave plates is proposed to effectively eliminate linear birefringence effects. This approach significantly enhances the accuracy and practicality of FOCS in precision metrology.
The Algorithmic Regulator
The regulator theorem states that, under certain conditions, any optimal controller must embody a model of the system it regulates, grounding the idea that controllers embed, explicitly or implicitly, internal models of the controlled. This principle underpins neuroscience and predictive brain theories like the Free-Energy Principle or Kolmogorov/Algorithmic Agent theory. However, the theorem is only proven in limited settings. Here, we treat the deterministic, closed, coupled world-regulator system $(W,R)$ as a single self-delimiting program $p$ via a constant-size wrapper that produces the world output string~$x$ fed to the regulator. We analyze regulation from the viewpoint of the algorithmic complexity of the output, $K(x)$. We define $R$ to be a \emph{good algorithmic regulator} if it \emph{reduces} the algorithmic complexity of the readout relative to a null (unregulated) baseline $\varnothing$, i.e., \[ \Delta = K\big(O_{W,\varnothing}\big) - K\big(O_{W,R}\big) > 0. \] We then prove that the larger $\Delta$ is, the more world-regulator pairs with high mutual algorithmic information are favored. More precisely, a complexity gap $\Delta > 0$ yields \[ \Pr\big((W,R)\mid x\big) \le C\,2^{\,M(W{:}R)}\,2^{-\Delta}, \] making low $M(W{:}R)$ exponentially unlikely as $\Delta$ grows. This is an AIT version of the idea that ``the regulator contains a model of the world.'' The framework is distribution-free, applies to individual sequences, and complements the Internal Model Principle. Beyond this necessity claim, the same coding-theorem calculus singles out a \emph{canonical scalar objective} and implicates a \emph{planner}. On the realized episode, a regulator behaves \emph{as if} it minimized the conditional description length of the readout.
comment: 2 Figures
Product-oriented Product-Process-Resource Asset Network and its Representation in AutomationML for Asset Administration Shell
Current products, especially in the automotive sector, pose complex technical systems having a multi-disciplinary mechatronic nature. Industrial standards supporting system engineering and production typically (i) address the production phase only, but do not cover the complete product life cycle, and (ii) focus on production processes and resources rather than the products themselves. The presented approach is motivated by incorporating the impacts of the end-of-life phase of the product life cycle into the engineering phase. This paper proposes a modeling approach coming up from the Product-Process-Resource (PPR) modeling paradigm. It combines requirements on (i) respecting the product structure as a basis for the model, and (ii) incorporates repairing, remanufacturing, or upcycling within cyber-physical production systems. The proposed model called PoPAN should accompany the product during the entire life cycle as a digital shadow encapsulated within the Asset Administration Shell of a product. To facilitate the adoption of the proposed paradigm, the paper also proposes serialization of the model in the AutomationML data format. The model is demonstrated on a use-case for disassembling electric vehicle batteries to support their remanufacturing for stationary battery applications.
comment: \copyright 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
Integrative, Scalable Modeling of Hydrological Systems with MBSE and HFGT
Worsening global challenges in the Anthropocene demand complex, adaptive solutions grounded in a systems-level understanding of coupled social and environmental dynamics. However, existing modeling approaches often fall short due to disciplinary silos, limited scalability, and the absence of shared ontological frameworks. Model-Based Systems Engineering (MBSE), when integrated with Hetero-functional Graph Theory (HFGT), offers a powerful methodology for modeling systems of systems while preserving subsystem heterogeneity and enabling cross-disciplinary integration. This paper presents the first application of the MBSE-HFGT methodology to environmental systems, using a series of worked examples involving flow through lake and land segments. These examples demonstrate how the approach enables consistent, scalable, and integrative modeling of complex environmental processes.
A Cyber Insurance Policy for Hedging Against Load-Altering Attacks and Extreme Load Variations in Distribution Grids
Uncertainties in renewable energy resources (RES) and load variations can lead to elevated system operational costs. Moreover, the emergence of large-scale distributed threats, such as load-altering attacks (LAAs), can induce substantial load variations, further exacerbating these costs. Although traditional defense measures can reduce the likelihood of such attacks, considerable residual risks remain. Thus, this paper proposes a cyber insurance framework designed to hedge against additional operational costs resulting from LAAs and substantial load variations in renewable-rich grids. The insurance framework determines both the insurance coverage and premium based on the Value at Risk (VaR) and Tail Value at Risk (TVaR). These risk metrics are calculated using the system failure probability and the probability density function (PDF) of the system operation cost. The system failure probability is assessed through a semi-Markov process (SMP), while the cost distribution is estimated through a cost minimization model of a distribution grid combined with a Monte-Carlo simulation to capture load variability. Furthermore, we employ a bi-level optimization scheme that identifies the specific load distribution leading to the maximum system cost, thereby enhancing the accuracy of the operation cost PDF estimation. The effectiveness and scalability of the proposed cyber insurance policy are evaluated considering a modified IEEE-118 test bus system and the IEEE European low-voltage (LV) test feeders model. The case study shows that with a relatively low premium, the network operator can hedge against additional operational costs caused by malicious load manipulations.
RIS-Assisted Millimeter Wave Communications for Indoor Scenarios: Modeling and Coverage Analysis
Millimeter wave (mmWave) communications and reconfigurable intelligent surfaces (RIS) are two critical technologies for next-generation networks, especially in dense indoor environments. However, existing analyses often oversimplify the indoor environment by neglecting some of the key characteristics, such as height variations, boundary effects, blockage effects, and user spatial distributions. In this paper, we develop an improved stochastic geometry-based model for RIS-assisted mmWave communications in indoor scenarios like conference centers, hospitals, and shopping malls. The proposed model incorporates the height factor for all the nodes in the network (e.g., transmitters, users, RISs, and obstacles) and captures the user clustering behavior in these scenarios. In addition, the boundary effect is also being considered for line-of-sight (LOS) probability calculation. Analytical expressions for distance distributions, LOS probabilities, and the coverage probability (CP) are derived. The CP is then validated through Monte Carlo simulations. Our results reveal deployment insights by approximating and simplifying the derived CP expressions, showing how transmitter density, obstacle density, RIS density, and user cluster radius impact network coverage. Notably, we show that RISs significantly improve coverage when transmitters or transmit power are limited but offer marginal benefits when transmitter density is high. These findings provide practical guidelines for the design and deployment of RIS-assisted indoor mmWave networks.
Eco-driving Incentive Mechanisms for Mitigating Emissions in Urban Transportation
This paper develops incentive mechanisms for promoting eco-driving with the overarching goal of minimizing emissions in transportation networks. The system operator provides drivers with energy-efficient driving guidance throughout their trips and measures compliance through vehicle telematics that capture how closely drivers follow this guidance. Drivers optimize their behaviors based on personal trade-offs between travel times and emissions. To design effective incentives, the operator elicits driver preferences regarding trip urgency and willingness to eco-drive, while determining optimal budget allocations and eco-driving recommendations. Two distinct settings based on driver behavior are analyzed. When drivers report their preferences truthfully, an incentive mechanism ensuring obedience (drivers find it optimal to follow recommendations) is designed by implementing eco-driving recommendations as a Nash equilibrium. When drivers may report strategically, the mechanism is extended to be both obedient and truthful (drivers find it optimal to report truthfully). Unlike existing works that focus on congestion or routing decisions in transportation networks, our framework explicitly targets emissions reduction by incentivizing drivers. The proposed mechanism addresses both strategic behavior and network effects arising from driver interactions, without requiring the operator to reveal system parameters to the drivers. Numerical simulations demonstrate the effects of budget constraints, driver types, and strategic misreporting on equilibrium outcomes and emissions reduction.
comment: 12 pages, 6 figures
Globally Stable Discrete Time PID Passivity-based Control of Power Converters: Simulation and Experimental Results
The key idea behind PID Passivity-based Control (PID-PBC) is to leverage the passivity property of PIDs (for all positive gains) and wrap the PID controller around a passive output to ensure global stability in closed-loop. However, the practical applicability of PID-PBC is stymied by two key facts: (i) the vast majority of practical implementations of PIDs is carried-out in discrete time -- discretizing the continuous time dynamical system of the PID; (ii) the well-known problem that passivity is not preserved upon discretization, even with small sampling times. Therefore, two aspects of the PID-PBC must be revisited for its safe practical application. First, we propose a discretization of the PID that ensures its passivity. Second, since the output that is identified as passive for the continuous time system is not necessarily passive for its discrete time version, we construct a new output that ensures the passivity property for the discretization of the system. In this paper, we provide a constructive answer to both issues for the case of power converter models. Instrumental to achieve this objective is the use of the implicit midpoint discretization method -- which is a symplectic integration technique that preserves system invariants. Since the reference value for the output to be regulated in power converters is non-zero, we are henceforth interested in the property of passivity of the incremental model -- currently known as shifted passivity. Therefore, we demonstrate that the resulting discrete-time PID-PBC defines a passive map for the incremental model and establish shifted passivity for the discretized power converter model. Combining these properties, we prove global stability for the feedback interconnection of the power converter with the discretized PID-PBC. The paper also presents simulations and experiments that demonstrate the performance of the proposed discretization.
Padé Approximant Neural Networks for Enhanced Electric Motor Fault Diagnosis Using Vibration and Acoustic Data
Purpose: The primary aim of this study is to enhance fault diagnosis in induction machines by leveraging the Pad\'e Approximant Neuron (PAON) model. While accelerometers and microphones are standard in motor condition monitoring, deep learning models with nonlinear neuron architectures offer promising improvements in diagnostic performance. This research investigates whether Pad\'e Approximant Neural Networks (Pad\'eNets) can outperform conventional Convolutional Neural Networks (CNNs) and Self-Organized Operational Neural Networks (Self-ONNs) in the diagnosis of electrical and mechanical faults from vibration and acoustic data. Methods: We evaluate and compare the diagnostic capabilities of three deep learning architectures: one-dimensional CNNs, Self-ONNs, and Pad\'eNets. These models are tested on the University of Ottawa's publicly available constant-speed induction motor datasets, which include both vibration and acoustic sensor data. The Pad\'eNet model is designed to introduce enhanced nonlinearity and is compatible with unbounded activation functions such as LeakyReLU. Results and Conclusion: Pad\'eNets consistently outperformed the baseline models, achieving diagnostic accuracies of 99.96%, 98.26%, 97.61%, and 98.33% for accelerometers 1, 2, 3, and the acoustic sensor, respectively. The enhanced nonlinearity of Pad\'eNets, together with their compatibility with unbounded activation functions, significantly improves fault diagnosis performance in induction motor condition monitoring.
comment: This version is the author's accepted manuscript. It has been peer-reviewed and accepted for publication in Journal of Vibration Engineering & Technologies. The final published version is available at https://doi.org/10.1007/s42417-025-02129-5
Dissipativity-Based Distributed Control and Communication Topology Co-Design for DC Microgrids with ZIP Loads
This paper presents a novel dissipativity-based distributed droop-free control and communication topology co-design approach for voltage regulation and current sharing in DC microgrids (DC MGs) with generic ``ZIP'' (constant impedance (Z), current (I) and power (P)) loads. While ZIP loads accurately capture the varied nature of the consumer loads, its constant power load (CPL) component is particularly challenging (and destabilizing) due to its non-linear form. Moreover, ensuring simultaneous voltage regulation and current sharing and co-designing controllers and topology are also challenging when designing control solutions for DC MGs. To address these three challenges, we model the DC MG as a networked system comprised of distributed generators (DGs), ZIP loads, and lines interconnected according to a static interconnection matrix. Next, we equip each DG with a local controller and a distributed global controller (over an arbitrary topology) to derive the error dynamic model of the DC MG as a networked ``error'' system, including disturbance inputs and performance outputs. Subsequently, to co-design the controllers and the topology ensuring robust (dissipative) voltage regulation and current sharing performance, we use the dissipativity and sector boundedness properties of the involved subsystems and formulate Linear Matrix Inequality (LMI) problems to be solved locally and globally. To support the feasibility of the global LMI problem, we identify and embed several crucial necessary conditions in the corresponding local LMI problems, thus providing a one-shot approach to solve the LMI problems. Overall, the proposed approach in this paper provides a unified framework for designing DC MGs. The effectiveness of the proposed solution was verified by simulating an islanded DC MG under different scenarios, demonstrating superior performance compared to traditional control approaches.
comment: arXiv admin note: substantial text overlap with arXiv:2503.04908
Dissipativity-Based Distributed Control and Communication Topology Co-Design for Voltage Regulation and Current Sharing in DC Microgrids
This paper presents a novel dissipativity-based distributed droop-free control approach for voltage regulation and current sharing in DC microgrids (MGs) comprised of an interconnected set of distributed generators (DGs), loads, and power lines. First, we describe the closed-loop DC MG as a networked system where the DGs and lines (i.e., subsystems) are interconnected via a static interconnection matrix. This interconnection matrix demonstrates how the inputs, outputs, and disturbances of DGs and lines are connected in a DC MG. Each DG is equipped with a local controller for voltage regulation and a distributed global controller for current sharing, where the local controllers ensure individual voltage tracking while the global controllers coordinate among DGs to achieve proportional current sharing. To design the distributed global controllers, we use the dissipativity properties of the subsystems and formulate a linear matrix inequality (LMI) problem. To support the feasibility of this problem, we identify a set of necessary local and global conditions to enforce in a specifically developed LMI-based local controller design process. In contrast to existing DC MG control solutions, our approach proposes a unified framework for co-designing the distributed controller and communication topology. As the co-design process is LMI-based, it can be efficiently implemented and evaluated using existing convex optimization tools. The effectiveness of the proposed solution is verified by simulating an islanded DC MG in a MATLAB/Simulink environment under different scenarios, such as load changes and topological constraint changes, and then comparing the performance with the droop control algorithm.
The Untapped Potential of Smart Charging: How EV Owners Can Save Money and Reduce Emissions Without Behavioral Change
The transportation sector is the single largest contributor to US emissions and the second largest globally. Electric vehicles (EVs) are expected to represent half of global car sales by 2035, emerging as a pivotal solution to reduce emissions and enhance grid flexibility. The electrification of buildings, manufacturing, and transportation is expected to grow electricity demand substantially over the next decade. Without effectively managed EV charging, EVs could strain energy grid infrastructure and increase electricity costs. Drawing on de-identified 2023 EV telematics data from Rivian Automotive, this study found that 72% of home charging commenced after the customer plugged in their vehicle regardless of utility time of use (TOU) tariffs or managed charging programs. In fewer than 26% of charging sessions in the sample, EV owners actively scheduled charging times to align or participate in utility tariffs or programs. With a majority of drivers concurrently plugged in during optimal charging periods yet not actively charging, the study identified an opportunity to reduce individual EV owner costs and carbon emissions through smarter charging habits without significant behavioral modifications or sacrifice in user preferences. By optimizing home charging schedules within existing plug-in and plug-out windows, the study suggests that EV owners can save an average of $140 annually and reduce the associated carbon emissions of charging their EV by as much as 28%.
A Neural Network-based Multi-timestep Command Governor for Nonlinear Systems with Constraints
The multi-timestep command governor (MCG) is an add-on algorithm that enforces constraints by modifying, at each timestep, the reference command to a pre-stabilized control system. The MCG can be interpreted as a Model-Predictive Control scheme operating on the reference command. The implementation of MCG on nonlinear systems carries a heavy computational burden as it requires solving a nonlinear program with multiple decision variables at each timestep. This paper proposes a less computationally demanding alternative, based on approximating the MCG control law using a neural network (NN) trained on offline data. However, since the NN output may not always be constraint-admissible due to training errors, its output is adjusted using a sensitivity-based method. We thus refer to the resulting control strategy as the neural network-based MCG (NN-MCG). As validation, the proposed controller is applied as a load governor for constraint management in an automotive fuel cell system. It is shown that the proposed strategy is significantly more computationally efficient than the traditional MCG, while achieving nearly identical performance if the NN is well-trained.
comment: Accepted for publication in the 2025 IEEE Conference on Control Technology and Applications (CCTA)
Robotics
Phys2Real: Fusing VLM Priors with Interactive Online Adaptation for Uncertainty-Aware Sim-to-Real Manipulation
Learning robotic manipulation policies directly in the real world can be expensive and time-consuming. While reinforcement learning (RL) policies trained in simulation present a scalable alternative, effective sim-to-real transfer remains challenging, particularly for tasks that require precise dynamics. To address this, we propose Phys2Real, a real-to-sim-to-real RL pipeline that combines vision-language model (VLM)-inferred physical parameter estimates with interactive adaptation through uncertainty-aware fusion. Our approach consists of three core components: (1) high-fidelity geometric reconstruction with 3D Gaussian splatting, (2) VLM-inferred prior distributions over physical parameters, and (3) online physical parameter estimation from interaction data. Phys2Real conditions policies on interpretable physical parameters, refining VLM predictions with online estimates via ensemble-based uncertainty quantification. On planar pushing tasks of a T-block with varying center of mass (CoM) and a hammer with an off-center mass distribution, Phys2Real achieves substantial improvements over a domain randomization baseline: 100% vs 79% success rate for the bottom-weighted T-block, 57% vs 23% in the challenging top-weighted T-block, and 15% faster average task completion for hammer pushing. Ablation studies indicate that the combination of VLM and interaction information is essential for success. Project website: https://phys2real.github.io/ .
Ego-Vision World Model for Humanoid Contact Planning
Enabling humanoid robots to exploit physical contact, rather than simply avoid collisions, is crucial for autonomy in unstructured environments. Traditional optimization-based planners struggle with contact complexity, while on-policy reinforcement learning (RL) is sample-inefficient and has limited multi-task ability. We propose a framework combining a learned world model with sampling-based Model Predictive Control (MPC), trained on a demonstration-free offline dataset to predict future outcomes in a compressed latent space. To address sparse contact rewards and sensor noise, the MPC uses a learned surrogate value function for dense, robust planning. Our single, scalable model supports contact-aware tasks, including wall support after perturbation, blocking incoming objects, and traversing height-limited arches, with improved data efficiency and multi-task capability over on-policy RL. Deployed on a physical humanoid, our system achieves robust, real-time contact planning from proprioception and ego-centric depth images. Website: https://ego-vcp.github.io/
ManiAgent: An Agentic Framework for General Robotic Manipulation
While Vision-Language-Action (VLA) models have demonstrated impressive capabilities in robotic manipulation, their performance in complex reasoning and long-horizon task planning is limited by data scarcity and model capacity. To address this, we introduce ManiAgent, an agentic architecture for general manipulation tasks that achieves end-to-end output from task descriptions and environmental inputs to robotic manipulation actions. In this framework, multiple agents involve inter-agent communication to perform environmental perception, sub-task decomposition and action generation, enabling efficient handling of complex manipulation scenarios. Evaluations show ManiAgent achieves an 86.8% success rate on the SimplerEnv benchmark and 95.8% on real-world pick-and-place tasks, enabling efficient data collection that yields VLA models with performance comparable to those trained on human-annotated datasets.The project webpage is available at https://yi-yang929.github.io/ManiAgent/.
comment: 8 pages, 6 figures, conference
Smooth Spatiotemporal Tube Synthesis for Prescribed-Time Reach-Avoid-Stay Control
In this work, we address the issue of controller synthesis for a control-affine nonlinear system to meet prescribed time reach-avoid-stay specifications. Our goal is to improve upon previous methods based on spatiotemporal tubes (STTs) by eliminating the need for circumvent functions, which often lead to abrupt tube modifications and high control effort. We propose an adaptive framework that constructs smooth STTs around static unsafe sets, enabling continuous avoidance while guiding the system toward the target within the prescribed time. A closed-form, approximation-free control law is derived to ensure the system trajectory remains within the tube and satisfies the RAS task. The effectiveness of the proposed approach is demonstrated through a case study, showing a significant reduction in control effort compared to prior methods.
Calibrated Dynamic Modeling for Force and Payload Estimation in Hydraulic Machinery
Accurate real-time estimation of end effector interaction forces in hydraulic excavators is a key enabler for advanced automation in heavy machinery. Accurate knowledge of these forces allows improved, precise grading and digging maneuvers. To address these challenges, we introduce a high-accuracy, retrofittable 2D force- and payload estimation algorithm that does not impose additional requirements on the operator regarding trajectory, acceleration or the use of the slew joint. The approach is designed for retrofittability, requires minimal calibration and no prior knowledge of machine-specific dynamic characteristics. Specifically, we propose a method for identifying a dynamic model, necessary to estimate both end effector interaction forces and bucket payload during normal operation. Our optimization-based payload estimation achieves a full-scale payload accuracy of 1%. On a standard 25 t excavator, the online force measurement from pressure and inertial measurements achieves a direction accuracy of 13 degree and a magnitude accuracy of 383 N. The method's accuracy and generalization capability are validated on two excavator platforms of different type and weight classes. We benchmark our payload estimation against a classical quasistatic method and a commercially available system. Our system outperforms both in accuracy and precision.
SCOOP'D: Learning Mixed-Liquid-Solid Scooping via Sim2Real Generative Policy
Scooping items with tools such as spoons and ladles is common in daily life, ranging from assistive feeding to retrieving items from environmental disaster sites. However, developing a general and autonomous robotic scooping policy is challenging since it requires reasoning about complex tool-object interactions. Furthermore, scooping often involves manipulating deformable objects, such as granular media or liquids, which is challenging due to their infinite-dimensional configuration spaces and complex dynamics. We propose a method, SCOOP'D, which uses simulation from OmniGibson (built on NVIDIA Omniverse) to collect scooping demonstrations using algorithmic procedures that rely on privileged state information. Then, we use generative policies via diffusion to imitate demonstrations from observational input. We directly apply the learned policy in diverse real-world scenarios, testing its performance on various item quantities, item characteristics, and container types. In zero-shot deployment, our method demonstrates promising results across 465 trials in diverse scenarios, including objects of different difficulty levels that we categorize as "Level 1" and "Level 2." SCOOP'D outperforms all baselines and ablations, suggesting that this is a promising approach to acquiring robotic scooping skills. Project page is at https://scoopdiff.github.io/.
comment: Project page is at https://scoopdiff.github.io/
Robot Soccer Kit: Omniwheel Tracked Soccer Robots for Education
Recent developments of low cost off-the-shelf programmable components, their modularity, and also rapid prototyping made educational robotics flourish, as it is accessible in most schools today. They allow to illustrate and embody theoretical problems in practical and tangible applications, and gather multidisciplinary skills. They also give a rich natural context for project-oriented pedagogy. However, most current robot kits all are limited to egocentric aspect of the robots perception. This makes it difficult to access more high-level problems involving e.g. coordinates or navigation. In this paper we introduce an educational holonomous robot kit that comes with an external tracking system, which lightens the constraint on embedded systems, but allows in the same time to discover high-level aspects of robotics, otherwise unreachable.
NaviGait: Navigating Dynamically Feasible Gait Libraries using Deep Reinforcement Learning
Reinforcement learning (RL) has emerged as a powerful method to learn robust control policies for bipedal locomotion. Yet, it can be difficult to tune desired robot behaviors due to unintuitive and complex reward design. In comparison, offline trajectory optimization methods, like Hybrid Zero Dynamics, offer more tuneable, interpretable, and mathematically grounded motion plans for high-dimensional legged systems. However, these methods often remain brittle to real-world disturbances like external perturbations. In this work, we present NaviGait, a hierarchical framework that combines the structure of trajectory optimization with the adaptability of RL for robust and intuitive locomotion control. NaviGait leverages a library of offline-optimized gaits and smoothly interpolates between them to produce continuous reference motions in response to high-level commands. The policy provides both joint-level and velocity command residual corrections to modulate and stabilize the reference trajectories in the gait library. One notable advantage of NaviGait is that it dramatically simplifies reward design by encoding rich motion priors from trajectory optimization, reducing the need for finely tuned shaping terms and enabling more stable and interpretable learning. Our experimental results demonstrate that NaviGait enables faster training compared to conventional and imitation-based RL, and produces motions that remain closest to the original reference. Overall, by decoupling high-level motion generation from low-level correction, NaviGait offers a more scalable and generalizable approach for achieving dynamic and robust locomotion.
Simultaneous Calibration of Noise Covariance and Kinematics for State Estimation of Legged Robots via Bi-level Optimization
Accurate state estimation is critical for legged and aerial robots operating in dynamic, uncertain environments. A key challenge lies in specifying process and measurement noise covariances, which are typically unknown or manually tuned. In this work, we introduce a bi-level optimization framework that jointly calibrates covariance matrices and kinematic parameters in an estimator-in-the-loop manner. The upper level treats noise covariances and model parameters as optimization variables, while the lower level executes a full-information estimator. Differentiating through the estimator allows direct optimization of trajectory-level objectives, resulting in accurate and consistent state estimates. We validate our approach on quadrupedal and humanoid robots, demonstrating significantly improved estimation accuracy and uncertainty calibration compared to hand-tuned baselines. Our method unifies state estimation, sensor, and kinematics calibration into a principled, data-driven framework applicable across diverse robotic platforms.
IntersectioNDE: Learning Complex Urban Traffic Dynamics based on Interaction Decoupling Strategy SC 2025
Realistic traffic simulation is critical for ensuring the safety and reliability of autonomous vehicles (AVs), especially in complex and diverse urban traffic environments. However, existing data-driven simulators face two key challenges: a limited focus on modeling dense, heterogeneous interactions at urban intersections - which are prevalent, crucial, and practically significant in countries like China, featuring diverse agents including motorized vehicles (MVs), non-motorized vehicles (NMVs), and pedestrians - and the inherent difficulty in robustly learning high-dimensional joint distributions for such high-density scenes, often leading to mode collapse and long-term simulation instability. We introduce City Crossings Dataset (CiCross), a large-scale dataset collected from a real-world urban intersection, uniquely capturing dense, heterogeneous multi-agent interactions, particularly with a substantial proportion of MVs, NMVs and pedestrians. Based on this dataset, we propose IntersectioNDE (Intersection Naturalistic Driving Environment), a data-driven simulator tailored for complex urban intersection scenarios. Its core component is the Interaction Decoupling Strategy (IDS), a training paradigm that learns compositional dynamics from agent subsets, enabling the marginal-to-joint simulation. Integrated into a scene-aware Transformer network with specialized training techniques, IDS significantly enhances simulation robustness and long-term stability for modeling heterogeneous interactions. Experiments on CiCross show that IntersectioNDE outperforms baseline methods in simulation fidelity, stability, and its ability to replicate complex, distribution-level urban traffic dynamics.
comment: Accepted by ITSC 2025
DQ-NMPC: Dual-Quaternion NMPC for Quadrotor Flight
MAVs have great potential to assist humans in complex tasks, with applications ranging from logistics to emergency response. Their agility makes them ideal for operations in complex and dynamic environments. However, achieving precise control in agile flights remains a significant challenge, particularly due to the underactuated nature of quadrotors and the strong coupling between their translational and rotational dynamics. In this work, we propose a novel NMPC framework based on dual-quaternions (DQ-NMPC) for quadrotor flight. By representing both quadrotor dynamics and the pose error directly on the dual-quaternion manifold, our approach enables a compact and globally non-singular formulation that captures the quadrotor coupled dynamics. We validate our approach through simulations and real-world experiments, demonstrating better numerical conditioning and significantly improved tracking performance, with reductions in position and orientation errors of up to 56.11% and 56.77%, compared to a conventional baseline NMPC method. Furthermore, our controller successfully handles aggressive trajectories, reaching maximum speeds up to 13.66 m/s and accelerations reaching 4.2 g within confined space conditions of dimensions 11m x 4.5m x 3.65m under which the baseline controller fails.
comment: Accepted to IEEE Robotics and Automation Letters
Context-Aware Model-Based Reinforcement Learning for Autonomous Racing
Autonomous vehicles have shown promising potential to be a groundbreaking technology for improving the safety of road users. For these vehicles, as well as many other safety-critical robotic technologies, to be deployed in real-world applications, we require algorithms that can generalize well to unseen scenarios and data. Model-based reinforcement learning algorithms (MBRL) have demonstrated state-of-the-art performance and data efficiency across a diverse set of domains. However, these algorithms have also shown susceptibility to changes in the environment and its transition dynamics. In this work, we explore the performance and generalization capabilities of MBRL algorithms for autonomous driving, specifically in the simulated autonomous racing environment, Roboracer (formerly F1Tenth). We frame the head-to-head racing task as a learning problem using contextual Markov decision processes and parameterize the driving behavior of the adversaries using the context of the episode, thereby also parameterizing the transition and reward dynamics. We benchmark the behavior of MBRL algorithms in this environment and propose a novel context-aware extension of the existing literature, cMask. We demonstrate that context-aware MBRL algorithms generalize better to out-of-distribution adversary behaviors relative to context-free approaches. We also demonstrate that cMask displays strong generalization capabilities, as well as further performance improvement relative to other context-aware MBRL approaches when racing against adversaries with in-distribution behaviors.
comment: Accepted to IEEE ICAR 2025
Constraint-Aware Reinforcement Learning via Adaptive Action Scaling
Safe reinforcement learning (RL) seeks to mitigate unsafe behaviors that arise from exploration during training by reducing constraint violations while maintaining task performance. Existing approaches typically rely on a single policy to jointly optimize reward and safety, which can cause instability due to conflicting objectives, or they use external safety filters that override actions and require prior system knowledge. In this paper, we propose a modular cost-aware regulator that scales the agent's actions based on predicted constraint violations, preserving exploration through smooth action modulation rather than overriding the policy. The regulator is trained to minimize constraint violations while avoiding degenerate suppression of actions. Our approach integrates seamlessly with off-policy RL methods such as SAC and TD3, and achieves state-of-the-art return-to-cost ratios on Safety Gym locomotion tasks with sparse costs, reducing constraint violations by up to 126 times while increasing returns by over an order of magnitude compared to prior methods.
Coordinated Strategies in Realistic Air Combat by Hierarchical Multi-Agent Reinforcement Learning
Achieving mission objectives in a realistic simulation of aerial combat is highly challenging due to imperfect situational awareness and nonlinear flight dynamics. In this work, we introduce a novel 3D multi-agent air combat environment and a Hierarchical Multi-Agent Reinforcement Learning framework to tackle these challenges. Our approach combines heterogeneous agent dynamics, curriculum learning, league-play, and a newly adapted training algorithm. To this end, the decision-making process is organized into two abstraction levels: low-level policies learn precise control maneuvers, while high-level policies issue tactical commands based on mission objectives. Empirical results show that our hierarchical approach improves both learning efficiency and combat performance in complex dogfight scenarios.
comment: 2025 IEEE International Conference on Agentic AI (ICA)
A Faster and More Reliable Middleware for Autonomous Driving Systems
Ensuring safety in high-speed autonomous vehicles requires rapid control loops and tightly bounded delays from perception to actuation. Many open-source autonomy systems rely on ROS 2 middleware; when multiple sensor and control nodes share one compute unit, ROS 2 and its DDS transports add significant (de)serialization, copying, and discovery overheads, shrinking the available time budget. We present Sensor-in-Memory (SIM), a shared-memory transport designed for intra-host pipelines in autonomous vehicles. SIM keeps sensor data in native memory layouts (e.g., cv::Mat, PCL), uses lock-free bounded double buffers that overwrite old data to prioritize freshness, and integrates into ROS 2 nodes with four lines of code. Unlike traditional middleware, SIM operates beside ROS 2 and is optimized for applications where data freshness and minimal latency outweigh guaranteed completeness. SIM provides sequence numbers, a writer heartbeat, and optional checksums to ensure ordering, liveness, and basic integrity. On an NVIDIA Jetson Orin Nano, SIM reduces data-transport latency by up to 98% compared to ROS 2 zero-copy transports such as FastRTPS and Zenoh, lowers mean latency by about 95%, and narrows 95th/99th-percentile tail latencies by around 96%. In tests on a production-ready Level 4 vehicle running Autoware.Universe, SIM increased localization frequency from 7.5 Hz to 9.5 Hz. Applied across all latency-critical modules, SIM cut average perception-to-decision latency from 521.91 ms to 290.26 ms, reducing emergency braking distance at 40 mph (64 km/h) on dry concrete by 13.6 ft (4.14 m).
comment: 8 pages,7 figures, 8 tables
A Modular AIoT Framework for Low-Latency Real-Time Robotic Teleoperation in Smart Cities
This paper presents an AI-driven IoT robotic teleoperation system designed for real-time remote manipulation and intelligent visual monitoring, tailored for smart city applications. The architecture integrates a Flutter-based cross-platform mobile interface with MQTT-based control signaling and WebRTC video streaming via the LiveKit framework. A YOLOv11-nano model is deployed for lightweight object detection, enabling real-time perception with annotated visual overlays delivered to the user interface. Control commands are transmitted via MQTT to an ESP8266-based actuator node, which coordinates multi-axis robotic arm motion through an Arduino Mega2560 controller. The backend infrastructure is hosted on DigitalOcean, ensuring scalable cloud orchestration and stable global communication. Latency evaluations conducted under both local and international VPN scenarios (including Hong Kong, Japan, and Belgium) demonstrate actuator response times as low as 0.2 seconds and total video latency under 1.2 seconds, even across high-latency networks. This low-latency dual-protocol design ensures responsive closed-loop interaction and robust performance in distributed environments. Unlike conventional teleoperation platforms, the proposed system emphasizes modular deployment, real-time AI sensing, and adaptable communication strategies, making it well-suited for smart city scenarios such as remote infrastructure inspection, public equipment servicing, and urban automation. Future enhancements will focus on edge-device deployment, adaptive routing, and integration with city-scale IoT networks to enhance resilience and scalability.
Path and Motion Optimization for Efficient Multi-Location Inspection with Humanoid Robots
This paper proposes a novel framework for humanoid robots to execute inspection tasks with high efficiency and millimeter-level precision. The approach combines hierarchical planning, time-optimal standing position generation, and integrated \ac{mpc} to achieve high speed and precision. A hierarchical planning strategy, leveraging \ac{ik} and \ac{mip}, reduces computational complexity by decoupling the high-dimensional planning problem. A novel MIP formulation optimizes standing position selection and trajectory length, minimizing task completion time. Furthermore, an MPC system with simplified kinematics and single-step position correction ensures millimeter-level end-effector tracking accuracy. Validated through simulations and experiments on the Kuavo 4Pro humanoid platform, the framework demonstrates low time cost and a high success rate in multi-location tasks, enabling efficient and precise execution of complex industrial operations.
HiMaCon: Discovering Hierarchical Manipulation Concepts from Unlabeled Multi-Modal Data NeurIPS 2025
Effective generalization in robotic manipulation requires representations that capture invariant patterns of interaction across environments and tasks. We present a self-supervised framework for learning hierarchical manipulation concepts that encode these invariant patterns through cross-modal sensory correlations and multi-level temporal abstractions without requiring human annotation. Our approach combines a cross-modal correlation network that identifies persistent patterns across sensory modalities with a multi-horizon predictor that organizes representations hierarchically across temporal scales. Manipulation concepts learned through this dual structure enable policies to focus on transferable relational patterns while maintaining awareness of both immediate actions and longer-term goals. Empirical evaluation across simulated benchmarks and real-world deployments demonstrates significant performance improvements with our concept-enhanced policies. Analysis reveals that the learned concepts resemble human-interpretable manipulation primitives despite receiving no semantic supervision. This work advances both the understanding of representation learning for manipulation and provides a practical approach to enhancing robotic performance in complex scenarios.
comment: Accepted at 39th Conference on Neural Information Processing Systems (NeurIPS 2025)
Adap-RPF: Adaptive Trajectory Sampling for Robot Person Following in Dynamic Crowded Environments
Robot person following (RPF) is a core capability in human-robot interaction, enabling robots to assist users in daily activities, collaborative work, and other service scenarios. However, achieving practical RPF remains challenging due to frequent occlusions, particularly in dynamic and crowded environments. Existing approaches often rely on fixed-point following or sparse candidate-point selection with oversimplified heuristics, which cannot adequately handle complex occlusions caused by moving obstacles such as pedestrians. To address these limitations, we propose an adaptive trajectory sampling method that generates dense candidate points within socially aware zones and evaluates them using a multi-objective cost function. Based on the optimal point, a person-following trajectory is estimated relative to the predicted motion of the target. We further design a prediction-aware model predictive path integral (MPPI) controller that simultaneously tracks this trajectory and proactively avoids collisions using predicted pedestrian motions. Extensive experiments show that our method outperforms state-of-the-art baselines in smoothness, safety, robustness, and human comfort, with its effectiveness further demonstrated on a mobile robot in real-world scenarios.
comment: https://adap-rpf.github.io/
Rotor-Failure-Aware Quadrotors Flight in Unknown Environments
Rotor failures in quadrotors may result in high-speed rotation and vibration due to rotor imbalance, which introduces significant challenges for autonomous flight in unknown environments. The mainstream approaches against rotor failures rely on fault-tolerant control (FTC) and predefined trajectory tracking. To the best of our knowledge, online failure detection and diagnosis (FDD), trajectory planning, and FTC of the post-failure quadrotors in unknown and complex environments have not yet been achieved. This paper presents a rotor-failure-aware quadrotor navigation system designed to mitigate the impacts of rotor imbalance. First, a composite FDD-based nonlinear model predictive controller (NMPC), incorporating motor dynamics, is designed to ensure fast failure detection and flight stability. Second, a rotor-failure-aware planner is designed to leverage FDD results and spatial-temporal joint optimization, while a LiDAR-based quadrotor platform with four anti-torque plates is designed to enable reliable perception under high-speed rotation. Lastly, extensive benchmarks against state-of-the-art methods highlight the superior performance of the proposed approach in addressing rotor failures, including propeller unloading and motor stoppage. The experimental results demonstrate, for the first time, that our approach enables autonomous quadrotor flight with rotor failures in challenging environments, including cluttered rooms and unknown forests.
DemoHLM: From One Demonstration to Generalizable Humanoid Loco-Manipulation
Loco-manipulation is a fundamental challenge for humanoid robots to achieve versatile interactions in human environments. Although recent studies have made significant progress in humanoid whole-body control, loco-manipulation remains underexplored and often relies on hard-coded task definitions or costly real-world data collection, which limits autonomy and generalization. We present DemoHLM, a framework for humanoid loco-manipulation that enables generalizable loco-manipulation on a real humanoid robot from a single demonstration in simulation. DemoHLM adopts a hierarchy that integrates a low-level universal whole-body controller with high-level manipulation policies for multiple tasks. The whole-body controller maps whole-body motion commands to joint torques and provides omnidirectional mobility for the humanoid robot. The manipulation policies, learned in simulation via our data generation and imitation learning pipeline, command the whole-body controller with closed-loop visual feedback to execute challenging loco-manipulation tasks. Experiments show a positive correlation between the amount of synthetic data and policy performance, underscoring the effectiveness of our data generation pipeline and the data efficiency of our approach. Real-world experiments on a Unitree G1 robot equipped with an RGB-D camera validate the sim-to-real transferability of DemoHLM, demonstrating robust performance under spatial variations across ten loco-manipulation tasks.
A Primer on SO(3) Action Representations in Deep Reinforcement Learning
Many robotic control tasks require policies to act on orientations, yet the geometry of SO(3) makes this nontrivial. Because SO(3) admits no global, smooth, minimal parameterization, common representations such as Euler angles, quaternions, rotation matrices, and Lie algebra coordinates introduce distinct constraints and failure modes. While these trade-offs are well studied for supervised learning, their implications for actions in reinforcement learning remain unclear. We systematically evaluate SO(3) action representations across three standard continuous control algorithms, PPO, SAC, and TD3, under dense and sparse rewards. We compare how representations shape exploration, interact with entropy regularization, and affect training stability through empirical studies and analyze the implications of different projections for obtaining valid rotations from Euclidean network outputs. Across a suite of robotics benchmarks, we quantify the practical impact of these choices and distill simple, implementation-ready guidelines for selecting and using rotation actions. Our results highlight that representation-induced geometry strongly influences exploration and optimization and show that representing actions as tangent vectors in the local frame yields the most reliable results across algorithms.
Design and Koopman Model Predictive Control of A Soft Exoskeleton Based on Origami-Inspired Pneumatic Actuator for Knee Rehabilitation
Effective rehabilitation methods are essential for the recovery of lower limb dysfunction caused by stroke. Nowadays, robotic exoskeletons have shown great potentials in rehabilitation. Nevertheless, traditional rigid exoskeletons are usually heavy and need a lot of work to help the patients to put them on. Moreover, it also requires extra compliance control to guarantee the safety. In contrast, soft exoskeletons are easy and comfortable to wear and have intrinsic compliance, but their complex nonlinear human-robot interaction dynamics would pose significant challenges for control. In this work, based on the pneumatic actuators inspired by origami, we design a rehabilitation exoskeleton for knee that is easy and comfortable to wear. To guarantee the control performance and enable a nice human-robot interaction, we first use Deep Koopman Network to model the human-robot interaction dynamics. In particular, by viewing the electromyography (EMG) signals and the duty cycle of the PWM wave that controls the pneumatic robot's valves and pump as the inputs, the linear Koopman model accurately captures the complex human-robot interaction dynamics. Next, based on the obtained Koopman model, we further use Model Predictive Control (MPC) to control the soft robot and help the user to do rehabilitation training in real-time. The goal of the rehabilitation training is to track a given reference signal shown on the screen. Experiments show that by integrating the EMG signals into the Koopman model, we have improved the model accuracy to great extent. In addition, a personalized Koopman model trained from the individual's own data performs better than the non-personalized model. Consequently, our control framework outperforms the traditional PID control in both passive and active training modes. Hence the proposed method provides a new control framework for soft rehabilitation robots.
Flow Matching-Based Autonomous Driving Planning with Advanced Interactive Behavior Modeling NeurIPS 2025
Modeling interactive driving behaviors in complex scenarios remains a fundamental challenge for autonomous driving planning. Learning-based approaches attempt to address this challenge with advanced generative models, removing the dependency on over-engineered architectures for representation fusion. However, brute-force implementation by simply stacking transformer blocks lacks a dedicated mechanism for modeling interactive behaviors that are common in real driving scenarios. The scarcity of interactive driving data further exacerbates this problem, leaving conventional imitation learning methods ill-equipped to capture high-value interactive behaviors. We propose Flow Planner, which tackles these problems through coordinated innovations in data modeling, model architecture, and learning scheme. Specifically, we first introduce fine-grained trajectory tokenization, which decomposes the trajectory into overlapping segments to decrease the complexity of whole trajectory modeling. With a sophisticatedly designed architecture, we achieve efficient temporal and spatial fusion of planning and scene information, to better capture interactive behaviors. In addition, the framework incorporates flow matching with classifier-free guidance for multi-modal behavior generation, which dynamically reweights agent interactions during inference to maintain coherent response strategies, providing a critical boost for interactive scenario understanding. Experimental results on the large-scale nuPlan dataset and challenging interactive interPlan dataset demonstrate that Flow Planner achieves state-of-the-art performance among learning-based approaches while effectively modeling interactive behaviors in complex driving scenarios.
comment: 26 pages, 6 figures. Accepted at NeurIPS 2025
PhysHSI: Towards a Real-World Generalizable and Natural Humanoid-Scene Interaction System
Deploying humanoid robots to interact with real-world environments--such as carrying objects or sitting on chairs--requires generalizable, lifelike motions and robust scene perception. Although prior approaches have advanced each capability individually, combining them in a unified system is still an ongoing challenge. In this work, we present a physical-world humanoid-scene interaction system, PhysHSI, that enables humanoids to autonomously perform diverse interaction tasks while maintaining natural and lifelike behaviors. PhysHSI comprises a simulation training pipeline and a real-world deployment system. In simulation, we adopt adversarial motion prior-based policy learning to imitate natural humanoid-scene interaction data across diverse scenarios, achieving both generalization and lifelike behaviors. For real-world deployment, we introduce a coarse-to-fine object localization module that combines LiDAR and camera inputs to provide continuous and robust scene perception. We validate PhysHSI on four representative interactive tasks--box carrying, sitting, lying, and standing up--in both simulation and real-world settings, demonstrating consistently high success rates, strong generalization across diverse task goals, and natural motion patterns.
comment: Project website: https://why618188.github.io/physhsi/
Unveiling Uncertainty-Aware Autonomous Cooperative Learning Based Planning Strategy
In future intelligent transportation systems, autonomous cooperative planning (ACP), becomes a promising technique to increase the effectiveness and security of multi-vehicle interactions. However, multiple uncertainties cannot be fully addressed for existing ACP strategies, e.g. perception, planning, and communication uncertainties. To address these, a novel deep reinforcement learning-based autonomous cooperative planning (DRLACP) framework is proposed to tackle various uncertainties on cooperative motion planning schemes. Specifically, the soft actor-critic (SAC) with the implementation of gate recurrent units (GRUs) is adopted to learn the deterministic optimal time-varying actions with imperfect state information occurred by planning, communication, and perception uncertainties. In addition, the real-time actions of autonomous vehicles (AVs) are demonstrated via the Car Learning to Act (CARLA) simulation platform. Evaluation results show that the proposed DRLACP learns and performs cooperative planning effectively, which outperforms other baseline methods under different scenarios with imperfect AV state information.
comment: Accepted by IEEE RA-L
XGrasp: Gripper-Aware Grasp Detection with Multi-Gripper Data Generation
Most robotic grasping methods are typically designed for single gripper types, which limits their applicability in real-world scenarios requiring diverse end-effectors. We propose XGrasp, a real-time gripper-aware grasp detection framework that efficiently handles multiple gripper configurations. The proposed method addresses data scarcity by systematically augmenting existing datasets with multi-gripper annotations. XGrasp employs a hierarchical two-stage architecture. In the first stage, a Grasp Point Predictor (GPP) identifies optimal locations using global scene information and gripper specifications. In the second stage, an Angle-Width Predictor (AWP) refines the grasp angle and width using local features. Contrastive learning in the AWP module enables zero-shot generalization to unseen grippers by learning fundamental grasping characteristics. The modular framework integrates seamlessly with vision foundation models, providing pathways for future vision-language capabilities. The experimental results demonstrate competitive grasp success rates across various gripper types, while achieving substantial improvements in inference speed compared to existing gripper-aware methods. Project page: https://sites.google.com/view/xgrasp
Refinery: Active Fine-tuning and Deployment-time Optimization for Contact-Rich Policies
Simulation-based learning has enabled policies for precise, contact-rich tasks (e.g., robotic assembly) to reach high success rates (~80%) under high levels of observation noise and control error. Although such performance may be sufficient for research applications, it falls short of industry standards and makes policy chaining exceptionally brittle. A key limitation is the high variance in individual policy performance across diverse initial conditions. We introduce Refinery, an effective framework that bridges this performance gap, robustifying policy performance across initial conditions. We propose Bayesian Optimization-guided fine-tuning to improve individual policies, and Gaussian Mixture Model-based sampling during deployment to select initializations that maximize execution success. Using Refinery, we improve mean success rates by 10.98% over state-of-the-art methods in simulation-based learning for robotic assembly, reaching 91.51% in simulation and comparable performance in the real world. Furthermore, we demonstrate that these fine-tuned policies can be chained to accomplish long-horizon, multi-part assembly$\unicode{x2013}$successfully assembling up to 8 parts without requiring explicit multi-step training.
comment: in submission. 8 pages, 6 figures. Website: https://refinery-2025.github.io/refinery/
Into the Unknown: Towards using Generative Models for Sampling Priors of Environment Uncertainty for Planning in Configuration Spaces
Priors are vital for planning under partial observability, yet difficult to obtain in practice. We present a sampling-based pipeline that leverages large-scale pretrained generative models to produce probabilistic priors capturing environmental uncertainty and spatio-semantic relationships in a zero-shot manner. Conditioned on partial observations, the pipeline recovers complete RGB-D point cloud samples with occupancy and target semantics, formulated to be directly useful in configuration-space planning. We establish a Matterport3D benchmark of rooms partially visible through doorways, where a robot must navigate to an unobserved target object. Effective priors for this setting must represent both occupancy and target-location uncertainty in unobserved regions. Experiments show that our approach recovers commonsense spatial semantics consistent with ground truth, yielding diverse, clean 3D point clouds usable in motion planning, highlight the promise of generative models as a rich source of priors for robotic planning.
comment: Under Review
AMO-HEAD: Adaptive MARG-Only Heading Estimation for UAVs under Magnetic Disturbances
Accurate and robust heading estimation is crucial for unmanned aerial vehicles (UAVs) when conducting indoor inspection tasks. However, the cluttered nature of indoor environments often introduces severe magnetic disturbances, which can significantly degrade heading accuracy. To address this challenge, this paper presents an Adaptive MARG-Only Heading (AMO-HEAD) estimation approach for UAVs operating in magnetically disturbed environments. AMO-HEAD is a lightweight and computationally efficient Extended Kalman Filter (EKF) framework that leverages inertial and magnetic sensors to achieve reliable heading estimation. In the proposed approach, gyroscope angular rate measurements are integrated to propagate the quaternion state, which is subsequently corrected using accelerometer and magnetometer data. The corrected quaternion is then used to compute the UAV's heading. An adaptive process noise covariance method is introduced to model and compensate for gyroscope measurement noise, bias drift, and discretization errors arising from the Euler method integration. To mitigate the effects of external magnetic disturbances, a scaling factor is applied based on real-time magnetic deviation detection. A theoretical observability analysis of the proposed AMO-HEAD is performed using the Lie derivative. Extensive experiments were conducted in real world indoor environments with customized UAV platforms. The results demonstrate the effectiveness of the proposed algorithm in providing precise heading estimation under magnetically disturbed conditions.
Game-Theoretic Risk-Shaped Reinforcement Learning for Safe Autonomous Driving
Ensuring safety in autonomous driving (AD) remains a significant challenge, especially in highly dynamic and complex traffic environments where diverse agents interact and unexpected hazards frequently emerge. Traditional reinforcement learning (RL) methods often struggle to balance safety, efficiency, and adaptability, as they primarily focus on reward maximization without explicitly modeling risk or safety constraints. To address these limitations, this study proposes a novel game-theoretic risk-shaped RL (GTR2L) framework for safe AD. GTR2L incorporates a multi-level game-theoretic world model that jointly predicts the interactive behaviors of surrounding vehicles and their associated risks, along with an adaptive rollout horizon that adjusts dynamically based on predictive uncertainty. Furthermore, an uncertainty-aware barrier mechanism enables flexible modulation of safety boundaries. A dedicated risk modeling approach is also proposed, explicitly capturing both epistemic and aleatoric uncertainty to guide constrained policy optimization and enhance decision-making in complex environments. Extensive evaluations across diverse and safety-critical traffic scenarios show that GTR2L significantly outperforms state-of-the-art baselines, including human drivers, in terms of success rate, collision and violation reduction, and driving efficiency. The code is available at https://github.com/DanielHu197/GTR2L.
DKPMV: Dense Keypoints Fusion from Multi-View RGB Frames for 6D Pose Estimation of Textureless Objects ICRA 2026
6D pose estimation of textureless objects is valuable for industrial robotic applications, yet remains challenging due to the frequent loss of depth information. Current multi-view methods either rely on depth data or insufficiently exploit multi-view geometric cues, limiting their performance. In this paper, we propose DKPMV, a pipeline that achieves dense keypoint-level fusion using only multi-view RGB images as input. We design a three-stage progressive pose optimization strategy that leverages dense multi-view keypoint geometry information. To enable effective dense keypoint fusion, we enhance the keypoint network with attentional aggregation and symmetry-aware training, improving prediction accuracy and resolving ambiguities on symmetric objects. Extensive experiments on the ROBI dataset demonstrate that DKPMV outperforms state-of-the-art multi-view RGB approaches and even surpasses the RGB-D methods in the majority of cases. The code will be available soon.
comment: 12 pages, 9 figures, submitted to ICRA 2026
TabVLA: Targeted Backdoor Attacks on Vision-Language-Action Models
With the growing deployment of Vision-Language-Action (VLA) models in real-world embodied AI systems, their increasing vulnerability to backdoor attacks poses a serious safety threat. A backdoored VLA agent can be covertly triggered by a pre-injected backdoor to execute adversarial actions, potentially causing system failures or even physical harm. Although backdoor attacks on VLA models have been explored, prior work has focused only on untargeted attacks, leaving the more practically threatening scenario of targeted manipulation unexamined. In this paper, we study targeted backdoor attacks on VLA models and introduce TabVLA, a novel framework that enables such attacks via black-box fine-tuning. TabVLA explores two deployment-relevant inference-time threat models: input-stream editing and in-scene triggering. It formulates poisoned data generation as an optimization problem to improve attack effectivess. Experiments with OpenVLA-7B on the LIBERO benchmark reveal that the vision channel is the principal attack surface: targeted backdoors succeed with minimal poisoning, remain robust across variations in trigger design, and are degraded only by positional mismatches between fine-tuning and inference triggers. We also investigate a potential detection-based defense against TabVLA, which reconstructs latent visual triggers from the input stream to flag activation-conditioned backdoor samples. Our work highlights the vulnerability of VLA models to targeted backdoor manipulation and underscores the need for more advanced defenses.
comment: 8 pages, 8 tables, 1 figure. Under review
More than A Point: Capturing Uncertainty with Adaptive Affordance Heatmaps for Spatial Grounding in Robotic Tasks
Many language-guided robotic systems rely on collapsing spatial reasoning into discrete points, making them brittle to perceptual noise and semantic ambiguity. To address this challenge, we propose RoboMAP, a framework that represents spatial targets as continuous, adaptive affordance heatmaps. This dense representation captures the uncertainty in spatial grounding and provides richer information for downstream policies, thereby significantly enhancing task success and interpretability. RoboMAP surpasses the previous state-of-the-art on a majority of grounding benchmarks with up to a 50x speed improvement, and achieves an 82\% success rate in real-world manipulation. Across extensive simulated and physical experiments, it demonstrates robust performance and shows strong zero-shot generalization to navigation. More details and videos can be found at https://robo-map.github.io.
comment: More details and videos can be found at https://robo-map.github.io. Xiu Li (Corresponding author: Xiu Li)
Towards a Unified Understanding of Robot Manipulation: A Comprehensive Survey
Embodied intelligence has witnessed remarkable progress in recent years, driven by advances in computer vision, natural language processing, and the rise of large-scale multimodal models. Among its core challenges, robot manipulation stands out as a fundamental yet intricate problem, requiring the seamless integration of perception, planning, and control to enable interaction within diverse and unstructured environments. This survey presents a comprehensive overview of robotic manipulation, encompassing foundational background, task-organized benchmarks and datasets, and a unified taxonomy of existing methods. We extend the classical division between high-level planning and low-level control by broadening high-level planning to include language, code, motion, affordance, and 3D representations, while introducing a new taxonomy of low-level learning-based control grounded in training paradigms such as input modeling, latent learning, and policy learning. Furthermore, we provide the first dedicated taxonomy of key bottlenecks, focusing on data collection, utilization, and generalization, and conclude with an extensive review of real-world applications. Compared with prior surveys, our work offers both a broader scope and deeper insight, serving as an accessible roadmap for newcomers and a structured reference for experienced researchers. All related resources, including research papers, open-source datasets, and projects, are curated for the community at https://github.com/BaiShuanghao/Awesome-Robotics-Manipulation.
An Adaptive Transition Framework for Game-Theoretic Based Takeover
The transition of control from autonomous systems to human drivers is critical in automated driving systems, particularly due to the out-of-the-loop (OOTL) circumstances that reduce driver readiness and increase reaction times. Existing takeover strategies are based on fixed time-based transitions, which fail to account for real-time driver performance variations. This paper proposes an adaptive transition strategy that dynamically adjusts the control authority based on both the time and tracking ability of the driver trajectory. Shared control is modeled as a cooperative differential game, where control authority is modulated through time-varying objective functions instead of blending control torques directly. To ensure a more natural takeover, a driver-specific state-tracking matrix is introduced, allowing the transition to align with individual control preferences. Multiple transition strategies are evaluated using a cumulative trajectory error metric. Human-in-the-loop control scenarios of the standardized ISO lane change maneuvers demonstrate that adaptive transitions reduce trajectory deviations and driver control effort compared to conventional strategies. Experiments also confirm that continuously adjusting control authority based on real-time deviations enhances vehicle stability while reducing driver effort during takeover.
QuayPoints: A Reasoning Framework to Bridge the Information Gap Between Global and Local Planning in Autonomous Racing
Autonomous racing requires tight integration between perception, planning and control to minimize latency as well as timely decision making. A standard autonomy pipeline comprising a global planner, local planner, and controller loses information as the higher-level racing context is sequentially propagated downstream into specific task-oriented context. In particular, the global planner's understanding of optimality is typically reduced to a sparse set of waypoints, leaving the local planner to make reactive decisions with limited context. This paper investigates whether additional global insights, specifically time-optimality information, can be meaningfully passed to the local planner to improve downstream decisions. We introduce a framework that preserves essential global knowledge and conveys it to the local planner through QuayPoints regions where deviations from the optimal raceline result in significant compromises to optimality. QuayPoints enable local planners to make more informed global decisions when deviating from the raceline, such as during strategic overtaking. To demonstrate this, we integrate QuayPoints into an existing planner and show that it consistently overtakes opponents traveling at up to 75% of the ego vehicle's speed across four distinct race tracks.
comment: This work has been submitted to the IEEE for possible publication
GRIP: A Unified Framework for Grid-Based Relay and Co-Occurrence-Aware Planning in Dynamic Environments
Robots navigating dynamic, cluttered, and semantically complex environments must integrate perception, symbolic reasoning, and spatial planning to generalize across diverse layouts and object categories. Existing methods often rely on static priors or limited memory, constraining adaptability under partial observability and semantic ambiguity. We present GRIP, Grid-based Relay with Intermediate Planning, a unified, modular framework with three scalable variants: GRIP-L (Lightweight), optimized for symbolic navigation via semantic occupancy grids; GRIP-F (Full), supporting multi-hop anchor chaining and LLM-based introspection; and GRIP-R (Real-World), enabling physical robot deployment under perceptual uncertainty. GRIP integrates dynamic 2D grid construction, open-vocabulary object grounding, co-occurrence-aware symbolic planning, and hybrid policy execution using behavioral cloning, D* search, and grid-conditioned control. Empirical results on AI2-THOR and RoboTHOR benchmarks show that GRIP achieves up to 9.6% higher success rates and over $2\times$ improvement in path efficiency (SPL and SAE) on long-horizon tasks. Qualitative analyses reveal interpretable symbolic plans in ambiguous scenes. Real-world deployment on a Jetbot further validates GRIP's generalization under sensor noise and environmental variation. These results position GRIP as a robust, scalable, and explainable framework bridging simulation and real-world navigation.
comment: 17 pages, 5 figures, 8 tables
Holistic Evaluation of Multimodal LLMs on Spatial Intelligence
Multimodal models have achieved remarkable progress in recent years. Nevertheless, they continue to exhibit notable limitations in spatial understanding and reasoning, the very capability that anchors artificial general intelligence in the physical world. With the recent release of GPT-5, allegedly the most powerful AI model to date, it is timely to examine where the leading models (GPT, Gemini, Grok, Seed, Qwen, and Intern) stand on the path toward spatial intelligence. We first propose a holistic taxonomy of spatial tasks that unifies existing benchmarks and a standardized protocol for the fair evaluation of state-of-the-art proprietary and open-source models across eight key benchmarks, at a cost exceeding ten billion total tokens. Our empirical study then reveals that (1) GPT-5 demonstrates unprecedented strength in spatial intelligence (SI), yet (2) still falls short of human performance significantly across a broad spectrum of SI-tasks. Moreover, we (3) show that SI-tasks expose greater model capability deficiency than non-SI tasks, to the extent that (4) proprietary models do not exhibit a decisive advantage when facing the most difficult ones. In addition, we conduct a qualitative evaluation across a diverse set of scenarios that are intuitive for humans, yet fail even the most advanced multimodal models.
Guiding Energy-Efficient Locomotion through Impact Mitigation Rewards
Animals achieve energy-efficient locomotion by their implicit passive dynamics, a marvel that has captivated roboticists for decades.Recently, methods incorporated Adversarial Motion Prior (AMP) and Reinforcement learning (RL) shows promising progress to replicate Animals' naturalistic motion. However, such imitation learning approaches predominantly capture explicit kinematic patterns, so-called gaits, while overlooking the implicit passive dynamics. This work bridges this gap by incorporating a reward term guided by Impact Mitigation Factor (IMF), a physics-informed metric that quantifies a robot's ability to passively mitigate impacts. By integrating IMF with AMP, our approach enables RL policies to learn both explicit motion trajectories from animal reference motion and the implicit passive dynamic. We demonstrate energy efficiency improvements of up to 32%, as measured by the Cost of Transport (CoT), across both AMP and handcrafted reward structure.
ORN-CBF: Learning Observation-conditioned Residual Neural Control Barrier Functions via Hypernetworks
Control barrier functions (CBFs) have been demonstrated as an effective method for safety-critical control of autonomous systems. Although CBFs are simple to deploy, their design remains challenging, motivating the development of learning-based approaches. Yet, issues such as suboptimal safe sets, applicability in partially observable environments, and lack of rigorous safety guarantees persist. In this work, we propose observation-conditioned neural CBFs based on Hamilton-Jacobi (HJ) reachability analysis, which approximately recover the maximal safe sets. We exploit certain mathematical properties of the HJ value function, ensuring that the predicted safe set never intersects with the observed failure set. Moreover, we leverage a hypernetwork-based architecture that is particularly suitable for the design of observation-conditioned safety filters. The proposed method is examined both in simulation and hardware experiments for a ground robot and a quadcopter. The results show improved success rates and generalization to out-of-domain environments compared to the baselines.
Task-Optimized Convolutional Recurrent Networks Align with Tactile Processing in the Rodent Brain NeurIPS 2025
Tactile sensing remains far less understood in neuroscience and less effective in artificial systems compared to more mature modalities such as vision and language. We bridge these gaps by introducing a novel Encoder-Attender-Decoder (EAD) framework to systematically explore the space of task-optimized temporal neural networks trained on realistic tactile input sequences from a customized rodent whisker-array simulator. We identify convolutional recurrent neural networks (ConvRNNs) as superior encoders to purely feedforward and state-space architectures for tactile categorization. Crucially, these ConvRNN-encoder-based EAD models achieve neural representations closely matching rodent somatosensory cortex, saturating the explainable neural variability and revealing a clear linear relationship between supervised categorization performance and neural alignment. Furthermore, contrastive self-supervised ConvRNN-encoder-based EADs, trained with tactile-specific augmentations, match supervised neural fits, serving as an ethologically-relevant, label-free proxy. For neuroscience, our findings highlight nonlinear recurrent processing as important for general-purpose tactile representations in somatosensory cortex, providing the first quantitative characterization of the underlying inductive biases in this system. For embodied AI, our results emphasize the importance of recurrent EAD architectures to handle realistic tactile inputs, along with tailored self-supervised learning methods for achieving robust tactile perception with the same type of sensors animals use to sense in unstructured environments.
comment: 10 pages, 8 figures, 7 tables, NeurIPS 2025 Camera Ready Version (oral)
Investigating Memory in RL with POPGym Arcade
How should we analyze memory in deep RL? We introduce mathematical tools for fairly analyzing policies under partial observability and revealing how agents use memory to make decisions. To utilize these tools, we present POPGym Arcade, a collection of Atari-inspired, hardware-accelerated, pixel-based environments sharing a single observation and action space. Each environment provides fully and partially observable variants, enabling counterfactual studies on observability. We find that controlled studies are necessary for fair comparisons, and identify a pathology where value functions smear credit over irrelevant history. With this pathology, we demonstrate how out-of-distribution scenarios can contaminate memory, perturbing the policy far into the future, with implications for sim-to-real transfer and offline RL.
Multi-Modal Manipulation via Multi-Modal Policy Consensus
Effectively integrating diverse sensory modalities is crucial for robotic manipulation. However, the typical approach of feature concatenation is often suboptimal: dominant modalities such as vision can overwhelm sparse but critical signals like touch in contact-rich tasks, and monolithic architectures cannot flexibly incorporate new or missing modalities without retraining. Our method factorizes the policy into a set of diffusion models, each specialized for a single representation (e.g., vision or touch), and employs a router network that learns consensus weights to adaptively combine their contributions, enabling incremental of new representations. We evaluate our approach on simulated manipulation tasks in {RLBench}, as well as real-world tasks such as occluded object picking, in-hand spoon reorientation, and puzzle insertion, where it significantly outperforms feature-concatenation baselines on scenarios requiring multimodal reasoning. Our policy further demonstrates robustness to physical perturbations and sensor corruption. We further conduct perturbation-based importance analysis, which reveals adaptive shifts between modalities.
comment: 9 pages, 7 figures. Project website: https://policyconsensus.github.io
Failure Prediction at Runtime for Generative Robot Policies NeurIPS 2025
Imitation learning (IL) with generative models, such as diffusion and flow matching, has enabled robots to perform complex, long-horizon tasks. However, distribution shifts from unseen environments or compounding action errors can still cause unpredictable and unsafe behavior, leading to task failure. Early failure prediction during runtime is therefore essential for deploying robots in human-centered and safety-critical environments. We propose FIPER, a general framework for Failure Prediction at Runtime for generative IL policies that does not require failure data. FIPER identifies two key indicators of impending failure: (i) out-of-distribution (OOD) observations detected via random network distillation in the policy's embedding space, and (ii) high uncertainty in generated actions measured by a novel action-chunk entropy score. Both failure prediction scores are calibrated using a small set of successful rollouts via conformal prediction. A failure alarm is triggered when both indicators, aggregated over short time windows, exceed their thresholds. We evaluate FIPER across five simulation and real-world environments involving diverse failure modes. Our results demonstrate that FIPER better distinguishes actual failures from benign OOD situations and predicts failures more accurately and earlier than existing methods. We thus consider this work an important step towards more interpretable and safer generative robot policies. Code, data and videos are available at https://tum-lsy.github.io/fiper_website.
comment: Project page: https://tum-lsy.github.io/fiper_website. 33 pages, 12 figures. Accepted to NeurIPS 2025
Product Digital Twin Supporting End-of-life Phase of Electric Vehicle Batteries Utilizing Product-Process-Resource Asset Network
In a circular economy, products in their end-of-life phase should be either remanufactured or recycled. Both of these processes are crucial for sustainability and environmental conservation. However, manufacturers frequently do not support these processes enough in terms of not sharing relevant data about the products nor their (re-)manufacturing processes. This paper proposes to accompany each product with a digital twin technology, specifically the Product Digital Twin (PDT), which can carry information for facilitating and optimizing production and remanufacturing processes. This paper introduces a knowledge representation called Bi-Flow Product-Process-Resource Asset Network (Bi-PAN). Bi-PAN extends a well-proven Product-Process-Resource Asset Network (PAN) paradigm by integrating both assembly and disassembly workflows into a single information model. Such networks enable capturing relevant relationships across products, production resources, manufacturing processes, and specific production operations that have to be done in the manufacturing phase of a product. The proposed approach is demonstrated in a use-case of disassembling electric vehicle (EV) batteries. By utilizing PDTs with Bi-PAN knowledge models, challenges associated with disassembling of EV batteries can be solved flexibly and efficiently for various battery types, enhancing the sustainability of the EV battery life-cycle management.
comment: \copyright 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
Manipulation of Elasto-Flexible Cables with Single or Multiple UAVs
This work considers a large class of systems composed of multiple quadrotors manipulating deformable and extensible cables. The cable is described via a discretized representation, which decomposes it into linear springs interconnected through lumped-mass passive spherical joints. Sets of flat outputs are found for the systems. Numerical simulations support the findings by showing cable manipulation relying on flatness-based trajectories. Eventually, we present an experimental validation of the effectiveness of the proposed discretized cable model for a two-robot example. Moreover, a closed-loop controller based on the identified model and using cable-output feedback is experimentally tested.
Data Scaling Laws for Imitation Learning-Based End-to-End Autonomous Driving
The end-to-end autonomous driving paradigm has recently attracted lots of attention due to its scalability. However, existing methods are constrained by the limited scale of real-world data, which hinders a comprehensive exploration of the scaling laws associated with end-to-end autonomous driving. To address this issue, we collected substantial data from various driving scenarios and behaviors and conducted an extensive study on the scaling laws of existing imitation learning-based end-to-end autonomous driving paradigms. Specifically, approximately 4 million demonstrations from 23 different scenario types were gathered, amounting to over 30,000 hours of driving demonstrations. We performed open-loop evaluations and closed-loop simulation evaluations in 1,400 diverse driving demonstrations (1,300 for open-loop and 100 for closed-loop) under stringent assessment conditions. Through experimental analysis, we discovered that (1) the performance of the driving model exhibits a power-law relationship with the amount of data, but this is not the case in closed-loop evaluation. The inconsistency between the two assessments shifts our focus toward the distribution of data rather than merely expanding its volume. (2) a small increase in the quantity of long-tailed data can significantly improve the performance for the corresponding scenarios; (3) appropriate scaling of data enables the model to achieve combinatorial generalization in novel scenes and actions. Our results highlight the critical role of data scaling in improving the generalizability of models across diverse autonomous driving scenarios, assuring safe deployment in the real world.. Project repository: https://github.com/ucaszyp/Driving-Scaling-Law
PCHands: PCA-based Hand Pose Synergy Representation on Manipulators with N-DoF
We consider the problem of learning a common representation for dexterous manipulation across manipulators of different morphologies. To this end, we propose PCHands, a novel approach for extracting hand postural synergies from a large set of manipulators. We define a simplified and unified description format based on anchor positions for manipulators ranging from 2-finger grippers to 5-finger anthropomorphic hands. This enables learning a variable-length latent representation of the manipulator configuration and the alignment of the end-effector frame of all manipulators. We show that it is possible to extract principal components from this latent representation that is universal across manipulators of different structures and degrees of freedom. To evaluate PCHands, we use this compact representation to encode observation and action spaces of control policies for dexterous manipulation tasks learned with RL. In terms of learning efficiency and consistency, the proposed representation outperforms a baseline that learns the same tasks in joint space. We additionally show that PCHands performs robustly in RL from demonstration, when demonstrations are provided from a different manipulator. We further support our results with real-world experiments that involve a 2-finger gripper and a 4-finger anthropomorphic hand. Code and additional material are available at https://hsp-iit.github.io/PCHands/.
comment: 2025 IEEE-RAS 24th International Conference on Humanoid Robots
Gemini Robotics 1.5: Pushing the Frontier of Generalist Robots with Advanced Embodied Reasoning, Thinking, and Motion Transfer
General-purpose robots need a deep understanding of the physical world, advanced reasoning, and general and dexterous control. This report introduces the latest generation of the Gemini Robotics model family: Gemini Robotics 1.5, a multi-embodiment Vision-Language-Action (VLA) model, and Gemini Robotics-ER 1.5, a state-of-the-art Embodied Reasoning (ER) model. We are bringing together three major innovations. First, Gemini Robotics 1.5 features a novel architecture and a Motion Transfer (MT) mechanism, which enables it to learn from heterogeneous, multi-embodiment robot data and makes the VLA more general. Second, Gemini Robotics 1.5 interleaves actions with a multi-level internal reasoning process in natural language. This enables the robot to "think before acting" and notably improves its ability to decompose and execute complex, multi-step tasks, and also makes the robot's behavior more interpretable to the user. Third, Gemini Robotics-ER 1.5 establishes a new state-of-the-art for embodied reasoning, i.e., for reasoning capabilities that are critical for robots, such as visual and spatial understanding, task planning, and progress estimation. Together, this family of models takes us a step towards an era of physical agents-enabling robots to perceive, think and then act so they can solve complex multi-step tasks.
TTF-VLA: Temporal Token Fusion via Pixel-Attention Integration for Vision-Language-Action Models AAAI 2026
Vision-Language-Action (VLA) models process visual inputs independently at each timestep, discarding valuable temporal information inherent in robotic manipulation tasks. This frame-by-frame processing makes models vulnerable to visual noise while ignoring the substantial coherence between consecutive frames in manipulation sequences. We propose Temporal Token Fusion (TTF), a training-free approach that intelligently integrates historical and current visual representations to enhance VLA inference quality. Our method employs dual-dimension detection combining efficient grayscale pixel difference analysis with attention-based semantic relevance assessment, enabling selective temporal token fusion through hard fusion strategies and keyframe anchoring to prevent error accumulation. Comprehensive experiments across LIBERO, SimplerEnv, and real robot tasks demonstrate consistent improvements: 4.0 percentage points average on LIBERO (72.4\% vs 68.4\% baseline), cross-environment validation on SimplerEnv (4.8\% relative improvement), and 8.7\% relative improvement on real robot tasks. Our approach proves model-agnostic, working across OpenVLA and VLA-Cache architectures. Notably, TTF reveals that selective Query matrix reuse in attention mechanisms enhances rather than compromises performance, suggesting promising directions for direct KQV matrix reuse strategies that achieve computational acceleration while improving task success rates.
comment: Manuscript submitted to AAAI 2026, currently under review
Contrastive Representation Regularization for Vision-Language-Action Models
Vision-Language-Action (VLA) models have shown its capabilities in robot manipulation by leveraging rich representations from pre-trained Vision-Language Models (VLMs). However, their representations arguably remain suboptimal, lacking sensitivity to robotic signals such as control actions and proprioceptive states. To address the issue, we introduce Robot State-aware Contrastive Loss (RS-CL), a simple and effective representation regularization for VLA models, designed to bridge the gap between VLM representations and robotic signals. In particular, RS-CL aligns the representations more closely with the robot's proprioceptive states, by using relative distances between the states as soft supervision. Complementing the original action prediction objective, RS-CL effectively enhances control-relevant representation learning, while being lightweight and fully compatible with standard VLA training pipeline. Our empirical results demonstrate that RS-CL substantially improves the manipulation performance of state-of-the-art VLA models; it pushes the prior art from 30.8% to 41.5% on pick-and-place tasks in RoboCasa-Kitchen, through more accurate positioning during grasping and placing, and boosts success rates from 45.0% to 58.3% on challenging real-robot manipulation tasks.
comment: 20 pages, 12 figures
RoHOI: Robustness Benchmark for Human-Object Interaction Detection
Human-Object Interaction (HOI) detection is crucial for robot-human assistance, enabling context-aware support. However, models trained on clean datasets degrade in real-world conditions due to unforeseen corruptions, leading to inaccurate predictions. To address this, we introduce the first robustness benchmark for HOI detection, evaluating model resilience under diverse challenges. Despite advances, current models struggle with environmental variability, occlusions, and noise. Our benchmark, RoHOI, includes 20 corruption types based on the HICO-DET and V-COCO datasets and a new robustness-focused metric. We systematically analyze existing models in the HOI field, revealing significant performance drops under corruptions. To improve robustness, we propose a Semantic-Aware Masking-based Progressive Learning (SAMPL) strategy to guide the model to be optimized based on holistic and partial cues, thus dynamically adjusting the model's optimization to enhance robust feature learning. Extensive experiments show that our approach outperforms state-of-the-art methods, setting a new standard for robust HOI detection. Benchmarks, datasets, and code are available at https://github.com/KratosWen/RoHOI.
comment: Benchmarks, datasets, and code are available at https://github.com/KratosWen/RoHOI
DualTHOR: A Dual-Arm Humanoid Simulation Platform for Contingency-Aware Planning
Developing embodied agents capable of performing complex interactive tasks in real-world scenarios remains a fundamental challenge in embodied AI. Although recent advances in simulation platforms have greatly enhanced task diversity to train embodied Vision Language Models (VLMs), most platforms rely on simplified robot morphologies and bypass the stochastic nature of low-level execution, which limits their transferability to real-world robots. To address these issues, we present a physics-based simulation platform DualTHOR for complex dual-arm humanoid robots, built upon an extended version of AI2-THOR. Our simulator includes real-world robot assets, a task suite for dual-arm collaboration, and inverse kinematics solvers for humanoid robots. We also introduce a contingency mechanism that incorporates potential failures through physics-based low-level execution, bridging the gap to real-world scenarios. Our simulator enables a more comprehensive evaluation of the robustness and generalization of VLMs in household environments. Extensive evaluations reveal that current VLMs struggle with dual-arm coordination and exhibit limited robustness in realistic environments with contingencies, highlighting the importance of using our simulator to develop more capable VLMs for embodied tasks. The code is available at https://github.com/ds199895/DualTHOR.git.
comment: The experiments in the paper need to be further supplemented, and more methods should be considered for expansion
EROAS: 3D Efficient Reactive Obstacle Avoidance System for Autonomous Underwater Vehicles using 2.5D Forward-Looking Sonar
Autonomous Underwater Vehicles (AUVs) have advanced significantly in obstacle detection and path planning through sonar, cameras, and learning-based methods. However, safe and efficient navigation in cluttered environments remains challenging due to partial observability, turbidity, the limited field-of-view of forward-looking sonar (FLS), and occlusions that obscure obstacle geometry. To address these issues, we propose the Efficient Reactive Obstacle Avoidance Strategy (EROAS), a lightweight framework that augments a standard 2D FLS with a pivoting mechanism, effectively transforming it into a cost-efficient \emph{2.5D sonar}. This design provides vertical information on demand, extending situational awareness while minimizing computational overhead. EROAS integrates three complementary modules: first, Sonar Profile-guided Directional Decision Control (SPD2C) for rapid gap detection and generation of reference commands in both horizontal and vertical planes. Secondly, the Spatial Context Generator (SCG), which maintains a short-term obstacle memory of the past to mitigate partial observability, and finally, a Spatio-Temporal Control Barrier Function (ST-CBF) that enforces forward-invariance of safety constraints by filtering nominal references. Together, these components enable robust, reactive avoidance of obstacles in uncertain and cluttered 3D underwater settings. Simulation and hardware-in-the-loop (HIL) experiments validate the efficacy of the proposed EROAS algorithm, demonstrating improved trajectory efficiency, reduced travel time, and enhanced safety compared to conventional methods such as the Dynamic Window Approach (DWA) and Artificial Potential Fields (APF). https://github.com/AIRLabIISc/EROAS
comment: Submitted to IEEE Journal of Ocean Engineering
A Taylor Series Approach to Correction of Input Errors in Gaussian Process Regression
Gaussian Processes (GPs) are widely recognized as powerful non-parametric models for regression and classification. Traditional GP frameworks predominantly operate under the assumption that the inputs are either accurately known or subject to zero-mean noise. However, several real-world applications such as mobile sensors have imperfect localization, leading to inputs with biased errors. These biases can typically be estimated through measurements collected over time using, for example, Kalman filters. To avoid recomputation of the entire GP model when better estimates of the inputs used in the training data become available, we introduce a technique for updating a trained GP model to incorporate updated estimates of the inputs. By leveraging the differentiability of the mean and covariance functions derived from the squared exponential kernel, a second-order correction algorithm is developed to update the trained GP models. Precomputed Jacobians and Hessians of kernels enable real-time refinement of the mean and covariance predictions. The efficacy of the developed approach is demonstrated using two simulation studies, with error analyses revealing improvements in both predictive accuracy and uncertainty quantification.
comment: Improving the paper with better results and adding experimental results to publish again
Goal-Based Vision-Language Driving
Autonomous vehicles must react in milliseconds while reasoning about road geometry and traffic intent to navigate complex situations. We introduce NovaDrive, a single-branch vision-language architecture that processes front-camera images, HD-map tiles, LiDAR depth, and textual waypoints in a single branch. A lightweight, two-stage cross-attention block first aligns waypoint tokens with the HD map, then refines attention over fine-grained image and depth patches. Coupled with a novel smoothness loss that discourages abrupt steering and speed changes, this design eliminates the need for recurrent memory. We fine-tune the top 15 layers of an 11B LLaMA-3.2 vision-language backbone, enabling real-time inference. On the nuScenes / Waymo subset of the MD-NEX Outdoor benchmark, NovaDrive raises success rate to 84% (+4%), boosts path-efficiency (SPL) to 0.66 (+0.11), and reduces collision frequency from 2.6% to 1.2% (-1.4%) relative to the previous state-of-the-art. Our ablations confirm that waypoint tokens, partial VLM fine-tuning, and the cross-attention fusion each contribute the most to these gains. Beyond safety, NovaDrive's shorter routes (resulting from the novel smoothness loss) translate to lower fuel or battery usage, pointing toward leaner, more easily updated driving stacks. NovaDrive can be extended to other embodied-AI domains as well.
comment: 6 pages
Vision-Language Cross-Attention for Real-Time Autonomous Driving
Autonomous cars need geometric accuracy and semantic understanding to navigate complex environments, yet most stacks handle them separately. We present XYZ-Drive, a single vision-language model that reads a front-camera frame, a 25m $\times$ 25m overhead map, and the next waypoint, then outputs steering and speed. A lightweight goal-centered cross-attention layer lets waypoint tokens highlight relevant image and map patches, supporting both action and textual explanations, before the fused tokens enter a partially fine-tuned LLaMA-3.2 11B model. On the MD-NEX Outdoor-Driving benchmark XYZ-Drive attains 95% success and 0.80 Success weighted by Path Length (SPL), surpassing PhysNav-DG by 15%. and halving collisions, all while significantly improving efficiency by using only a single branch. Sixteen ablations explain the gains. Removing any modality (vision, waypoint, map) drops success by up to 11%, confirming their complementary roles and rich connections. Replacing goal-centered attention with simple concatenation cuts 3% in performance, showing query-based fusion injects map knowledge more effectively. Keeping the transformer frozen loses 5%, showing the importance of fine-tuning when applying VLMs for specific tasks such as autonomous driving. Coarsening map resolution from 10 cm to 40 cm blurs lane edges and raises crash rate. Overall, these results demonstrate that early, token-level fusion of intent and map layout enables accurate, transparent, real-time driving.
comment: 5 pages
HoMeR: Learning In-the-Wild Mobile Manipulation via Hybrid Imitation and Whole-Body Control
We introduce HoMeR, an imitation learning framework for mobile manipulation that combines whole-body control with hybrid action modes that handle both long-range and fine-grained motion, enabling effective performance on realistic in-the-wild tasks. At its core is a fast, kinematics-based whole-body controller that maps desired end-effector poses to coordinated motion across the mobile base and arm. Within this reduced end-effector action space, HoMeR learns to switch between absolute pose predictions for long-range movement and relative pose predictions for fine-grained manipulation, offloading low-level coordination to the controller and focusing learning on task-level decisions. We deploy HoMeR on a holonomic mobile manipulator with a 7-DoF arm in a real home. We compare HoMeR to baselines without hybrid actions or whole-body control across 3 simulated and 3 real household tasks such as opening cabinets, sweeping trash, and rearranging pillows. Across tasks, HoMeR achieves an overall success rate of 79.17% using just 20 demonstrations per task, outperforming the next best baseline by 29.17 on average. HoMeR is also compatible with vision-language models and can leverage their internet-scale priors to better generalize to novel object appearances, layouts, and cluttered scenes. In summary, HoMeR moves beyond tabletop settings and demonstrates a scalable path toward sample-efficient, generalizable manipulation in everyday indoor spaces. Code, videos, and supplementary material are available at: http://homer-manip.github.io
TriVLA: A Triple-System-Based Unified Vision-Language-Action Model with Episodic World Modeling for General Robot Control
Recent advances in vision-language models (VLMs) have enabled robots to follow open-ended instructions and demonstrate impressive commonsense reasoning. However, current vision-language-action (VLA) frameworks primarily rely on static representations and limited temporal context, restricting agents to short-horizon, reactive behaviors and hindering robust generalization in dynamic embodied environments. Inspired by cognitive neuroscience theories of episodic memory, we propose, to our knowledge, one of the first formalized episodic world models in VLA, enabling embodied robots to accumulate, recall, and predict sequential experiences. As an instantiation of this concept, our unified TriVLA realizes the episodic world model through a triple-system architecture: integrating multimodal grounding from a pretrained VLM (System 2) and temporally rich dynamics perception from a video diffusion model (System 3). This enables the agent to accumulate and recall sequential experiences, interpret current contexts, and predict future environmental evolution. Guided by episodic representations that span both the past and anticipated future, the downstream policy (System 1) generates coherent, context-aware action sequences through flow-matching and cross-modal attention mechanisms. Experimental results show that TriVLA operates efficiently at approximately 36 Hz and consistently outperforms baseline models on standard benchmarks and challenging real-world manipulation tasks. It demonstrates strong long-horizon planning and open-ended intent understanding, showcasing the advantages of episodic world model-inspired reasoning for robust, generalizable robot intelligence. Project Page: https://zhenyangliu.github.io/TriVLA/.
A High-frequency, Interaction-induced Pneumatic Oscillator Enabling Versatile Soft Robotics
Soft robots, while highly adaptable to diverse environments through various actuation methods, still face significant performance boundary due to the inherent properties of materials. These limitations manifest in the challenge of guaranteeing rapid response and large-scale movements simultaneously, ultimately restricting the robots' absolute speed and overall efficiency. In this paper, we introduce a high-frequency pneumatic oscillator (HIPO) to overcome these challenges. Through a collision-induced phase resetting mechanism, our HIPO leverages event-based nonlinearity to trigger self-oscillation of pneumatic actuator, which positively utilizes intrinsic characteristics of materials. This enables the system to spontaneously generate periodic control signals and directly produce motion responses, eliminating the need for incorporating external actuation components. By efficiently and rapidly converting internal energy of airflow into the kinetic energy of robots, HIPO achieves a frequency of up to 20 Hz. Furthermore, we demonstrate the versatility and high-performance capabilities of HIPO through bio-inspired robots: an insect-like fast-crawler (with speeds up to 50.27 cm/s), a high-frequency butterfly-like wing-flapper, and a maneuverable duck-like swimmer. By eliminating external components and seamlessly fusing signal generation, energy conversion, and motion output, HIPO unleashes rapid and efficient motion, unlocking potential for high-performance soft robotics.
Data Scaling Laws in Imitation Learning for Robotic Manipulation
Data scaling has revolutionized fields like natural language processing and computer vision, providing models with remarkable generalization capabilities. In this paper, we investigate whether similar data scaling laws exist in robotics, particularly in robotic manipulation, and whether appropriate data scaling can yield single-task robot policies that can be deployed zero-shot for any object within the same category in any environment. To this end, we conduct a comprehensive empirical study on data scaling in imitation learning. By collecting data across numerous environments and objects, we study how a policy's generalization performance changes with the number of training environments, objects, and demonstrations. Throughout our research, we collect over 40,000 demonstrations and execute more than 15,000 real-world robot rollouts under a rigorous evaluation protocol. Our findings reveal several intriguing results: the generalization performance of the policy follows a roughly power-law relationship with the number of environments and objects. The diversity of environments and objects is far more important than the absolute number of demonstrations; once the number of demonstrations per environment or object reaches a certain threshold, additional demonstrations have minimal effect. Based on these insights, we propose an efficient data collection strategy. With four data collectors working for one afternoon, we collect sufficient data to enable the policies for two tasks to achieve approximately 90% success rates in novel environments with unseen objects.
Precise Mobile Manipulation of Small Everyday Objects
Many everyday mobile manipulation tasks require precise interaction with small objects, such as grasping a knob to open a cabinet or pressing a light switch. In this paper, we develop Servoing with Vision Models (SVM), a closed-loop framework that enables a mobile manipulator to tackle such precise tasks involving the manipulation of small objects. SVM uses state-of-the-art vision foundation models to generate 3D targets for visual servoing to enable diverse tasks in novel environments. Naively doing so fails because of occlusion by the end-effector. SVM mitigates this using vision models that out-paint the end-effector, thereby significantly enhancing target localization. We demonstrate that aided by out-painting methods, open-vocabulary object detectors can serve as a drop-in module for SVM to seek semantic targets (e.g. knobs) and point tracking methods can help SVM reliably pursue interaction sites indicated by user clicks. We conduct a large-scale evaluation spanning experiments in 10 novel environments across 6 buildings including 72 different object instances. SVM obtains a 71% zero-shot success rate on manipulating unseen objects in novel environments in the real world, outperforming an open-loop control method by an absolute 42% and an imitation learning baseline trained on 1000+ demonstrations also by an absolute success rate of 50%.
comment: Project webpage: https://arjung128.github.io/svm
Smooth Model Predictive Path Integral Control without Smoothing IROS 2022
We present a sampling-based control approach that can generate smooth actions for general nonlinear systems without external smoothing algorithms. Model Predictive Path Integral (MPPI) control has been utilized in numerous robotic applications due to its appealing characteristics to solve non-convex optimization problems. However, the stochastic nature of sampling-based methods can cause significant chattering in the resulting commands. Chattering becomes more prominent in cases where the environment changes rapidly, possibly even causing the MPPI to diverge. To address this issue, we propose a method that seamlessly combines MPPI with an input-lifting strategy. In addition, we introduce a new action cost to smooth control sequence during trajectory rollouts while preserving the information theoretic interpretation of MPPI, which was derived from non-affine dynamics. We validate our method in two nonlinear control tasks with neural network dynamics: a pendulum swing-up task and a challenging autonomous driving task. The experimental results demonstrate that our method outperforms the MPPI baselines with additionally applied smoothing algorithms.
comment: Accepted to IEEE Robotics and Automation Letters (and IROS 2022). Project page: https://www.taekyung.me/smppi
Visual Affordance Prediction: Survey and Reproducibility
Affordances are the potential actions an agent can perform on an object, as observed by a camera. Visual affordance prediction is formulated differently for tasks such as grasping detection, affordance classification, affordance segmentation, and hand pose estimation. This diversity in formulations leads to inconsistent definitions that prevent fair comparisons between methods. In this paper, we propose a unified formulation of visual affordance prediction by accounting for the complete information on the objects of interest and the interaction of the agent with the objects to accomplish a task. This unified formulation allows us to comprehensively and systematically review disparate visual affordance works, highlighting strengths and limitations of both methods and datasets. We also discuss reproducibility issues, such as the unavailability of methods implementation and experimental setups details, making benchmarks for visual affordance prediction unfair and unreliable. To favour transparency, we introduce the Affordance Sheet, a document that details the solution, datasets, and validation of a method, supporting future reproducibility and fairness in the community.
comment: 18 pages, 3 figures, 13 tables. Project website at https://apicis.github.io/aff-survey/
Model Predictive Inferential Control of Neural State-Space Models for Autonomous Vehicle Motion Planning
Model predictive control (MPC) has proven useful in enabling safe and optimal motion planning for autonomous vehicles. In this paper, we investigate how to achieve MPC-based motion planning when a neural state-space model represents the vehicle dynamics. As the neural state-space model will lead to highly complex, nonlinear and nonconvex optimization landscapes, mainstream gradient-based MPC methods will be computationally too heavy to be a viable solution. In a departure, we propose the idea of model predictive inferential control (MPIC), which seeks to infer the best control decisions from the control objectives and constraints. Following the idea, we convert the MPC problem for motion planning into a Bayesian state estimation problem. Then, we develop a new particle filtering/smoothing approach to perform the estimation. This approach is implemented as banks of unscented Kalman filters/smoothers and offers high sampling efficiency, fast computation, and estimation accuracy. We evaluate the MPIC approach through a simulation study of autonomous driving in different scenarios, along with an exhaustive comparison with gradient-based MPC. The results show that the MPIC approach has considerable computational efficiency, regardless of complex neural network architectures, and shows the capability to solve large-scale MPC problems for neural state-space models.
Multiagent Systems
StoryBox: Collaborative Multi-Agent Simulation for Hybrid Bottom-Up Long-Form Story Generation Using Large Language Models
Human writers often begin their stories with an overarching mental scene, where they envision the interactions between characters and their environment. Inspired by this creative process, we propose a novel approach to long-form story generation, termed hybrid bottom-up long-form story generation, using multi-agent simulations. In our method, agents interact within a dynamic sandbox environment, where their behaviors and interactions with one another and the environment generate emergent events. These events form the foundation for the story, enabling organic character development and plot progression. Unlike traditional top-down approaches that impose rigid structures, our hybrid bottom-up approach allows for the natural unfolding of events, fostering more spontaneous and engaging storytelling. The system is capable of generating stories exceeding 10,000 words while maintaining coherence and consistency, addressing some of the key challenges faced by current story generation models. We achieve state-of-the-art performance across several metrics. This approach offers a scalable and innovative solution for creating dynamic, immersive long-form stories that evolve organically from agent-driven interactions.
comment: Project: https://storyboxproject.github.io
Coordinated Strategies in Realistic Air Combat by Hierarchical Multi-Agent Reinforcement Learning
Achieving mission objectives in a realistic simulation of aerial combat is highly challenging due to imperfect situational awareness and nonlinear flight dynamics. In this work, we introduce a novel 3D multi-agent air combat environment and a Hierarchical Multi-Agent Reinforcement Learning framework to tackle these challenges. Our approach combines heterogeneous agent dynamics, curriculum learning, league-play, and a newly adapted training algorithm. To this end, the decision-making process is organized into two abstraction levels: low-level policies learn precise control maneuvers, while high-level policies issue tactical commands based on mission objectives. Empirical results show that our hierarchical approach improves both learning efficiency and combat performance in complex dogfight scenarios.
comment: 2025 IEEE International Conference on Agentic AI (ICA)
Autonomous vehicles need social awareness to find optima in multi-agent reinforcement learning routing games
Previous work has shown that when multiple selfish Autonomous Vehicles (AVs) are introduced to future cities and start learning optimal routing strategies using Multi-Agent Reinforcement Learning (MARL), they may destabilize traffic systems, as they would require a significant amount of time to converge to the optimal solution, equivalent to years of real-world commuting. We demonstrate that moving beyond the selfish component in the reward significantly relieves this issue. If each AV, apart from minimizing its own travel time, aims to reduce its impact on the system, this will be beneficial not only for the system-wide performance but also for each individual player in this routing game. By introducing an intrinsic reward signal based on the marginal cost matrix, we significantly reduce training time and achieve convergence more reliably. Marginal cost quantifies the impact of each individual action (route-choice) on the system (total travel time). Including it as one of the components of the reward can reduce the degree of non-stationarity by aligning agents' objectives. Notably, the proposed counterfactual formulation preserves the system's equilibria and avoids oscillations. Our experiments show that training MARL algorithms with our novel reward formulation enables the agents to converge to the optimal solution, whereas the baseline algorithms fail to do so. We show these effects in both a toy network and the real-world network of Saint-Arnoult. Our results optimistically indicate that social awareness (i.e., including marginal costs in routing decisions) improves both the system-wide and individual performance of future urban systems with AVs.
A Vision for Access Control in LLM-based Agent Systems
The autonomy and contextual complexity of LLM-based agents render traditional access control (AC) mechanisms insufficient. Static, rule-based systems designed for predictable environments are fundamentally ill-equipped to manage the dynamic information flows inherent in agentic interactions. This position paper argues for a paradigm shift from binary access control to a more sophisticated model of information governance, positing that the core challenge is not merely about permission, but about governing the flow of information. We introduce Agent Access Control (AAC), a novel framework that reframes AC as a dynamic, context-aware process of information flow governance. AAC operates on two core modules: (1) multi-dimensional contextual evaluation, which assesses not just identity but also relationships, scenarios, and norms; and (2) adaptive response formulation, which moves beyond simple allow/deny decisions to shape information through redaction, summarization, and paraphrasing. This vision, powered by a dedicated AC reasoning engine, aims to bridge the gap between human-like nuanced judgment and scalable Al safety, proposing a new conceptual lens for future research in trustworthy agent design.
comment: 10 pages, 1 figure
Stronger Together: On-Policy Reinforcement Learning for Collaborative LLMs
Multi-agent systems (MAS) and reinforcement learning (RL) are widely used to enhance the agentic capabilities of large language models (LLMs). MAS improves task performance through role-based orchestration, while RL uses environmental rewards to learn stronger policies, such as GRPO-style optimization. However, applying on-policy RL to MAS remains underexplored and presents unique challenges. Algorithmically, standard GRPO grouping assumptions break down because prompts vary by role and by turn. System-wise, the training stack must support MAS-workflow rollouts and on-policy updates for both single-policy and multi-policy models. We propose AT-GRPO, which includes (i) an agent- and turn-wise grouped RL algorithm tailored to MAS and (ii) a training system that supports both single- and multi-policy regimes. Across game, planning, coding, and math tasks, AT-GRPO delivers substantial gains. On long-horizon planning, it increases accuracy from a 14.0 to 47.0 percent single-agent RL baseline to 96.0 to 99.5 percent. It also improves reasoning performance, with average gains of 3.87 to 7.62 percent on coding tasks and 9.0 to 17.93 percent on math. Code and environments are available at: https://github.com/pettingllms-ai/PettingLLMs.
Automating Structural Engineering Workflows with Large Language Model Agents
We introduce $\textbf{MASSE}$, the first Multi-Agent System for Structural Engineering, effectively integrating large language model (LLM)-based agents with real-world engineering workflows. Structural engineering is a fundamental yet traditionally stagnant domain, with core workflows remaining largely unchanged for decades despite its substantial economic impact and global market size. Recent advancements in LLMs have significantly enhanced their ability to perform complex reasoning, long-horizon planning, and precise tool utilization -- capabilities well aligned with structural engineering tasks such as interpreting design codes, executing load calculations, and verifying structural capacities. We present a proof-of-concept showing that most real-world structural engineering workflows can be fully automated through a training-free LLM-based multi-agent system. MASSE enables immediate deployment in professional environments, and our comprehensive validation on real-world case studies demonstrates that it can reduce expert workload from approximately two hours to mere minutes, while enhancing both reliability and accuracy in practical engineering scenarios.
comment: Code: https://github.com/DelosLiang/masse
Comparative Evaluation of Neural Network Architectures for Generalizable Human Spatial Preference Prediction in Unseen Built Environments
The capacity to predict human spatial preferences within built environments is instrumental for developing Cyber-Physical-Social Infrastructure Systems (CPSIS). A significant challenge in this domain is the generalizability of preference models, particularly their efficacy in predicting preferences within environmental configurations not encountered during training. While deep learning models have shown promise in learning complex spatial and contextual dependencies, it remains unclear which neural network architectures are most effective at generalizing to unseen layouts. To address this, we conduct a comparative study of Graph Neural Networks, Convolutional Neural Networks, and standard feedforward Neural Networks using synthetic data generated from a simplified and synthetic pocket park environment. Beginning with this illustrative case study, allows for controlled analysis of each model's ability to transfer learned preference patterns to unseen spatial scenarios. The models are evaluated based on their capacity to predict preferences influenced by heterogeneous physical, environmental, and social features. Generalizability score is calculated using the area under the precision-recall curve for the seen and unseen layouts. This generalizability score is appropriate for imbalanced data, providing insights into the suitability of each neural network architecture for preference-aware human behavior modeling in unseen built environments.
comment: The 15th International Workshop on Structural Health Monitoring (IWSHM)
The Social Cost of Intelligence: Emergence, Propagation, and Amplification of Stereotypical Bias in Multi-Agent Systems
Bias in large language models (LLMs) remains a persistent challenge, manifesting in stereotyping and unfair treatment across social groups. While prior research has primarily focused on individual models, the rise of multi-agent systems (MAS), where multiple LLMs collaborate and communicate, introduces new and largely unexplored dynamics in bias emergence and propagation. In this work, we present a comprehensive study of stereotypical bias in MAS, examining how internal specialization, underlying LLMs and inter-agent communication protocols influence bias robustness, propagation, and amplification. We simulate social contexts where agents represent different social groups and evaluate system behavior under various interaction and adversarial scenarios. Experiments on three bias benchmarks reveal that MAS are generally less robust than single-agent systems, with bias often emerging early through in-group favoritism. However, cooperative and debate-based communication can mitigate bias amplification, while more robust underlying LLMs improve overall system stability. Our findings highlight critical factors shaping fairness and resilience in multi-agent LLM systems.
comment: 15 pages, 19 figures, Preprint. Under review
Rationally Analyzing Shelby: Proving Incentive Compatibility in a Decentralized Storage Network
Decentralized storage is one of the most natural applications built on blockchains and a central component of the Web3 ecosystem. Yet despite a decade of active development -- from IPFS and Filecoin to more recent entrants -- most of these storage protocols have received limited formal analysis of their incentive properties. Claims of incentive compatibility are sometimes made, but rarely proven. This gap matters: without well-designed incentives, a system may distribute storage but fail to truly decentralize it. We analyze Shelby -- a storage network protocol recently proposed by Aptos Labs and Jump Crypto -- and provide the first formal proof of its incentive properties. Our game-theoretic model shows that while off-chain audits alone collapse to universal shirking, Shelby's combination of peer audits with occasional on-chain verification yields incentive compatibility under natural parameter settings. We also examine coalition behavior and outline a simple modification that strengthens the protocol's collusion-resilience.
comment: 23 pages, 1 figure
Mean-Field Games with Constraints
This paper introduces a framework of Constrained Mean-Field Games (CMFGs), where each agent solves a constrained Markov decision process (CMDP). This formulation captures scenarios in which agents' strategies are subject to feasibility, safety, or regulatory restrictions, thereby extending the scope of classical mean field game (MFG) models. We first establish the existence of CMFG equilibria under a strict feasibility assumption, and we further show uniqueness under a classical monotonicity condition. To compute equilibria, we develop Constrained Mean-Field Occupation Measure Optimization (CMFOMO), an optimization-based scheme that parameterizes occupation measures and shows that finding CMFG equilibria is equivalent to solving a single optimization problem with convex constraints and bounded variables. CMFOMO does not rely on uniqueness of the equilibria and can approximate all equilibria with arbitrary accuracy. We further prove that CMFG equilibria induce $O(1 / \sqrt{N})$-Nash equilibria in the associated constrained $N$-player games, thereby extending the classical justification of MFGs as approximations for large but finite systems. Numerical experiments on a modified Susceptible-Infected-Susceptible (SIS) epidemic model with various constraints illustrate the effectiveness and flexibility of the framework.
Empirical Study on Robustness and Resilience in Cooperative Multi-Agent Reinforcement Learning NeurIPS 2025
In cooperative Multi-Agent Reinforcement Learning (MARL), it is a common practice to tune hyperparameters in ideal simulated environments to maximize cooperative performance. However, policies tuned for cooperation often fail to maintain robustness and resilience under real-world uncertainties. Building trustworthy MARL systems requires a deep understanding of robustness, which ensures stability under uncertainties, and resilience, the ability to recover from disruptions--a concept extensively studied in control systems but largely overlooked in MARL. In this paper, we present a large-scale empirical study comprising over 82,620 experiments to evaluate cooperation, robustness, and resilience in MARL across 4 real-world environments, 13 uncertainty types, and 15 hyperparameters. Our key findings are: (1) Under mild uncertainty, optimizing cooperation improves robustness and resilience, but this link weakens as perturbations intensify. Robustness and resilience also varies by algorithm and uncertainty type. (2) Robustness and resilience do not generalize across uncertainty modalities or agent scopes: policies robust to action noise for all agents may fail under observation noise on a single agent. (3) Hyperparameter tuning is critical for trustworthy MARL: surprisingly, standard practices like parameter sharing, GAE, and PopArt can hurt robustness, while early stopping, high critic learning rates, and Leaky ReLU consistently help. By optimizing hyperparameters only, we observe substantial improvement in cooperation, robustness and resilience across all MARL backbones, with the phenomenon also generalizing to robust MARL methods across these backbones. Code and results available at https://github.com/BUAA-TrustworthyMARL/adv_marl_benchmark .
comment: 44 pages, 16 figures, NeurIPS 2025
Semantic knowledge guides innovation and drives cultural evolution
Cumulative cultural evolution enables human societies to generate increasingly complex knowledge and technology over generations. While social learning transmits innovations between individuals and generations, the cognitive processes that generate these innovations remain poorly understood. Here, we demonstrate that semantic knowledge-structured associations between concepts and their functions-provides cognitive scaffolding for cumulative innovation by guiding exploration toward plausible and meaningful actions. We tested this hypothesis using a cultural evolutionary agent-based model and a large-scale behavioural experiment (N = 1,243), in which individuals performed a task requiring the combination of items into novel innovations. Across both approaches, semantic knowledge and social learning interact synergistically to enhance innovation. Behaviorally, participants without access to semantic knowledge performed no better than chance, even when social learning was available, and relied on shallow exploration strategies. These findings suggest that semantic knowledge is a key cognitive process enabling human cumulative culture.
Talk Isn't Always Cheap: Understanding Failure Modes in Multi-Agent Debate ICML
While multi-agent debate has been proposed as a promising strategy for improving AI reasoning ability, we find that debate can sometimes be harmful rather than helpful. Prior work has primarily focused on debates within homogeneous groups of agents, whereas we explore how diversity in model capabilities influences the dynamics and outcomes of multi-agent interactions. Through a series of experiments, we demonstrate that debate can lead to a decrease in accuracy over time - even in settings where stronger (i.e., more capable) models outnumber their weaker counterparts. Our analysis reveals that models frequently shift from correct to incorrect answers in response to peer reasoning, favoring agreement over challenging flawed reasoning. We perform additional experiments investigating various potential contributing factors to these harmful shifts - including sycophancy, social conformity, and model and task type. These results highlight important failure modes in the exchange of reasons during multi-agent debate, suggesting that naive applications of debate may cause performance degradation when agents are neither incentivised nor adequately equipped to resist persuasive but incorrect reasoning.
comment: ICML MAS Workshop 2025
Simulating Persuasive Dialogues on Meat Reduction with Generative Agents
Meat reduction benefits human and planetary health, but social norms keep meat central in shared meals. To date, the development of communication strategies that promote meat reduction while minimizing social costs has required the costly involvement of human participants at each stage of the process. We present work in progress on simulating multi-round dialogues on meat reduction between Generative Agents based on large language models (LLMs). We measure our main outcome using established psychological questionnaires based on the Theory of Planned Behavior and additionally investigate Social Costs. We find evidence that our preliminary simulations produce outcomes that are (i) consistent with theoretical expectations; and (ii) valid when compared to data from previous studies with human participants. Generative agent-based models are a promising tool for identifying novel communication strategies on meat reduction -- tailored to highly specific participant groups -- to then be tested in subsequent studies with human participants.
comment: Code available at https://github.com/dess-mannheim/MeatlessAgents
Incentivize Contribution and Learn Parameters Too: Federated Learning with Strategic Data Owners
Classical federated learning (FL) assumes that the clients have a limited amount of noisy data with which they voluntarily participate and contribute towards learning a global, more accurate model in a principled manner. The learning happens in a distributed fashion without sharing the data with the center. However, these methods do not consider the incentive of an agent for participating and contributing to the process, given that data collection and running a distributed algorithm is costly for the clients. The question of rationality of contribution has been asked recently in the literature and some results exist that consider this problem. This paper addresses the question of simultaneous parameter learning and incentivizing contribution in a truthful manner, which distinguishes it from the extant literature. Our first mechanism incentivizes each client to contribute to the FL process at a Nash equilibrium and simultaneously learn the model parameters. We also ensure that agents are incentivized to truthfully reveal information in the intermediate stages of the algorithm. However, this equilibrium outcome can be away from the optimal, where clients contribute with their full data and the algorithm learns the optimal parameters. We propose a second mechanism that enables the full data contribution along with optimal parameter learning. Large scale experiments with real (federated) datasets (CIFAR-10, FEMNIST, and Twitter) show that these algorithms converge quite fast in practice, yield good welfare guarantees and better model performance for all agents.
comment: 25 pages, under review
Agent-Based Modelling for Real-World Stock Markets under Behavioral Economic Principles
The reproduction of realistic dynamics in financial markets is of great significance, as it enhances our understanding of market evolution beyond other physical processes, and facilitates the development and backtesting of investment strategies. Most existing literature approaches this issue as a time series forecasting problem, which often faces challenges such as 1) overfitting historical data, 2) failing to reconstruct stylized facts, and 3) limiting users' ability to conduct counterfactual analyses. To address these limitations, we employ agent-based modeling (ABM) for market simulation, where each trader acts as an autonomous agent guided by established behavioral-economic principles. The parameters of the agent model are subsequently calibrated using deep learning techniques. Additionally, we align our agent model with publicly available economic indices, such as the Consumer Price Index (CPI), to enhance the explainability of our system's outcomes. Our experiments demonstrate that the ABM method effectively reproduces market dynamics with a confidence level of 90%, accurately reflecting well-known stylized facts. Furthermore, the calibration process proves to be more computationally efficient compared to other existing methods that perform simulation-based inference. We also present case studies illustrating the correlation between agent parameters and economic indices.
Systems and Control (CS)
Analysis of the Geometric Heat Flow Equation: Computing Geodesics in Real-Time with Convergence Guarantees
We present an analysis on the convergence properties of the so-called geometric heat flow equation for computing geodesics (shortest-path~curves) on Riemannian manifolds. Computing geodesics numerically in real-time has become an important capability in several fields, including control and motion planning. The geometric heat flow equation involves solving a parabolic partial differential equation whose solution is a geodesic. In practice, solving this PDE numerically can be done efficiently, and tends to be more numerically stable and exhibit a better rate of convergence compared to numerical optimization. We prove that the geometric heat flow equation is globally exponentially stable in $L_2$ if the curvature of the Riemannian manifold is not too positive, and that asymptotic convergence in $L_2$ is always guaranteed. We also present a pseudospectral method that leverages Chebyshev polynomials to accurately compute geodesics in only a few milliseconds for non-contrived manifolds. Our analysis was verified with our custom pseudospectral method by computing geodesics on common non-Euclidean surfaces, and in feedback for a contraction-based controller with a non-flat metric for a nonlinear system.
Ego-Vision World Model for Humanoid Contact Planning
Enabling humanoid robots to exploit physical contact, rather than simply avoid collisions, is crucial for autonomy in unstructured environments. Traditional optimization-based planners struggle with contact complexity, while on-policy reinforcement learning (RL) is sample-inefficient and has limited multi-task ability. We propose a framework combining a learned world model with sampling-based Model Predictive Control (MPC), trained on a demonstration-free offline dataset to predict future outcomes in a compressed latent space. To address sparse contact rewards and sensor noise, the MPC uses a learned surrogate value function for dense, robust planning. Our single, scalable model supports contact-aware tasks, including wall support after perturbation, blocking incoming objects, and traversing height-limited arches, with improved data efficiency and multi-task capability over on-policy RL. Deployed on a physical humanoid, our system achieves robust, real-time contact planning from proprioception and ego-centric depth images. Website: https://ego-vcp.github.io/
Smooth Spatiotemporal Tube Synthesis for Prescribed-Time Reach-Avoid-Stay Control
In this work, we address the issue of controller synthesis for a control-affine nonlinear system to meet prescribed time reach-avoid-stay specifications. Our goal is to improve upon previous methods based on spatiotemporal tubes (STTs) by eliminating the need for circumvent functions, which often lead to abrupt tube modifications and high control effort. We propose an adaptive framework that constructs smooth STTs around static unsafe sets, enabling continuous avoidance while guiding the system toward the target within the prescribed time. A closed-form, approximation-free control law is derived to ensure the system trajectory remains within the tube and satisfies the RAS task. The effectiveness of the proposed approach is demonstrated through a case study, showing a significant reduction in control effort compared to prior methods.
IntersectioNDE: Learning Complex Urban Traffic Dynamics based on Interaction Decoupling Strategy SC 2025
Realistic traffic simulation is critical for ensuring the safety and reliability of autonomous vehicles (AVs), especially in complex and diverse urban traffic environments. However, existing data-driven simulators face two key challenges: a limited focus on modeling dense, heterogeneous interactions at urban intersections - which are prevalent, crucial, and practically significant in countries like China, featuring diverse agents including motorized vehicles (MVs), non-motorized vehicles (NMVs), and pedestrians - and the inherent difficulty in robustly learning high-dimensional joint distributions for such high-density scenes, often leading to mode collapse and long-term simulation instability. We introduce City Crossings Dataset (CiCross), a large-scale dataset collected from a real-world urban intersection, uniquely capturing dense, heterogeneous multi-agent interactions, particularly with a substantial proportion of MVs, NMVs and pedestrians. Based on this dataset, we propose IntersectioNDE (Intersection Naturalistic Driving Environment), a data-driven simulator tailored for complex urban intersection scenarios. Its core component is the Interaction Decoupling Strategy (IDS), a training paradigm that learns compositional dynamics from agent subsets, enabling the marginal-to-joint simulation. Integrated into a scene-aware Transformer network with specialized training techniques, IDS significantly enhances simulation robustness and long-term stability for modeling heterogeneous interactions. Experiments on CiCross show that IntersectioNDE outperforms baseline methods in simulation fidelity, stability, and its ability to replicate complex, distribution-level urban traffic dynamics.
comment: Accepted by ITSC 2025
A Physics-Informed Reinforcement Learning Approach for Degradation-Aware Long-Term Charging Optimization in Batteries
Batteries degrade with usage and continuous cycling. This aging is typically reflected through the resistance growth and the capacity fade of battery cells. Over the years, various charging methods have been presented in the literature that proposed current profiles in order to enable optimal, fast, and/or health-conscious charging. However, very few works have attempted to make the ubiquitous Constant Current Constant Voltage (CCCV) charging protocol adaptive to the changing battery health as it cycles. This work aims to address this gap and proposes a framework that optimizes the constant current part of the CCCV protocol adapting to long-term battery degradation. Specifically, a physics-informed Reinforcement Learning (RL) approach has been used that not only estimates a key battery degradation mechanism, namely, Loss of Active Material (LAM), but also adjusts the current magnitude of CCCV as a result of this particular degradation. The proposed framework has been implemented by combining PyBamm, an open-source battery modeling tool, and Stable-baselines where the RL agent was trained using a Proximal Policy Optimization (PPO) network. Simulation results show the potential of the proposed framework for enhancing the widely used CCCV protocol by embedding physics information in RL algorithm. A comparative study of this proposed agent has also been discussed with 2 other charging protocols generated by a non-physics-based RL agent and a constant CCCV for all the cycles.
Constraint-Aware Reinforcement Learning via Adaptive Action Scaling
Safe reinforcement learning (RL) seeks to mitigate unsafe behaviors that arise from exploration during training by reducing constraint violations while maintaining task performance. Existing approaches typically rely on a single policy to jointly optimize reward and safety, which can cause instability due to conflicting objectives, or they use external safety filters that override actions and require prior system knowledge. In this paper, we propose a modular cost-aware regulator that scales the agent's actions based on predicted constraint violations, preserving exploration through smooth action modulation rather than overriding the policy. The regulator is trained to minimize constraint violations while avoiding degenerate suppression of actions. Our approach integrates seamlessly with off-policy RL methods such as SAC and TD3, and achieves state-of-the-art return-to-cost ratios on Safety Gym locomotion tasks with sparse costs, reducing constraint violations by up to 126 times while increasing returns by over an order of magnitude compared to prior methods.
The Role of Flexible Connection in Accelerating Load Interconnection in Distribution Networks
This paper investigates the role of flexible connection in accelerating the interconnection of large loads amid rising electricity demand from data centers and electrification. Flexible connection allows new loads to defer or curtail consumption during rare, grid-constrained periods, enabling faster access without major infrastructure upgrades. To quantify how flexible connection unlocks load hosting capacity, we formulate a flexibility-aware hosting capacity analysis problem that explicitly limits the number of utility-controlled interventions per year, ensuring infrequent disruption. Efficient solution methods are developed for this nonconvex problem and applied to real load data and test feeders. Empirical results reveal that modest flexibility, i.e., few interventions with small curtailments or delays, can unlock substantial hosting capacity. Theoretical analysis further explains and generalizes these findings, highlighting the broad potential of flexible connection.
A Faster and More Reliable Middleware for Autonomous Driving Systems
Ensuring safety in high-speed autonomous vehicles requires rapid control loops and tightly bounded delays from perception to actuation. Many open-source autonomy systems rely on ROS 2 middleware; when multiple sensor and control nodes share one compute unit, ROS 2 and its DDS transports add significant (de)serialization, copying, and discovery overheads, shrinking the available time budget. We present Sensor-in-Memory (SIM), a shared-memory transport designed for intra-host pipelines in autonomous vehicles. SIM keeps sensor data in native memory layouts (e.g., cv::Mat, PCL), uses lock-free bounded double buffers that overwrite old data to prioritize freshness, and integrates into ROS 2 nodes with four lines of code. Unlike traditional middleware, SIM operates beside ROS 2 and is optimized for applications where data freshness and minimal latency outweigh guaranteed completeness. SIM provides sequence numbers, a writer heartbeat, and optional checksums to ensure ordering, liveness, and basic integrity. On an NVIDIA Jetson Orin Nano, SIM reduces data-transport latency by up to 98% compared to ROS 2 zero-copy transports such as FastRTPS and Zenoh, lowers mean latency by about 95%, and narrows 95th/99th-percentile tail latencies by around 96%. In tests on a production-ready Level 4 vehicle running Autoware.Universe, SIM increased localization frequency from 7.5 Hz to 9.5 Hz. Applied across all latency-critical modules, SIM cut average perception-to-decision latency from 521.91 ms to 290.26 ms, reducing emergency braking distance at 40 mph (64 km/h) on dry concrete by 13.6 ft (4.14 m).
comment: 8 pages,7 figures, 8 tables
Trajectory control of a suspended load with non-stopping flying carriers
This paper presents the first closed-loop control framework for cooperative payload transportation with non-stopping flying carriers. Building upon grasp-matrix formulations and internal force redundancy, we propose a feedback wrench controller that actively regulates the payload's pose while an optimization layer dynamically shapes internal-force oscillations to guarantee persistent carrier motion. Preliminary experimental results on multirotor UAVs validate the model assumptions, and numerical simulations demonstrate that the method successfully prevents carrier stagnation, achieves accurate load tracking, and generates physically feasible trajectories with smooth velocity profiles. The proposed framework not only advances the state of the art but also offers a reliable, versatile solution for future real-world applications requiring load transportation by coordinated non-stopping flying carriers.
Robust Recovery and Control of Cyber-physical Discrete Event Systems under Actuator Attacks
Critical real-world applications strongly rely on Cyber-physical systems (CPS), but their dependence on communication networks introduces significant security risks, as attackers can exploit vulnerabilities to compromise their integrity and availability. This work explores the topic of cybersecurity in the context of CPS modeled as discrete event systems (DES), focusing on recovery strategies following the detection of cyberattacks. Specifically, we address actuator enablement attacks and propose a method that preserves the system's full valid behavior under normal conditions. Upon detecting an attack, our proposed solution aims to guide the system toward a restricted yet robust behavior, ensuring operational continuity and resilience. Additionally, we introduce a property termed AE-robust recoverability, which characterizes the necessary and sufficient conditions for recovering a system from attacks while preventing further vulnerabilities. Finally, we showcase the proposed solution through a case study based on a manufacturing system.
comment: This work has been accepted for publication in the 64th IEEE Conference on Decision and Control (CDC). The final published version will be available on IEEE Xplore
Robust Closed-Form Control for MIMO Nonlinear Systems under Conflicting Time-Varying Hard and Soft Constraints
This paper introduces a novel robust closed-form control law to handle time-varying hard and soft constraints in uncertain high-relative-degree nonlinear MIMO systems. These constraints represent spatiotemporal specifications in mechanical systems' operational space, with hard constraints ensuring safety-critical requirements and soft constraints encoding performance or task objectives. Initially, all constraints are consolidated into two separate scalar time-varying hard and soft constraint functions, whose positive level sets define feasible regions. A closed-form control law is developed to enforce these constraints using appropriately designed reciprocal barriers and nonlinear transformation functions. When conflicts between hard and soft constraints arise, the control law prioritizes hard constraints by virtually relaxing soft constraints via a dynamic relaxation law. Notably, the proposed control law maintains low complexity by avoiding approximation schemes for coping with system uncertainties. Simulation results confirm the effectiveness of the proposed method.
comment: 18 pages, 6 figures
Data-Driven Estimation of Quadrotor Motor Efficiency via Residual Minimization
A data-driven framework is proposed for online estimation of quadrotor motor efficiency via residual minimization. The problem is formulated as a constrained nonlinear optimization that minimizes trajectory residuals between measured flight data and predictions generated by a quadrotor dynamics model. A sliding-window strategy enables online estimation, and the optimization is efficiently solved using an iteratively reweighted least squares (IRLS) scheme combined with a primal-dual interior-point method, with inequality constraints enforced through a logarithmic barrier function. Robust z-score weighting is employed to reject outliers, which is particularly effective in motor clipping scenarios where the proposed estimator exhibits smaller spikes than an EKF baseline. Compared to traditional filter-based approaches, the batch-mode formulation offers greater flexibility by selectively incorporating informative data segments. This structure is well-suited for onboard implementation, particularly for applications such as fault detection and isolation (FDI), health monitoring, and predictive maintenance in aerial robotic systems. Simulation results under various degradation scenarios demonstrate the accuracy and robustness of the proposed estimator.
High-Order Quarter-Wave Plate Optimization for Linear Birefringence Suppression in Reflective FOCS
Fiber optic current sensors (FOCS) are widely adopted in modern power grids due to high sensitivity, excellent insulation, and strong immunity to electromagnetic interference. This prominence necessitates precise investigation into their error sources and corresponding optimization. This study examines reflective FOCS based on the Faraday effect. A theoretical model is established to simulate phase error caused by linear birefringence from the quarter-wave plate. Conventional methods using circular birefringence are analyzed, revealing inherent limitations. Innovatively, a compensation strategy employing high-order quarter-wave plates is proposed to effectively eliminate linear birefringence effects. This approach significantly enhances the accuracy and practicality of FOCS in precision metrology.
Exponential convergence of multiagent systems with lack of connection
Finding conditions ensuring consensus, i.e. convergence to a common value, for a networked system is of crucial interest, both for theoretical reasons and applications. This goal is harder to achieve when connections between agents are temporarily lost. Here, we prove that known conditions (introduced by Moreau) ensure an exponential convergence to consensus, with explicit rate of convergence. The key result is related to the length of the graph (i.e. the number of connections to reach a common agent): if this is large, then convergence is slow. This general result also provides conditions for convergence of second-order cooperative systems with lack of connections.
Efficient LLM Inference over Heterogeneous Edge Networks with Speculative Decoding
Large language model (LLM) inference at the network edge is a promising serving paradigm that leverages distributed edge resources to run inference near users and enhance privacy. Existing edge-based LLM inference systems typically adopt autoregressive decoding (AD), which only generates one token per forward pass. This iterative process, compounded by the limited computational resources of edge nodes, results in high serving latency and constrains the system's ability to support multiple users under growing demands.To address these challenges, we propose a speculative decoding (SD)-based LLM serving framework that deploys small and large models across heterogeneous edge nodes to collaboratively deliver inference services. Specifically, the small model rapidly generates draft tokens that the large model verifies in parallel, enabling multi-token generation per forward pass and thus reducing serving latency. To improve resource utilization of edge nodes, we incorporate pipeline parallelism to overlap drafting and verification across multiple inference tasks. Based on this framework, we analyze and derive a comprehensive latency model incorporating both communication and inference latency. Then, we formulate a joint optimization problem for speculation length, task batching, and wireless communication resource allocation to minimize total serving latency. To address this problem, we derive the closed-form solutions for wireless communication resource allocation, and develop a dynamic programming algorithm for joint batching and speculation control strategies. Experimental results demonstrate that the proposed framework achieves lower serving latency compared to AD-based serving systems. In addition,the proposed joint optimization method delivers up to 44.9% latency reduction compared to benchmark schemes.
pyspect: An Extensible Toolbox for Automatic Construction of Temporal Logic Trees via Reachability Analysis
In this paper, we present pyspect, a Python toolbox that simplifies the use of reachability analysis for temporal logic problems. Currently, satisfying complex requirements in cyber-physical systems requires significant manual effort and domain expertise to develop the underlying reachability programs. This high development effort limits the broader adoption of reachability analysis for complex verification problems. To address this, pyspect provides a method-agnostic approach to performing reachability analysis for verifying a temporal logic specification via temporal logic trees (TLTs). It enables the specification of complex safety and liveness requirements using high-level logic formulations that are independent of any particular reachability technique or set representation. As a result, pyspect allows for the comparison of different reachability implementations, such as Hamilton-Jacobi and Hybrid Zonotope-based reachability analysis, for the same temporal logic specification. This design separates the concerns of implementation developers (who develop numerical procedures for reachability) and end-users (who write specifications). Through a simple vehicle example, we demonstrate how pyspect simplifies the synthesis of reachability programs, promotes specification reusability, and facilitates side-by-side comparisons of reachability techniques for complex tasks.
comment: To be published in the 64th IEEE Conference on Decision and Control
Edge-to-Cloud Computations-as-a-Service in Software-Defined Energy Networks for Smart Grids
Modern power grids face an acute mismatch between where data is generated and where it can be processed: protection relays, EV (Electric Vehicle) charging, and distributed renewables demand millisecond analytics at the edge, while energy-hungry workloads often sit in distant clouds leading to missed real-time deadlines and wasted power. We address this by proposing, to our knowledge, the first-ever SDEN (Software Defined Energy Network) for CaaS (Computations-as-a-Service) that unifies edge, fog, and cloud compute with 5G URLLC (Ultra-Reliable Low-Latency Communications), SDN (Software Defined Networking), and NFV (Network Functions Virtualization) to co-optimize energy, latency, and reliability end-to-end. Our contributions are threefold: (i) a joint task offloading formulation that couples computation placement with network capacity under explicit URLLC constraints; (ii) a feasibility preserving, lightweight greedy heuristic that scales while closely tracking optimal energy and latency trade-offs; and (iii) a tiered AI (Artificial Intelligence) pipeline-reactive at the edge, predictive in the fog, strategic in the cloud-featuring privacy-preserving, federated GNNs (Graph Neural Networks) for fault detection and microgrid coordination. Unlike prior edge-only or cloud-only schemes, SDEN turns fragmented grid compute into a single, programmable substrate that delivers dependable, energy-aware, real time analytics establishing a first-ever, software defined path to practical, grid-scale CaaS.
Utilizing Bayesian Optimization for Timetable-Independent Railway Junction Performance Determination
The efficiency of railway infrastructure is significantly influenced by the mix of trains that utilize it, as different service types have competing operational requirements. While freight services might require extended service times, passenger services demand more predictable schedules. Traditional methods for addressing long-term traffic assignment problems often rely on fixed-value capacity limitations, determined based on specific assumptions about traffic composition. This paper introduces a methodology for determining timetable-independent capacity within the traffic rate assignment problem, enabling the calculation of junction capacities under dynamic traffic distributions. We solve the underlying non-linear constrained optimization problem maximizing the traffic throughput using Bayesian optimization (BO). This setting combines a known objective function with expensive- to-compute capacity constraints, motivating an adaption of standard BO problems, where objective functions are usually unknown. We tailor the acquisition process in BO to this specific setting and increase performance by incorporating prior knowledge about the shape of the constraint functions into the Gaussian process surrogate model. Our derived approaches are benchmarked on a railway junction near Paris, significantly outperforming fixed traffic composition models and highlighting the benefits of dynamic capacity allocation.
Visible Light Communication for Vehicular Networks: A Tutorial
The advent of the fifth-generation technology promises to bring about more vertical applications and emerging services that include vehicular networks and intelligent transportation systems (ITSs). To achieve their vision of real-time and safetyapplications, vehicular networks rely on short-range to medium-range communications. One emerging technology that aims to provide reliability and high-data rate in short-range communications is the visible light communications (VLC). Due to its remarkable advantages, some studies have recently investigated the integration of VLC in vehicular networks and ITSs. Despite their attractive features, such networks also face several implementation issues. This paper provides an extended tutorial on the implementation of VLC-based vehicular networks. To begin with, we present the implementation characteristics of these systems and discuss some related issues. The underlying system considers a general structure with transmitters, channels, and receivers based on photodetectors and cameras, as well as standardization efforts and types of topologies. In addition, we discuss the impact of the sun and artificial light sources, flickering, dimming, throughput enhancement, uplink security, and mobility on practical implementation. Finally, we highlight some key challenges and potential solutions and provide some directions for future research investigations that could constitute an advancement toward the development of commercial VLC-based vehicular systems.
Establishing assembly-oriented modular product architectures through Design for Assembly enhanced Modular Function Deployment
Modular product design has become a strategic enabler for companies seeking to balance product variety, operational efficiency, and market responsiveness, making the alignment between modular architecture and manufacturing considerations increasingly critical. Modular Function Deployment (MFD) is a widely adopted method for defining modular product architectures, yet it lacks systematic support for assembly considerations during early concept and system-level development. This limitation increases the risk of delayed production ramp-up and lifecycle inefficiencies. This paper proposes a set of enhancements to MFD that integrate Design for Assembly (DFA) logic into architectural synthesis. The extended method introduces structured heuristics, assembly-oriented module drivers, a coded interface taxonomy, and quantitative metrics for assessing assembly feasibility and automation readiness. These additions preserve compatibility with standard MFD workflows while enriching decision-making with traceable, production-informed reasoning. An illustrative case study involving a handheld leaf blower demonstrates the method's usability and effectiveness. The redesigned architecture shows reduced assembly effort, simplified interfaces, and increased automation potential. By supporting early-stage evaluation of architectural alternatives through an assembly lens, the method enables faster transition to efficient volume production and provides a foundation for continuous improvement throughout the product lifecycle.
PhysHSI: Towards a Real-World Generalizable and Natural Humanoid-Scene Interaction System
Deploying humanoid robots to interact with real-world environments--such as carrying objects or sitting on chairs--requires generalizable, lifelike motions and robust scene perception. Although prior approaches have advanced each capability individually, combining them in a unified system is still an ongoing challenge. In this work, we present a physical-world humanoid-scene interaction system, PhysHSI, that enables humanoids to autonomously perform diverse interaction tasks while maintaining natural and lifelike behaviors. PhysHSI comprises a simulation training pipeline and a real-world deployment system. In simulation, we adopt adversarial motion prior-based policy learning to imitate natural humanoid-scene interaction data across diverse scenarios, achieving both generalization and lifelike behaviors. For real-world deployment, we introduce a coarse-to-fine object localization module that combines LiDAR and camera inputs to provide continuous and robust scene perception. We validate PhysHSI on four representative interactive tasks--box carrying, sitting, lying, and standing up--in both simulation and real-world settings, demonstrating consistently high success rates, strong generalization across diverse task goals, and natural motion patterns.
comment: Project website: https://why618188.github.io/physhsi/
Optimal Multi-Modal Transportation and Electric Power Flow: The Value of Coordinated Dynamic Operation
The electrification of transportation represents a critical challenge in the global transition toward net-zero emissions, as the sector often accounts for more than one-quarter of national energy consumption. Achieving this transformation requires not only widespread adoption of electric vehicles (EVs) but also their seamless integration into interdependent infrastructure systems-specifically, the transportation-electricity nexus (TEN). This paper develops an optimal multi-modal transportation and electric power flow (OMTEPF) model to evaluate the benefits of coordinated, dynamic system operation. Building on recent advances in hetero-functional graph theory, the framework enables joint optimization of five key operational decisions in intelligent TEN management: vehicle dispatch, route choice, charging station queuing, coordinated charging, and vehicle-to-grid stabilization. The mesoscopic, dynamic model explicitly represents individual EVs and their state-of-charge trajectories, thereby extending beyond the prevailing literature's focus on static, macroscopic traffic assignment. It further captures the full scope of the TEN as a system-of-systems, incorporating five distinct charging modalities: private residential, private commercial, wired public commercial, inductive public, and discharging. On the power system side, an IV-ACOPF formulation ensures globally optimal solutions to the electrical subproblems. Comparative analysis demonstrates the substantial value of coordinated TEN operation relative to the status quo of siloed, uncoordinated infrastructure management. This work provides both a novel methodological contribution and actionable insights for the co-design and operation of next-generation sustainable mobility-energy systems.
comment: 31 pages, 9 figures
Observability and parameter estimation of a generic model for aggregated distributed energy resources
We propose a novel framework for estimating the parameters of an aggregated distributed energy resources (der_a) model. First, we introduce a rigorous method to determine whether all model parameters are estimable. When they are not, our approach identifies the subset of parameters that can be estimated. The proposed framework offers new insights into the number and specific parameters that can be reliably estimated based on commonly available measurements. It also highlights the limitations of calibrating such models. Second, we introduce a Kalman filtering method to calibrate the der_a model. Since we account for nonlinear effects such as saturation and deadbands, we develop a specific mechanism to handle smoothing functions within the Kalman filter. Specifically, we consider the extended and the unscented Kalman filter. We demonstrate the effectiveness of the proposed framework on a modified IEEE 34-node distribution feeder with inverter-based resources. Our findings align with the North American Electric Reliability Corporation's parameterization guideline and underscore the importance of model calibration in accurately capturing the collective dynamics of distributed energy resources installed on distribution systems.
Quantum Deception: Honey-X Deception using Quantum Games
In this paper, we develop a framework for deception in quantum games, extending the Honey-X paradigm from classical zero-sum settings into the quantum domain. Building on a view of deception in classical games as manipulation of a player's perception of the payoff matrix, we formalize quantum deception as controlled perturbations of the payoff Hamiltonian subject to a deception budget. We show that when victims are aware of possible deception, their equilibrium strategies surprisingly coincide with those of naive victims who fully trust the deceptive Hamiltonian. This equivalence allows us to cast quantum deception as a bilevel optimization problem, which can be reformulated into a bilinear semidefinite program. To illustrate the framework, we present simulations on quantum versions of the Penny Flip game, demonstrating how quantum strategy spaces and non-classical payoffs can amplify the impact of deception relative to classical formulations.
comment: Submitted to 2026 American Control Conference (ACC), New Orleans, LA
Coherent Load Profile Synthesis with Conditional Diffusion for LV Distribution Network Scenario Generation
Limited visibility of power distribution network power flows at the low voltage level presents challenges to both distribution network operators from a planning perspective and distribution system operators from a congestion management perspective. Forestalling these challenges through scenario analysis is confounded by the lack of realistic and coherent load data across representative distribution feeders. Load profiling approaches often rely on summarising demand through typical profiles, which oversimplifies the complexity of substation-level operations and limits their applicability in specific power system studies. Sampling methods, and more recently generative models, have attempted to address this through synthesising representative loads from historical exemplars; however, while these approaches can approximate load shapes to a convincing degree of fidelity, the co-behaviour between substations, which ultimately impacts higher voltage level network operation, is often overlooked. This limitation will become even more pronounced with the increasing integration of low-carbon technologies, as estimates of base loads fail to capture load diversity. To address this gap, a Conditional Diffusion model for synthesising daily active and reactive power profiles at the low voltage distribution substation level is proposed. The evaluation of fidelity is demonstrated through conventional metrics capturing temporal and statistical realism, as well as power flow modelling. The results show synthesised load profiles are plausible both independently and as a cohort in a wider power systems context. The Conditional Diffusion model is benchmarked against both naive and state-of-the-art models to demonstrate its effectiveness in producing realistic scenarios on which to base sub-regional power distribution network planning and operations.
ORN-CBF: Learning Observation-conditioned Residual Neural Control Barrier Functions via Hypernetworks
Control barrier functions (CBFs) have been demonstrated as an effective method for safety-critical control of autonomous systems. Although CBFs are simple to deploy, their design remains challenging, motivating the development of learning-based approaches. Yet, issues such as suboptimal safe sets, applicability in partially observable environments, and lack of rigorous safety guarantees persist. In this work, we propose observation-conditioned neural CBFs based on Hamilton-Jacobi (HJ) reachability analysis, which approximately recover the maximal safe sets. We exploit certain mathematical properties of the HJ value function, ensuring that the predicted safe set never intersects with the observed failure set. Moreover, we leverage a hypernetwork-based architecture that is particularly suitable for the design of observation-conditioned safety filters. The proposed method is examined both in simulation and hardware experiments for a ground robot and a quadcopter. The results show improved success rates and generalization to out-of-domain environments compared to the baselines.
Learning Satellite Attitude Dynamics with Physics-Informed Normalising Flow
Attitude control is a fundamental aspect of spacecraft operations. Model Predictive Control (MPC) has emerged as a powerful strategy for these tasks, relying on accurate models of the system dynamics to optimize control actions over a prediction horizon. In scenarios where physics models are incomplete, difficult to derive, or computationally expensive, machine learning offers a flexible alternative by learning the system behavior directly from data. However, purely data-driven models often struggle with generalization and stability, especially when applied to inputs outside their training domain. To address these limitations, we investigate the benefits of incorporating Physics-Informed Neural Networks (PINNs) into the learning of spacecraft attitude dynamics, comparing their performance with that of purely data-driven approaches. Using a Real-valued Non-Volume Preserving (Real NVP) neural network architecture with a self-attention mechanism, we trained several models on simulated data generated with the Basilisk simulator. Two training strategies were considered: a purely data-driven baseline and a physics-informed variant to improve robustness and stability. Our results demonstrate that the inclusion of physics-based information significantly enhances the performance in terms of the mean relative error of the best architectures found by 27.08%. These advantages are particularly evident when the learned models are integrated into an MPC framework, where PINN-based models consistently outperform their purely data-driven counterparts in terms of control accuracy and robustness, yielding improvements of up to 42.86% in performance stability error and increased robustness-to-noise.
A Unified Framework for Innovation-based Stochastic and Deterministic Event Triggers
Resources such as bandwidth and energy are limited in many wireless communications use cases, especially when large numbers of sensors and fusion centers need to exchange information frequently. One opportunity to overcome resource constraints is the use of event-based transmissions and estimation to transmit only information that contributes significantly to the reconstruction of the system's state. The design of efficient triggering policies and estimators is crucial for successful event-based transmissions. While previously deterministic and stochastic event triggering policies have been treated separately, this paper unifies the two approaches and gives insights into the design of consistent trigger-matching estimators. Two different estimators are presented, and different pairs of triggers and estimators are evaluated through simulation studies.
comment: 8 pages, 5 figures, presented at FUSION 2025
Product Digital Twin Supporting End-of-life Phase of Electric Vehicle Batteries Utilizing Product-Process-Resource Asset Network
In a circular economy, products in their end-of-life phase should be either remanufactured or recycled. Both of these processes are crucial for sustainability and environmental conservation. However, manufacturers frequently do not support these processes enough in terms of not sharing relevant data about the products nor their (re-)manufacturing processes. This paper proposes to accompany each product with a digital twin technology, specifically the Product Digital Twin (PDT), which can carry information for facilitating and optimizing production and remanufacturing processes. This paper introduces a knowledge representation called Bi-Flow Product-Process-Resource Asset Network (Bi-PAN). Bi-PAN extends a well-proven Product-Process-Resource Asset Network (PAN) paradigm by integrating both assembly and disassembly workflows into a single information model. Such networks enable capturing relevant relationships across products, production resources, manufacturing processes, and specific production operations that have to be done in the manufacturing phase of a product. The proposed approach is demonstrated in a use-case of disassembling electric vehicle (EV) batteries. By utilizing PDTs with Bi-PAN knowledge models, challenges associated with disassembling of EV batteries can be solved flexibly and efficiently for various battery types, enhancing the sustainability of the EV battery life-cycle management.
comment: \copyright 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
Manipulation of Elasto-Flexible Cables with Single or Multiple UAVs
This work considers a large class of systems composed of multiple quadrotors manipulating deformable and extensible cables. The cable is described via a discretized representation, which decomposes it into linear springs interconnected through lumped-mass passive spherical joints. Sets of flat outputs are found for the systems. Numerical simulations support the findings by showing cable manipulation relying on flatness-based trajectories. Eventually, we present an experimental validation of the effectiveness of the proposed discretized cable model for a two-robot example. Moreover, a closed-loop controller based on the identified model and using cable-output feedback is experimentally tested.
Multiple input tangential interpolation-driven damage detection of a jet trainer aircraft
The problem of damage detection and identification is of interest for many aerospace and aeronautical engineering systems. However, relevant literature mostly focuses on subsystems and parts, rather than full airframes. In structural dynamics, modal parameters, such as natural frequencies and mode shapes, from any structure are the main building blocks of vibration-based damage detection. However, traditional comparisons of these parameters are often ambiguous in complex systems, complicating damage detection and assessment. The modified total modal assurance criterion (MTMAC), an index well-known in the field of finite element model updating, is extended to address this challenge and is proposed as an index for damage identification and severity assessment. To support the requirement for precise and robust modal identification of Structural Health Monitoring (SHM), the improved Loewner Framework (iLF), known for its reliability and computational performance, is pioneeringly employed within SHM. Since the MTMAC is proposed solely as a damage identification and severity assessment index, the coordinate modal assurance criterion (COMAC), also a well-established tool, but for damage localisation using mode shapes, is used for completeness. The iLF SHM capabilities are validated through comparisons with traditional methods, including least-squares complex exponential and stochastic subspace identification with canonical variate analysis on a numerical case study of a cantilever beam. Furthermore, the MTMAC is validated against the traditional vibration-based approach, which involves directly comparing natural frequencies and mode shapes. Finally, an experimental dataset from a BAE Systems Hawk T1A jet trainer ground vibration test is used to demonstrate the iLF and MTMAC capabilities on a real-life, real-size SHM problem, showing their effectiveness in detecting and assessing damage.
General formulation of an analytic, Lipschitz continuous control allocation for thrust-vectored controlled rigid-bodies
This study introduces a systematic and scalable method for arbitrary rigid-bodies equipped with vectorized thrusters. Two novel solutions are proposed: a closed-form, Lipschitz continuous mapping that ensures smooth actuator orientation references, and a convex optimization formulation capable of handling practical actuator constraints such as thrust saturation and angular rate limits. Both methods leverage the null-space structure of the allocation mapping to perform singularity avoidance while generating sub-optimal yet practical solutions. The effectiveness and generality of the proposed framework are demonstrated through numerical simulations on a 3DOF marine vessel and a 6DOF aerial quadcopter.
Topology optimization of decoupling feeding networks for antenna arrays
Near-field and radiation coupling between nearby radiating elements is unavoidable, and it is considered a limiting factor for applications in wireless communications and active sensing. This article proposes a density-based topology optimization approach to design decoupling networks for such systems. The decoupling networks are designed based on a multi-objective optimization problem with the radiating elements replaced by their time-domain impulse response for efficient computations and to enable the solution of the design problem using gradient-based optimization methods. We use the adjoint-field method to compute the gradients of the optimization objectives. Additionally, nonlinear filters are applied during the optimization procedure to impose minimum-size control on the optimized designs. We demonstrate the concept by designing the decoupling network for a two-element planar antenna array; the antenna is designed in a separate optimization problem. The optimized decoupling networks provide a signal path that destructively interferes with the coupling between the radiating elements while preserving their individual matching to the feeding ports. Compact decoupling networks capable of suppressing the mutual coupling by more than 10 dB between two closely separated planar antennas operating around 2.45 GHz are presented and validated experimentally.
comment: Accepted version of the manuscript published in IEEE Transactions on Antennas and Propagation, 2025. The final version is available at https://doi.org/10.1109/TAP.2025.3621265
A Taylor Series Approach to Correction of Input Errors in Gaussian Process Regression
Gaussian Processes (GPs) are widely recognized as powerful non-parametric models for regression and classification. Traditional GP frameworks predominantly operate under the assumption that the inputs are either accurately known or subject to zero-mean noise. However, several real-world applications such as mobile sensors have imperfect localization, leading to inputs with biased errors. These biases can typically be estimated through measurements collected over time using, for example, Kalman filters. To avoid recomputation of the entire GP model when better estimates of the inputs used in the training data become available, we introduce a technique for updating a trained GP model to incorporate updated estimates of the inputs. By leveraging the differentiability of the mean and covariance functions derived from the squared exponential kernel, a second-order correction algorithm is developed to update the trained GP models. Precomputed Jacobians and Hessians of kernels enable real-time refinement of the mean and covariance predictions. The efficacy of the developed approach is demonstrated using two simulation studies, with error analyses revealing improvements in both predictive accuracy and uncertainty quantification.
comment: Improving the paper with better results and adding experimental results to publish again
Adaptive DRL for IRS Mirror Orientation in Dynamic OWC Networks
Intelligent reflecting surfaces (IRSs) have emerged as a promising solution to mitigate line-of-sight (LoS) blockages and enhance signal coverage in optical wireless communication (OWC) systems with minimal additional power. In this work, we consider a mirror-based IRS to assist a dynamic indoor visible light communication (VLC) environment. We formulate an optimization problem that aims to maximize the sum rate by adjusting the orientation of the IRS mirrors. To enable real-time adaptability, the problem is modelled as a Markov decision process (MDP), and a deep reinforcement learning (DRL) algorithm is developed based on the deterministic policy gradient for real-time mirror-based IRS optimization in dynamic VLC networks. The proposed DRL is employed to optimize mirror orientation toward mobile users under blockage and mobility constraints. Simulation results demonstrate that our proposed DRL algorithm outperforms the conventional deep Q- learning (DQL) algorithm and achieves substantial improvements in sum rate compared to random-orientation IRS configurations
comment: 6 pages, 5 figures
PowerPlots.jl: An Open Source Power Grid Visualization and Data Analysis Framework for Academic Research
Data visualization is essential for developing an understanding of a complex system. The power grid is one of the most complex systems in the world and effective power grid research visualization software must 1) be easy to use, 2) support unique data that may arise in research, and 3) be capable of creating custom figures for publication and presentation. However, no current software addresses all three of these needs. PowerPlots is an open-source data visualization tool for power grids that does address these needs. In addition, several tools created to support this software facilitate the analysis of power grid data by transforming the data into graph topology or data-frame data formats that are more compatible for some analyses. In this work, we use PowerPlots to investigate several case studies that involve exploring power grid data. These case studies demonstrate the valuable insights that are possible when using network visualization and how it can be applied to research applications.
Adaptive Decentralized Queue Disclosure for Impatient Tenants in Edge and Non-terrestrial Systems
We study how queue-state information disclosures affect impatient tenants in multi-tenant edge systems. We propose an information-bulletin strategy in which each queue periodically broadcasts two Markov models. One is a model of steady-state service-rate behavior and the other a model of the queue length inter-change times. Tenants autonomously decide to renege or jockey based on this information. The queues observe tenant responses and adapt service rates via a learned, rule-based predictive policy designed for decentralized, partially-observed, and time-varying environments. We compare this decentralized, information-driven policy to the classical, centralized Markov Decision Process (MDP) hedging-point policy for M/M/2 systems. Numerical experiments quantify the tradeoffs in average delay, impatience and robustness to stale information. Results show that when full, instantaneous state information and stationarity hold, the hedging-point policy yields less impatience but this diminishes as information becomes partial or stale. The rule-based predictive policy on the other hand is more robust to staleness in dispatched information, making it conducive for conditions typical of edge cloud and non-terrestrial deployments.
comment: Accepted by NFV-SDN'25 Doctoral Symposium
Modeling nonuniform energy decay through the modal decomposition of acoustic radiance transfer (MoD-ART)
Modeling late reverberation in real-time interactive applications is a challenging task when multiple sound sources and listeners are present in the same environment. This is especially problematic when the environment is geometrically complex and/or features uneven energy absorption (e.g. coupled volumes), because in such cases the late reverberation is dependent on the sound sources' and listeners' positions, and therefore must be adapted to their movements in real time. We present a novel approach to the task, named modal decomposition of acoustic radiance transfer (MoD-ART), which can handle highly complex scenarios with efficiency. The approach is based on the geometrical acoustics method of acoustic radiance transfer, from which we extract a set of energy decay modes and their positional relationships with sources and listeners. In this paper, we describe the physical and mathematical significance of MoD-ART, highlighting its advantages and applicability to different scenarios. Through an analysis of the method's computational complexity, we show that it compares very favorably with ray-tracing. We also present simulation results showing that MoD-ART can capture multiple decay slopes and flutter echoes.
Model Predictive Inferential Control of Neural State-Space Models for Autonomous Vehicle Motion Planning
Model predictive control (MPC) has proven useful in enabling safe and optimal motion planning for autonomous vehicles. In this paper, we investigate how to achieve MPC-based motion planning when a neural state-space model represents the vehicle dynamics. As the neural state-space model will lead to highly complex, nonlinear and nonconvex optimization landscapes, mainstream gradient-based MPC methods will be computationally too heavy to be a viable solution. In a departure, we propose the idea of model predictive inferential control (MPIC), which seeks to infer the best control decisions from the control objectives and constraints. Following the idea, we convert the MPC problem for motion planning into a Bayesian state estimation problem. Then, we develop a new particle filtering/smoothing approach to perform the estimation. This approach is implemented as banks of unscented Kalman filters/smoothers and offers high sampling efficiency, fast computation, and estimation accuracy. We evaluate the MPIC approach through a simulation study of autonomous driving in different scenarios, along with an exhaustive comparison with gradient-based MPC. The results show that the MPIC approach has considerable computational efficiency, regardless of complex neural network architectures, and shows the capability to solve large-scale MPC problems for neural state-space models.
Systems and Control (EESS)
Analysis of the Geometric Heat Flow Equation: Computing Geodesics in Real-Time with Convergence Guarantees
We present an analysis on the convergence properties of the so-called geometric heat flow equation for computing geodesics (shortest-path~curves) on Riemannian manifolds. Computing geodesics numerically in real-time has become an important capability in several fields, including control and motion planning. The geometric heat flow equation involves solving a parabolic partial differential equation whose solution is a geodesic. In practice, solving this PDE numerically can be done efficiently, and tends to be more numerically stable and exhibit a better rate of convergence compared to numerical optimization. We prove that the geometric heat flow equation is globally exponentially stable in $L_2$ if the curvature of the Riemannian manifold is not too positive, and that asymptotic convergence in $L_2$ is always guaranteed. We also present a pseudospectral method that leverages Chebyshev polynomials to accurately compute geodesics in only a few milliseconds for non-contrived manifolds. Our analysis was verified with our custom pseudospectral method by computing geodesics on common non-Euclidean surfaces, and in feedback for a contraction-based controller with a non-flat metric for a nonlinear system.
Ego-Vision World Model for Humanoid Contact Planning
Enabling humanoid robots to exploit physical contact, rather than simply avoid collisions, is crucial for autonomy in unstructured environments. Traditional optimization-based planners struggle with contact complexity, while on-policy reinforcement learning (RL) is sample-inefficient and has limited multi-task ability. We propose a framework combining a learned world model with sampling-based Model Predictive Control (MPC), trained on a demonstration-free offline dataset to predict future outcomes in a compressed latent space. To address sparse contact rewards and sensor noise, the MPC uses a learned surrogate value function for dense, robust planning. Our single, scalable model supports contact-aware tasks, including wall support after perturbation, blocking incoming objects, and traversing height-limited arches, with improved data efficiency and multi-task capability over on-policy RL. Deployed on a physical humanoid, our system achieves robust, real-time contact planning from proprioception and ego-centric depth images. Website: https://ego-vcp.github.io/
Smooth Spatiotemporal Tube Synthesis for Prescribed-Time Reach-Avoid-Stay Control
In this work, we address the issue of controller synthesis for a control-affine nonlinear system to meet prescribed time reach-avoid-stay specifications. Our goal is to improve upon previous methods based on spatiotemporal tubes (STTs) by eliminating the need for circumvent functions, which often lead to abrupt tube modifications and high control effort. We propose an adaptive framework that constructs smooth STTs around static unsafe sets, enabling continuous avoidance while guiding the system toward the target within the prescribed time. A closed-form, approximation-free control law is derived to ensure the system trajectory remains within the tube and satisfies the RAS task. The effectiveness of the proposed approach is demonstrated through a case study, showing a significant reduction in control effort compared to prior methods.
IntersectioNDE: Learning Complex Urban Traffic Dynamics based on Interaction Decoupling Strategy SC 2025
Realistic traffic simulation is critical for ensuring the safety and reliability of autonomous vehicles (AVs), especially in complex and diverse urban traffic environments. However, existing data-driven simulators face two key challenges: a limited focus on modeling dense, heterogeneous interactions at urban intersections - which are prevalent, crucial, and practically significant in countries like China, featuring diverse agents including motorized vehicles (MVs), non-motorized vehicles (NMVs), and pedestrians - and the inherent difficulty in robustly learning high-dimensional joint distributions for such high-density scenes, often leading to mode collapse and long-term simulation instability. We introduce City Crossings Dataset (CiCross), a large-scale dataset collected from a real-world urban intersection, uniquely capturing dense, heterogeneous multi-agent interactions, particularly with a substantial proportion of MVs, NMVs and pedestrians. Based on this dataset, we propose IntersectioNDE (Intersection Naturalistic Driving Environment), a data-driven simulator tailored for complex urban intersection scenarios. Its core component is the Interaction Decoupling Strategy (IDS), a training paradigm that learns compositional dynamics from agent subsets, enabling the marginal-to-joint simulation. Integrated into a scene-aware Transformer network with specialized training techniques, IDS significantly enhances simulation robustness and long-term stability for modeling heterogeneous interactions. Experiments on CiCross show that IntersectioNDE outperforms baseline methods in simulation fidelity, stability, and its ability to replicate complex, distribution-level urban traffic dynamics.
comment: Accepted by ITSC 2025
A Physics-Informed Reinforcement Learning Approach for Degradation-Aware Long-Term Charging Optimization in Batteries
Batteries degrade with usage and continuous cycling. This aging is typically reflected through the resistance growth and the capacity fade of battery cells. Over the years, various charging methods have been presented in the literature that proposed current profiles in order to enable optimal, fast, and/or health-conscious charging. However, very few works have attempted to make the ubiquitous Constant Current Constant Voltage (CCCV) charging protocol adaptive to the changing battery health as it cycles. This work aims to address this gap and proposes a framework that optimizes the constant current part of the CCCV protocol adapting to long-term battery degradation. Specifically, a physics-informed Reinforcement Learning (RL) approach has been used that not only estimates a key battery degradation mechanism, namely, Loss of Active Material (LAM), but also adjusts the current magnitude of CCCV as a result of this particular degradation. The proposed framework has been implemented by combining PyBamm, an open-source battery modeling tool, and Stable-baselines where the RL agent was trained using a Proximal Policy Optimization (PPO) network. Simulation results show the potential of the proposed framework for enhancing the widely used CCCV protocol by embedding physics information in RL algorithm. A comparative study of this proposed agent has also been discussed with 2 other charging protocols generated by a non-physics-based RL agent and a constant CCCV for all the cycles.
Constraint-Aware Reinforcement Learning via Adaptive Action Scaling
Safe reinforcement learning (RL) seeks to mitigate unsafe behaviors that arise from exploration during training by reducing constraint violations while maintaining task performance. Existing approaches typically rely on a single policy to jointly optimize reward and safety, which can cause instability due to conflicting objectives, or they use external safety filters that override actions and require prior system knowledge. In this paper, we propose a modular cost-aware regulator that scales the agent's actions based on predicted constraint violations, preserving exploration through smooth action modulation rather than overriding the policy. The regulator is trained to minimize constraint violations while avoiding degenerate suppression of actions. Our approach integrates seamlessly with off-policy RL methods such as SAC and TD3, and achieves state-of-the-art return-to-cost ratios on Safety Gym locomotion tasks with sparse costs, reducing constraint violations by up to 126 times while increasing returns by over an order of magnitude compared to prior methods.
The Role of Flexible Connection in Accelerating Load Interconnection in Distribution Networks
This paper investigates the role of flexible connection in accelerating the interconnection of large loads amid rising electricity demand from data centers and electrification. Flexible connection allows new loads to defer or curtail consumption during rare, grid-constrained periods, enabling faster access without major infrastructure upgrades. To quantify how flexible connection unlocks load hosting capacity, we formulate a flexibility-aware hosting capacity analysis problem that explicitly limits the number of utility-controlled interventions per year, ensuring infrequent disruption. Efficient solution methods are developed for this nonconvex problem and applied to real load data and test feeders. Empirical results reveal that modest flexibility, i.e., few interventions with small curtailments or delays, can unlock substantial hosting capacity. Theoretical analysis further explains and generalizes these findings, highlighting the broad potential of flexible connection.
A Faster and More Reliable Middleware for Autonomous Driving Systems
Ensuring safety in high-speed autonomous vehicles requires rapid control loops and tightly bounded delays from perception to actuation. Many open-source autonomy systems rely on ROS 2 middleware; when multiple sensor and control nodes share one compute unit, ROS 2 and its DDS transports add significant (de)serialization, copying, and discovery overheads, shrinking the available time budget. We present Sensor-in-Memory (SIM), a shared-memory transport designed for intra-host pipelines in autonomous vehicles. SIM keeps sensor data in native memory layouts (e.g., cv::Mat, PCL), uses lock-free bounded double buffers that overwrite old data to prioritize freshness, and integrates into ROS 2 nodes with four lines of code. Unlike traditional middleware, SIM operates beside ROS 2 and is optimized for applications where data freshness and minimal latency outweigh guaranteed completeness. SIM provides sequence numbers, a writer heartbeat, and optional checksums to ensure ordering, liveness, and basic integrity. On an NVIDIA Jetson Orin Nano, SIM reduces data-transport latency by up to 98% compared to ROS 2 zero-copy transports such as FastRTPS and Zenoh, lowers mean latency by about 95%, and narrows 95th/99th-percentile tail latencies by around 96%. In tests on a production-ready Level 4 vehicle running Autoware.Universe, SIM increased localization frequency from 7.5 Hz to 9.5 Hz. Applied across all latency-critical modules, SIM cut average perception-to-decision latency from 521.91 ms to 290.26 ms, reducing emergency braking distance at 40 mph (64 km/h) on dry concrete by 13.6 ft (4.14 m).
comment: 8 pages,7 figures, 8 tables
Trajectory control of a suspended load with non-stopping flying carriers
This paper presents the first closed-loop control framework for cooperative payload transportation with non-stopping flying carriers. Building upon grasp-matrix formulations and internal force redundancy, we propose a feedback wrench controller that actively regulates the payload's pose while an optimization layer dynamically shapes internal-force oscillations to guarantee persistent carrier motion. Preliminary experimental results on multirotor UAVs validate the model assumptions, and numerical simulations demonstrate that the method successfully prevents carrier stagnation, achieves accurate load tracking, and generates physically feasible trajectories with smooth velocity profiles. The proposed framework not only advances the state of the art but also offers a reliable, versatile solution for future real-world applications requiring load transportation by coordinated non-stopping flying carriers.
Robust Recovery and Control of Cyber-physical Discrete Event Systems under Actuator Attacks
Critical real-world applications strongly rely on Cyber-physical systems (CPS), but their dependence on communication networks introduces significant security risks, as attackers can exploit vulnerabilities to compromise their integrity and availability. This work explores the topic of cybersecurity in the context of CPS modeled as discrete event systems (DES), focusing on recovery strategies following the detection of cyberattacks. Specifically, we address actuator enablement attacks and propose a method that preserves the system's full valid behavior under normal conditions. Upon detecting an attack, our proposed solution aims to guide the system toward a restricted yet robust behavior, ensuring operational continuity and resilience. Additionally, we introduce a property termed AE-robust recoverability, which characterizes the necessary and sufficient conditions for recovering a system from attacks while preventing further vulnerabilities. Finally, we showcase the proposed solution through a case study based on a manufacturing system.
comment: This work has been accepted for publication in the 64th IEEE Conference on Decision and Control (CDC). The final published version will be available on IEEE Xplore
Robust Closed-Form Control for MIMO Nonlinear Systems under Conflicting Time-Varying Hard and Soft Constraints
This paper introduces a novel robust closed-form control law to handle time-varying hard and soft constraints in uncertain high-relative-degree nonlinear MIMO systems. These constraints represent spatiotemporal specifications in mechanical systems' operational space, with hard constraints ensuring safety-critical requirements and soft constraints encoding performance or task objectives. Initially, all constraints are consolidated into two separate scalar time-varying hard and soft constraint functions, whose positive level sets define feasible regions. A closed-form control law is developed to enforce these constraints using appropriately designed reciprocal barriers and nonlinear transformation functions. When conflicts between hard and soft constraints arise, the control law prioritizes hard constraints by virtually relaxing soft constraints via a dynamic relaxation law. Notably, the proposed control law maintains low complexity by avoiding approximation schemes for coping with system uncertainties. Simulation results confirm the effectiveness of the proposed method.
comment: 18 pages, 6 figures
Data-Driven Estimation of Quadrotor Motor Efficiency via Residual Minimization
A data-driven framework is proposed for online estimation of quadrotor motor efficiency via residual minimization. The problem is formulated as a constrained nonlinear optimization that minimizes trajectory residuals between measured flight data and predictions generated by a quadrotor dynamics model. A sliding-window strategy enables online estimation, and the optimization is efficiently solved using an iteratively reweighted least squares (IRLS) scheme combined with a primal-dual interior-point method, with inequality constraints enforced through a logarithmic barrier function. Robust z-score weighting is employed to reject outliers, which is particularly effective in motor clipping scenarios where the proposed estimator exhibits smaller spikes than an EKF baseline. Compared to traditional filter-based approaches, the batch-mode formulation offers greater flexibility by selectively incorporating informative data segments. This structure is well-suited for onboard implementation, particularly for applications such as fault detection and isolation (FDI), health monitoring, and predictive maintenance in aerial robotic systems. Simulation results under various degradation scenarios demonstrate the accuracy and robustness of the proposed estimator.
High-Order Quarter-Wave Plate Optimization for Linear Birefringence Suppression in Reflective FOCS
Fiber optic current sensors (FOCS) are widely adopted in modern power grids due to high sensitivity, excellent insulation, and strong immunity to electromagnetic interference. This prominence necessitates precise investigation into their error sources and corresponding optimization. This study examines reflective FOCS based on the Faraday effect. A theoretical model is established to simulate phase error caused by linear birefringence from the quarter-wave plate. Conventional methods using circular birefringence are analyzed, revealing inherent limitations. Innovatively, a compensation strategy employing high-order quarter-wave plates is proposed to effectively eliminate linear birefringence effects. This approach significantly enhances the accuracy and practicality of FOCS in precision metrology.
Exponential convergence of multiagent systems with lack of connection
Finding conditions ensuring consensus, i.e. convergence to a common value, for a networked system is of crucial interest, both for theoretical reasons and applications. This goal is harder to achieve when connections between agents are temporarily lost. Here, we prove that known conditions (introduced by Moreau) ensure an exponential convergence to consensus, with explicit rate of convergence. The key result is related to the length of the graph (i.e. the number of connections to reach a common agent): if this is large, then convergence is slow. This general result also provides conditions for convergence of second-order cooperative systems with lack of connections.
Efficient LLM Inference over Heterogeneous Edge Networks with Speculative Decoding
Large language model (LLM) inference at the network edge is a promising serving paradigm that leverages distributed edge resources to run inference near users and enhance privacy. Existing edge-based LLM inference systems typically adopt autoregressive decoding (AD), which only generates one token per forward pass. This iterative process, compounded by the limited computational resources of edge nodes, results in high serving latency and constrains the system's ability to support multiple users under growing demands.To address these challenges, we propose a speculative decoding (SD)-based LLM serving framework that deploys small and large models across heterogeneous edge nodes to collaboratively deliver inference services. Specifically, the small model rapidly generates draft tokens that the large model verifies in parallel, enabling multi-token generation per forward pass and thus reducing serving latency. To improve resource utilization of edge nodes, we incorporate pipeline parallelism to overlap drafting and verification across multiple inference tasks. Based on this framework, we analyze and derive a comprehensive latency model incorporating both communication and inference latency. Then, we formulate a joint optimization problem for speculation length, task batching, and wireless communication resource allocation to minimize total serving latency. To address this problem, we derive the closed-form solutions for wireless communication resource allocation, and develop a dynamic programming algorithm for joint batching and speculation control strategies. Experimental results demonstrate that the proposed framework achieves lower serving latency compared to AD-based serving systems. In addition,the proposed joint optimization method delivers up to 44.9% latency reduction compared to benchmark schemes.
pyspect: An Extensible Toolbox for Automatic Construction of Temporal Logic Trees via Reachability Analysis
In this paper, we present pyspect, a Python toolbox that simplifies the use of reachability analysis for temporal logic problems. Currently, satisfying complex requirements in cyber-physical systems requires significant manual effort and domain expertise to develop the underlying reachability programs. This high development effort limits the broader adoption of reachability analysis for complex verification problems. To address this, pyspect provides a method-agnostic approach to performing reachability analysis for verifying a temporal logic specification via temporal logic trees (TLTs). It enables the specification of complex safety and liveness requirements using high-level logic formulations that are independent of any particular reachability technique or set representation. As a result, pyspect allows for the comparison of different reachability implementations, such as Hamilton-Jacobi and Hybrid Zonotope-based reachability analysis, for the same temporal logic specification. This design separates the concerns of implementation developers (who develop numerical procedures for reachability) and end-users (who write specifications). Through a simple vehicle example, we demonstrate how pyspect simplifies the synthesis of reachability programs, promotes specification reusability, and facilitates side-by-side comparisons of reachability techniques for complex tasks.
comment: To be published in the 64th IEEE Conference on Decision and Control
Edge-to-Cloud Computations-as-a-Service in Software-Defined Energy Networks for Smart Grids
Modern power grids face an acute mismatch between where data is generated and where it can be processed: protection relays, EV (Electric Vehicle) charging, and distributed renewables demand millisecond analytics at the edge, while energy-hungry workloads often sit in distant clouds leading to missed real-time deadlines and wasted power. We address this by proposing, to our knowledge, the first-ever SDEN (Software Defined Energy Network) for CaaS (Computations-as-a-Service) that unifies edge, fog, and cloud compute with 5G URLLC (Ultra-Reliable Low-Latency Communications), SDN (Software Defined Networking), and NFV (Network Functions Virtualization) to co-optimize energy, latency, and reliability end-to-end. Our contributions are threefold: (i) a joint task offloading formulation that couples computation placement with network capacity under explicit URLLC constraints; (ii) a feasibility preserving, lightweight greedy heuristic that scales while closely tracking optimal energy and latency trade-offs; and (iii) a tiered AI (Artificial Intelligence) pipeline-reactive at the edge, predictive in the fog, strategic in the cloud-featuring privacy-preserving, federated GNNs (Graph Neural Networks) for fault detection and microgrid coordination. Unlike prior edge-only or cloud-only schemes, SDEN turns fragmented grid compute into a single, programmable substrate that delivers dependable, energy-aware, real time analytics establishing a first-ever, software defined path to practical, grid-scale CaaS.
Utilizing Bayesian Optimization for Timetable-Independent Railway Junction Performance Determination
The efficiency of railway infrastructure is significantly influenced by the mix of trains that utilize it, as different service types have competing operational requirements. While freight services might require extended service times, passenger services demand more predictable schedules. Traditional methods for addressing long-term traffic assignment problems often rely on fixed-value capacity limitations, determined based on specific assumptions about traffic composition. This paper introduces a methodology for determining timetable-independent capacity within the traffic rate assignment problem, enabling the calculation of junction capacities under dynamic traffic distributions. We solve the underlying non-linear constrained optimization problem maximizing the traffic throughput using Bayesian optimization (BO). This setting combines a known objective function with expensive- to-compute capacity constraints, motivating an adaption of standard BO problems, where objective functions are usually unknown. We tailor the acquisition process in BO to this specific setting and increase performance by incorporating prior knowledge about the shape of the constraint functions into the Gaussian process surrogate model. Our derived approaches are benchmarked on a railway junction near Paris, significantly outperforming fixed traffic composition models and highlighting the benefits of dynamic capacity allocation.
Visible Light Communication for Vehicular Networks: A Tutorial
The advent of the fifth-generation technology promises to bring about more vertical applications and emerging services that include vehicular networks and intelligent transportation systems (ITSs). To achieve their vision of real-time and safetyapplications, vehicular networks rely on short-range to medium-range communications. One emerging technology that aims to provide reliability and high-data rate in short-range communications is the visible light communications (VLC). Due to its remarkable advantages, some studies have recently investigated the integration of VLC in vehicular networks and ITSs. Despite their attractive features, such networks also face several implementation issues. This paper provides an extended tutorial on the implementation of VLC-based vehicular networks. To begin with, we present the implementation characteristics of these systems and discuss some related issues. The underlying system considers a general structure with transmitters, channels, and receivers based on photodetectors and cameras, as well as standardization efforts and types of topologies. In addition, we discuss the impact of the sun and artificial light sources, flickering, dimming, throughput enhancement, uplink security, and mobility on practical implementation. Finally, we highlight some key challenges and potential solutions and provide some directions for future research investigations that could constitute an advancement toward the development of commercial VLC-based vehicular systems.
Establishing assembly-oriented modular product architectures through Design for Assembly enhanced Modular Function Deployment
Modular product design has become a strategic enabler for companies seeking to balance product variety, operational efficiency, and market responsiveness, making the alignment between modular architecture and manufacturing considerations increasingly critical. Modular Function Deployment (MFD) is a widely adopted method for defining modular product architectures, yet it lacks systematic support for assembly considerations during early concept and system-level development. This limitation increases the risk of delayed production ramp-up and lifecycle inefficiencies. This paper proposes a set of enhancements to MFD that integrate Design for Assembly (DFA) logic into architectural synthesis. The extended method introduces structured heuristics, assembly-oriented module drivers, a coded interface taxonomy, and quantitative metrics for assessing assembly feasibility and automation readiness. These additions preserve compatibility with standard MFD workflows while enriching decision-making with traceable, production-informed reasoning. An illustrative case study involving a handheld leaf blower demonstrates the method's usability and effectiveness. The redesigned architecture shows reduced assembly effort, simplified interfaces, and increased automation potential. By supporting early-stage evaluation of architectural alternatives through an assembly lens, the method enables faster transition to efficient volume production and provides a foundation for continuous improvement throughout the product lifecycle.
PhysHSI: Towards a Real-World Generalizable and Natural Humanoid-Scene Interaction System
Deploying humanoid robots to interact with real-world environments--such as carrying objects or sitting on chairs--requires generalizable, lifelike motions and robust scene perception. Although prior approaches have advanced each capability individually, combining them in a unified system is still an ongoing challenge. In this work, we present a physical-world humanoid-scene interaction system, PhysHSI, that enables humanoids to autonomously perform diverse interaction tasks while maintaining natural and lifelike behaviors. PhysHSI comprises a simulation training pipeline and a real-world deployment system. In simulation, we adopt adversarial motion prior-based policy learning to imitate natural humanoid-scene interaction data across diverse scenarios, achieving both generalization and lifelike behaviors. For real-world deployment, we introduce a coarse-to-fine object localization module that combines LiDAR and camera inputs to provide continuous and robust scene perception. We validate PhysHSI on four representative interactive tasks--box carrying, sitting, lying, and standing up--in both simulation and real-world settings, demonstrating consistently high success rates, strong generalization across diverse task goals, and natural motion patterns.
comment: Project website: https://why618188.github.io/physhsi/
Optimal Multi-Modal Transportation and Electric Power Flow: The Value of Coordinated Dynamic Operation
The electrification of transportation represents a critical challenge in the global transition toward net-zero emissions, as the sector often accounts for more than one-quarter of national energy consumption. Achieving this transformation requires not only widespread adoption of electric vehicles (EVs) but also their seamless integration into interdependent infrastructure systems-specifically, the transportation-electricity nexus (TEN). This paper develops an optimal multi-modal transportation and electric power flow (OMTEPF) model to evaluate the benefits of coordinated, dynamic system operation. Building on recent advances in hetero-functional graph theory, the framework enables joint optimization of five key operational decisions in intelligent TEN management: vehicle dispatch, route choice, charging station queuing, coordinated charging, and vehicle-to-grid stabilization. The mesoscopic, dynamic model explicitly represents individual EVs and their state-of-charge trajectories, thereby extending beyond the prevailing literature's focus on static, macroscopic traffic assignment. It further captures the full scope of the TEN as a system-of-systems, incorporating five distinct charging modalities: private residential, private commercial, wired public commercial, inductive public, and discharging. On the power system side, an IV-ACOPF formulation ensures globally optimal solutions to the electrical subproblems. Comparative analysis demonstrates the substantial value of coordinated TEN operation relative to the status quo of siloed, uncoordinated infrastructure management. This work provides both a novel methodological contribution and actionable insights for the co-design and operation of next-generation sustainable mobility-energy systems.
comment: 31 pages, 9 figures
Observability and parameter estimation of a generic model for aggregated distributed energy resources
We propose a novel framework for estimating the parameters of an aggregated distributed energy resources (der_a) model. First, we introduce a rigorous method to determine whether all model parameters are estimable. When they are not, our approach identifies the subset of parameters that can be estimated. The proposed framework offers new insights into the number and specific parameters that can be reliably estimated based on commonly available measurements. It also highlights the limitations of calibrating such models. Second, we introduce a Kalman filtering method to calibrate the der_a model. Since we account for nonlinear effects such as saturation and deadbands, we develop a specific mechanism to handle smoothing functions within the Kalman filter. Specifically, we consider the extended and the unscented Kalman filter. We demonstrate the effectiveness of the proposed framework on a modified IEEE 34-node distribution feeder with inverter-based resources. Our findings align with the North American Electric Reliability Corporation's parameterization guideline and underscore the importance of model calibration in accurately capturing the collective dynamics of distributed energy resources installed on distribution systems.
Quantum Deception: Honey-X Deception using Quantum Games
In this paper, we develop a framework for deception in quantum games, extending the Honey-X paradigm from classical zero-sum settings into the quantum domain. Building on a view of deception in classical games as manipulation of a player's perception of the payoff matrix, we formalize quantum deception as controlled perturbations of the payoff Hamiltonian subject to a deception budget. We show that when victims are aware of possible deception, their equilibrium strategies surprisingly coincide with those of naive victims who fully trust the deceptive Hamiltonian. This equivalence allows us to cast quantum deception as a bilevel optimization problem, which can be reformulated into a bilinear semidefinite program. To illustrate the framework, we present simulations on quantum versions of the Penny Flip game, demonstrating how quantum strategy spaces and non-classical payoffs can amplify the impact of deception relative to classical formulations.
comment: Submitted to 2026 American Control Conference (ACC), New Orleans, LA
Coherent Load Profile Synthesis with Conditional Diffusion for LV Distribution Network Scenario Generation
Limited visibility of power distribution network power flows at the low voltage level presents challenges to both distribution network operators from a planning perspective and distribution system operators from a congestion management perspective. Forestalling these challenges through scenario analysis is confounded by the lack of realistic and coherent load data across representative distribution feeders. Load profiling approaches often rely on summarising demand through typical profiles, which oversimplifies the complexity of substation-level operations and limits their applicability in specific power system studies. Sampling methods, and more recently generative models, have attempted to address this through synthesising representative loads from historical exemplars; however, while these approaches can approximate load shapes to a convincing degree of fidelity, the co-behaviour between substations, which ultimately impacts higher voltage level network operation, is often overlooked. This limitation will become even more pronounced with the increasing integration of low-carbon technologies, as estimates of base loads fail to capture load diversity. To address this gap, a Conditional Diffusion model for synthesising daily active and reactive power profiles at the low voltage distribution substation level is proposed. The evaluation of fidelity is demonstrated through conventional metrics capturing temporal and statistical realism, as well as power flow modelling. The results show synthesised load profiles are plausible both independently and as a cohort in a wider power systems context. The Conditional Diffusion model is benchmarked against both naive and state-of-the-art models to demonstrate its effectiveness in producing realistic scenarios on which to base sub-regional power distribution network planning and operations.
ORN-CBF: Learning Observation-conditioned Residual Neural Control Barrier Functions via Hypernetworks
Control barrier functions (CBFs) have been demonstrated as an effective method for safety-critical control of autonomous systems. Although CBFs are simple to deploy, their design remains challenging, motivating the development of learning-based approaches. Yet, issues such as suboptimal safe sets, applicability in partially observable environments, and lack of rigorous safety guarantees persist. In this work, we propose observation-conditioned neural CBFs based on Hamilton-Jacobi (HJ) reachability analysis, which approximately recover the maximal safe sets. We exploit certain mathematical properties of the HJ value function, ensuring that the predicted safe set never intersects with the observed failure set. Moreover, we leverage a hypernetwork-based architecture that is particularly suitable for the design of observation-conditioned safety filters. The proposed method is examined both in simulation and hardware experiments for a ground robot and a quadcopter. The results show improved success rates and generalization to out-of-domain environments compared to the baselines.
Learning Satellite Attitude Dynamics with Physics-Informed Normalising Flow
Attitude control is a fundamental aspect of spacecraft operations. Model Predictive Control (MPC) has emerged as a powerful strategy for these tasks, relying on accurate models of the system dynamics to optimize control actions over a prediction horizon. In scenarios where physics models are incomplete, difficult to derive, or computationally expensive, machine learning offers a flexible alternative by learning the system behavior directly from data. However, purely data-driven models often struggle with generalization and stability, especially when applied to inputs outside their training domain. To address these limitations, we investigate the benefits of incorporating Physics-Informed Neural Networks (PINNs) into the learning of spacecraft attitude dynamics, comparing their performance with that of purely data-driven approaches. Using a Real-valued Non-Volume Preserving (Real NVP) neural network architecture with a self-attention mechanism, we trained several models on simulated data generated with the Basilisk simulator. Two training strategies were considered: a purely data-driven baseline and a physics-informed variant to improve robustness and stability. Our results demonstrate that the inclusion of physics-based information significantly enhances the performance in terms of the mean relative error of the best architectures found by 27.08%. These advantages are particularly evident when the learned models are integrated into an MPC framework, where PINN-based models consistently outperform their purely data-driven counterparts in terms of control accuracy and robustness, yielding improvements of up to 42.86% in performance stability error and increased robustness-to-noise.
A Unified Framework for Innovation-based Stochastic and Deterministic Event Triggers
Resources such as bandwidth and energy are limited in many wireless communications use cases, especially when large numbers of sensors and fusion centers need to exchange information frequently. One opportunity to overcome resource constraints is the use of event-based transmissions and estimation to transmit only information that contributes significantly to the reconstruction of the system's state. The design of efficient triggering policies and estimators is crucial for successful event-based transmissions. While previously deterministic and stochastic event triggering policies have been treated separately, this paper unifies the two approaches and gives insights into the design of consistent trigger-matching estimators. Two different estimators are presented, and different pairs of triggers and estimators are evaluated through simulation studies.
comment: 8 pages, 5 figures, presented at FUSION 2025
Product Digital Twin Supporting End-of-life Phase of Electric Vehicle Batteries Utilizing Product-Process-Resource Asset Network
In a circular economy, products in their end-of-life phase should be either remanufactured or recycled. Both of these processes are crucial for sustainability and environmental conservation. However, manufacturers frequently do not support these processes enough in terms of not sharing relevant data about the products nor their (re-)manufacturing processes. This paper proposes to accompany each product with a digital twin technology, specifically the Product Digital Twin (PDT), which can carry information for facilitating and optimizing production and remanufacturing processes. This paper introduces a knowledge representation called Bi-Flow Product-Process-Resource Asset Network (Bi-PAN). Bi-PAN extends a well-proven Product-Process-Resource Asset Network (PAN) paradigm by integrating both assembly and disassembly workflows into a single information model. Such networks enable capturing relevant relationships across products, production resources, manufacturing processes, and specific production operations that have to be done in the manufacturing phase of a product. The proposed approach is demonstrated in a use-case of disassembling electric vehicle (EV) batteries. By utilizing PDTs with Bi-PAN knowledge models, challenges associated with disassembling of EV batteries can be solved flexibly and efficiently for various battery types, enhancing the sustainability of the EV battery life-cycle management.
comment: \copyright 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
Manipulation of Elasto-Flexible Cables with Single or Multiple UAVs
This work considers a large class of systems composed of multiple quadrotors manipulating deformable and extensible cables. The cable is described via a discretized representation, which decomposes it into linear springs interconnected through lumped-mass passive spherical joints. Sets of flat outputs are found for the systems. Numerical simulations support the findings by showing cable manipulation relying on flatness-based trajectories. Eventually, we present an experimental validation of the effectiveness of the proposed discretized cable model for a two-robot example. Moreover, a closed-loop controller based on the identified model and using cable-output feedback is experimentally tested.
Multiple input tangential interpolation-driven damage detection of a jet trainer aircraft
The problem of damage detection and identification is of interest for many aerospace and aeronautical engineering systems. However, relevant literature mostly focuses on subsystems and parts, rather than full airframes. In structural dynamics, modal parameters, such as natural frequencies and mode shapes, from any structure are the main building blocks of vibration-based damage detection. However, traditional comparisons of these parameters are often ambiguous in complex systems, complicating damage detection and assessment. The modified total modal assurance criterion (MTMAC), an index well-known in the field of finite element model updating, is extended to address this challenge and is proposed as an index for damage identification and severity assessment. To support the requirement for precise and robust modal identification of Structural Health Monitoring (SHM), the improved Loewner Framework (iLF), known for its reliability and computational performance, is pioneeringly employed within SHM. Since the MTMAC is proposed solely as a damage identification and severity assessment index, the coordinate modal assurance criterion (COMAC), also a well-established tool, but for damage localisation using mode shapes, is used for completeness. The iLF SHM capabilities are validated through comparisons with traditional methods, including least-squares complex exponential and stochastic subspace identification with canonical variate analysis on a numerical case study of a cantilever beam. Furthermore, the MTMAC is validated against the traditional vibration-based approach, which involves directly comparing natural frequencies and mode shapes. Finally, an experimental dataset from a BAE Systems Hawk T1A jet trainer ground vibration test is used to demonstrate the iLF and MTMAC capabilities on a real-life, real-size SHM problem, showing their effectiveness in detecting and assessing damage.
General formulation of an analytic, Lipschitz continuous control allocation for thrust-vectored controlled rigid-bodies
This study introduces a systematic and scalable method for arbitrary rigid-bodies equipped with vectorized thrusters. Two novel solutions are proposed: a closed-form, Lipschitz continuous mapping that ensures smooth actuator orientation references, and a convex optimization formulation capable of handling practical actuator constraints such as thrust saturation and angular rate limits. Both methods leverage the null-space structure of the allocation mapping to perform singularity avoidance while generating sub-optimal yet practical solutions. The effectiveness and generality of the proposed framework are demonstrated through numerical simulations on a 3DOF marine vessel and a 6DOF aerial quadcopter.
Topology optimization of decoupling feeding networks for antenna arrays
Near-field and radiation coupling between nearby radiating elements is unavoidable, and it is considered a limiting factor for applications in wireless communications and active sensing. This article proposes a density-based topology optimization approach to design decoupling networks for such systems. The decoupling networks are designed based on a multi-objective optimization problem with the radiating elements replaced by their time-domain impulse response for efficient computations and to enable the solution of the design problem using gradient-based optimization methods. We use the adjoint-field method to compute the gradients of the optimization objectives. Additionally, nonlinear filters are applied during the optimization procedure to impose minimum-size control on the optimized designs. We demonstrate the concept by designing the decoupling network for a two-element planar antenna array; the antenna is designed in a separate optimization problem. The optimized decoupling networks provide a signal path that destructively interferes with the coupling between the radiating elements while preserving their individual matching to the feeding ports. Compact decoupling networks capable of suppressing the mutual coupling by more than 10 dB between two closely separated planar antennas operating around 2.45 GHz are presented and validated experimentally.
comment: Accepted version of the manuscript published in IEEE Transactions on Antennas and Propagation, 2025. The final version is available at https://doi.org/10.1109/TAP.2025.3621265
A Taylor Series Approach to Correction of Input Errors in Gaussian Process Regression
Gaussian Processes (GPs) are widely recognized as powerful non-parametric models for regression and classification. Traditional GP frameworks predominantly operate under the assumption that the inputs are either accurately known or subject to zero-mean noise. However, several real-world applications such as mobile sensors have imperfect localization, leading to inputs with biased errors. These biases can typically be estimated through measurements collected over time using, for example, Kalman filters. To avoid recomputation of the entire GP model when better estimates of the inputs used in the training data become available, we introduce a technique for updating a trained GP model to incorporate updated estimates of the inputs. By leveraging the differentiability of the mean and covariance functions derived from the squared exponential kernel, a second-order correction algorithm is developed to update the trained GP models. Precomputed Jacobians and Hessians of kernels enable real-time refinement of the mean and covariance predictions. The efficacy of the developed approach is demonstrated using two simulation studies, with error analyses revealing improvements in both predictive accuracy and uncertainty quantification.
comment: Improving the paper with better results and adding experimental results to publish again
Adaptive DRL for IRS Mirror Orientation in Dynamic OWC Networks
Intelligent reflecting surfaces (IRSs) have emerged as a promising solution to mitigate line-of-sight (LoS) blockages and enhance signal coverage in optical wireless communication (OWC) systems with minimal additional power. In this work, we consider a mirror-based IRS to assist a dynamic indoor visible light communication (VLC) environment. We formulate an optimization problem that aims to maximize the sum rate by adjusting the orientation of the IRS mirrors. To enable real-time adaptability, the problem is modelled as a Markov decision process (MDP), and a deep reinforcement learning (DRL) algorithm is developed based on the deterministic policy gradient for real-time mirror-based IRS optimization in dynamic VLC networks. The proposed DRL is employed to optimize mirror orientation toward mobile users under blockage and mobility constraints. Simulation results demonstrate that our proposed DRL algorithm outperforms the conventional deep Q- learning (DQL) algorithm and achieves substantial improvements in sum rate compared to random-orientation IRS configurations
comment: 6 pages, 5 figures
PowerPlots.jl: An Open Source Power Grid Visualization and Data Analysis Framework for Academic Research
Data visualization is essential for developing an understanding of a complex system. The power grid is one of the most complex systems in the world and effective power grid research visualization software must 1) be easy to use, 2) support unique data that may arise in research, and 3) be capable of creating custom figures for publication and presentation. However, no current software addresses all three of these needs. PowerPlots is an open-source data visualization tool for power grids that does address these needs. In addition, several tools created to support this software facilitate the analysis of power grid data by transforming the data into graph topology or data-frame data formats that are more compatible for some analyses. In this work, we use PowerPlots to investigate several case studies that involve exploring power grid data. These case studies demonstrate the valuable insights that are possible when using network visualization and how it can be applied to research applications.
Adaptive Decentralized Queue Disclosure for Impatient Tenants in Edge and Non-terrestrial Systems
We study how queue-state information disclosures affect impatient tenants in multi-tenant edge systems. We propose an information-bulletin strategy in which each queue periodically broadcasts two Markov models. One is a model of steady-state service-rate behavior and the other a model of the queue length inter-change times. Tenants autonomously decide to renege or jockey based on this information. The queues observe tenant responses and adapt service rates via a learned, rule-based predictive policy designed for decentralized, partially-observed, and time-varying environments. We compare this decentralized, information-driven policy to the classical, centralized Markov Decision Process (MDP) hedging-point policy for M/M/2 systems. Numerical experiments quantify the tradeoffs in average delay, impatience and robustness to stale information. Results show that when full, instantaneous state information and stationarity hold, the hedging-point policy yields less impatience but this diminishes as information becomes partial or stale. The rule-based predictive policy on the other hand is more robust to staleness in dispatched information, making it conducive for conditions typical of edge cloud and non-terrestrial deployments.
comment: Accepted by NFV-SDN'25 Doctoral Symposium
Modeling nonuniform energy decay through the modal decomposition of acoustic radiance transfer (MoD-ART)
Modeling late reverberation in real-time interactive applications is a challenging task when multiple sound sources and listeners are present in the same environment. This is especially problematic when the environment is geometrically complex and/or features uneven energy absorption (e.g. coupled volumes), because in such cases the late reverberation is dependent on the sound sources' and listeners' positions, and therefore must be adapted to their movements in real time. We present a novel approach to the task, named modal decomposition of acoustic radiance transfer (MoD-ART), which can handle highly complex scenarios with efficiency. The approach is based on the geometrical acoustics method of acoustic radiance transfer, from which we extract a set of energy decay modes and their positional relationships with sources and listeners. In this paper, we describe the physical and mathematical significance of MoD-ART, highlighting its advantages and applicability to different scenarios. Through an analysis of the method's computational complexity, we show that it compares very favorably with ray-tracing. We also present simulation results showing that MoD-ART can capture multiple decay slopes and flutter echoes.
Model Predictive Inferential Control of Neural State-Space Models for Autonomous Vehicle Motion Planning
Model predictive control (MPC) has proven useful in enabling safe and optimal motion planning for autonomous vehicles. In this paper, we investigate how to achieve MPC-based motion planning when a neural state-space model represents the vehicle dynamics. As the neural state-space model will lead to highly complex, nonlinear and nonconvex optimization landscapes, mainstream gradient-based MPC methods will be computationally too heavy to be a viable solution. In a departure, we propose the idea of model predictive inferential control (MPIC), which seeks to infer the best control decisions from the control objectives and constraints. Following the idea, we convert the MPC problem for motion planning into a Bayesian state estimation problem. Then, we develop a new particle filtering/smoothing approach to perform the estimation. This approach is implemented as banks of unscented Kalman filters/smoothers and offers high sampling efficiency, fast computation, and estimation accuracy. We evaluate the MPIC approach through a simulation study of autonomous driving in different scenarios, along with an exhaustive comparison with gradient-based MPC. The results show that the MPIC approach has considerable computational efficiency, regardless of complex neural network architectures, and shows the capability to solve large-scale MPC problems for neural state-space models.
Multiagent Systems
Fast and the Furious: Hot Starts in Pursuit-Evasion Games AAMAS
Effectively positioning pursuers in pursuit-evasion games without prior knowledge of evader locations remains a significant challenge. A novel approach that combines game-theoretic control theory with Graph Neural Networks is introduced in this work. By conceptualizing pursuer configurations as strategic arrangements and representing them as graphs, a Graph Characteristic Space is constructed via multi-objective optimization to identify Pareto-optimal configurations. A Graph Convolutional Network (GCN) is trained on these Pareto-optimal graphs to generate strategically effective initial configurations, termed "hot starts". Empirical evaluations demonstrate that the GCN-generated hot starts provide a significant advantage over random configurations. In scenarios considering multiple pursuers and evaders, this method hastens the decline in evader survival rates, reduces pursuer travel distances, and enhances containment, showcasing clear strategic benefits.
comment: Presented at AAMAS Workshop on Autonomous Robots and Multirobot Systems (ARMS)
Two-Layer Voronoi Coverage Control for Hybrid Aerial-Ground Robot Teams in Emergency Response: Implementation and Analysis
We present a comprehensive two-layer Voronoi coverage control approach for coordinating hybrid aerial-ground robot teams in hazardous material emergency response scenarios. Traditional Voronoi coverage control methods face three critical limitations in emergency contexts: heterogeneous agent capabilities with vastly different velocities, clustered initial deployment configurations, and urgent time constraints requiring rapid response rather than eventual convergence. Our method addresses these challenges through a decoupled two-layer architecture that separately optimizes aerial and ground robot positioning, with aerial agents delivering ground sensors via airdrop to high-priority locations. We provide detailed implementation of bounded Voronoi cell computation, efficient numerical integration techniques for importance-weighted centroids, and robust control strategies that prevent agent trapping. Simulation results demonstrate an 88% reduction in response time, achieving target sensor coverage (18.5% of initial sensor loss) in 25 seconds compared to 220 seconds for ground-only deployment. Complete implementation code is available at https://github.com/dHutchings/ME292B.
comment: 23 pages, 7 figures. Technical report with complete implementation details and open-source code
HyperAgent: Leveraging Hypergraphs for Topology Optimization in Multi-Agent Communication
Recent advances in large language model-powered multi-agent systems have demonstrated remarkable collective intelligence through effective communication. However, existing approaches face two primary challenges: (i) \textit{Ineffective group collaboration modeling}, as they rely on pairwise edge representations in graph structures, limiting their ability to capture relationships among multiple agents; and (ii) \textit{Limited task-adaptiveness in communication topology design}, leading to excessive communication cost for simple tasks and insufficient coordination for complex scenarios. These issues restrict the scalability and practical deployment of adaptive collaboration frameworks. To address these challenges, we propose \textbf{HyperAgent}, a hypergraph-based framework that optimizes communication topologies and effectively captures group collaboration patterns using direct hyperedge representations. Unlike edge-based approaches, HyperAgent uses hyperedges to link multiple agents within the same subtask and employs hypergraph convolutional layers to achieve one-step information aggregation in collaboration groups. Additionally, it incorporates a variational autoencoder framework with sparsity regularization to dynamically adjust hypergraph topologies based on task complexity. Experiments highlight the superiority of HyperAgent in both performance and efficiency. For instance, on GSM8K, HyperAgent achieves 95.07\% accuracy while reducing token consumption by 25.33\%, demonstrating the potential of hypergraph-based optimization for multi-agent communication.
Multitask Learning with Learned Task Relationships
Classical consensus-based strategies for federated and decentralized learning are statistically suboptimal in the presence of heterogeneous local data or task distributions. As a result, in recent years, there has been growing interest in multitask or personalized strategies, which allow individual agents to benefit from one another in pursuing locally optimal models without enforcing consensus. Existing strategies require either precise prior knowledge of the underlying task relationships or are fully non-parametric and instead rely on meta-learning or proximal constructions. In this work, we introduce an algorithmic framework that strikes a balance between these extremes. By modeling task relationships through a Gaussian Markov Random Field with an unknown precision matrix, we develop a strategy that jointly learns both the task relationships and the local models, allowing agents to self-organize in a way consistent with their individual data distributions. Our theoretical analysis quantifies the quality of the learned relationship, and our numerical experiments demonstrate its practical effectiveness.
RobotFleet: An Open-Source Framework for Centralized Multi-Robot Task Planning
Coordinating heterogeneous robot fleets to achieve multiple goals is challenging in multi-robot systems. We introduce an open-source and extensible framework for centralized multi-robot task planning and scheduling that leverages LLMs to enable fleets of heterogeneous robots to accomplish multiple tasks. RobotFleet provides abstractions for planning, scheduling, and execution across robots deployed as containerized services to simplify fleet scaling and management. The framework maintains a shared declarative world state and two-way communication for task execution and replanning. By modularizing each layer of the autonomy stack and using LLMs for open-world reasoning, RobotFleet lowers the barrier to building scalable multi-robot systems. The code can be found here: https://github.com/therohangupta/robot-fleet.
Multi-Objective Multi-Agent Path Finding with Lexicographic Cost Preferences
Many real-world scenarios require multiple agents to coordinate in shared environments, while balancing trade-offs between multiple, potentially competing objectives. Current multi-objective multi-agent path finding (MO-MAPF) algorithms typically produce conflict-free plans by computing Pareto frontiers. They do not explicitly optimize for user-defined preferences, even when the preferences are available, and scale poorly with the number of objectives. We propose a lexicographic framework for modeling MO-MAPF, along with an algorithm \textit{Lexicographic Conflict-Based Search} (LCBS) that directly computes a single solution aligned with a lexicographic preference over objectives. LCBS integrates a priority-aware low-level $A^*$ search with conflict-based search, avoiding Pareto frontier construction and enabling efficient planning guided by preference over objectives. We provide insights into optimality and scalability, and empirically demonstrate that LCBS computes optimal solutions while scaling to instances with up to ten objectives -- far beyond the limits of existing MO-MAPF methods. Evaluations on standard and randomized MAPF benchmarks show consistently higher success rates against state-of-the-art baselines, especially with increasing number of objectives.
comment: 8 pages, 7 figures
FURINA: A Fully Customizable Role-Playing Benchmark via Scalable Multi-Agent Collaboration Pipeline
As large language models (LLMs) advance in role-playing (RP) tasks, existing benchmarks quickly become obsolete due to their narrow scope, outdated interaction paradigms, and limited adaptability across diverse application scenarios. To address this gap, we introduce FURINA-Builder, a novel multi-agent collaboration pipeline that automatically constructs fully customizable RP benchmarks at any scale. It enables evaluation of arbitrary characters across diverse scenarios and prompt formats, as the first benchmark builder in RP area for adaptable assessment. FURINA-Builder simulates dialogues between a test character and other characters drawn from a well-constructed character-scene pool, while an LLM judge selects fine-grained evaluation dimensions and adjusts the test character's responses into final test utterances. Using this pipeline, we build FURINA-Bench, a new comprehensive role-playing benchmark featuring both established and synthesized test characters, each assessed with dimension-specific evaluation criteria. Human evaluation and preliminary separability analysis justify our pipeline and benchmark design. We conduct extensive evaluations of cutting-edge LLMs and find that o3 and DeepSeek-R1 achieve the best performance on English and Chinese RP tasks, respectively. Across all models, established characters consistently outperform synthesized ones, with reasoning capabilities further amplifying this disparity. Interestingly, we observe that model scale does not monotonically reduce hallucinations. More critically, for reasoning LLMs, we uncover a novel trade-off: reasoning improves RP performance but simultaneously increases RP hallucinations. This trade-off extends to a broader Pareto frontier between RP performance and reliability for all LLMs. These findings demonstrate the effectiveness of FURINA-Builder and the challenge posed by FURINA-Bench.
Fake News in Social Networks
We propose multi-agent reinforcement learning as a new method for modeling fake news in social networks. This method allows us to model human behavior in social networks both in unaccustomed populations and in populations that have adapted to the presence of fake news. In particular the latter is challenging for existing methods. We find that a fake-news attack is more effective if it targets highly connected people and people with weaker private information. Attacks are more effective when the disinformation is spread across several agents than when the disinformation is concentrated with more intensity on fewer agents. Furthermore, fake news spread less well in balanced networks than in clustered networks. We test a part of our findings in a human-subject experiment. The experimental evidence provides support for the predictions from the model, suggesting that the model is suitable to analyze the spread of fake news in social networks.
Systems and Control (CS)
Storage Participation in Electricity Markets: Arbitrage and Ancillary Services
Electricity storage is used for intertemporal price arbitrage and for ancillary services that balance unforeseen supply and demand fluctuations via frequency regulation. We present an optimization model that computes bids for both arbitrage and frequency regulation and ensures that storage operators can honor their market commitments at all times for all fluctuation signals in an uncertainty set inspired by market rules. This requirement, initially expressed by an infinite number of nonconvex functional constraints, is shown to be equivalent to a finite number of deterministic constraints. The resulting formulation is a mixed-integer bilinear program that admits mixed-integer linear relaxations and restrictions. Empirical tests on European electricity markets show a negligible optimality gap between the relaxation and the restriction. The model can account for intraday trading and, with a solution time of under 5 seconds, may serve as a building block for more complex trading strategies. Such strategies become necessary as battery capacity exceeds the demand for ancillary services. In a backtest from 1 July 2020 through 30 June 2024 joint market participation more than doubles profits and almost halves energy storage output compared to arbitrage alone.
Structured identification of multivariable modal systems
Physically interpretable models are essential for next-generation industrial systems, as these representations enable effective control, support design validation, and provide a foundation for monitoring strategies. The aim of this paper is to develop a system identification framework for estimating modal models of complex multivariable mechanical systems from frequency response data. To achieve this, a two-step structured identification algorithm is presented, where an additive model is first estimated using a refined instrumental variable method and subsequently projected onto a modal form. The developed identification method provides accurate, physically-relevant, minimal-order models, for both generally-damped and proportionally damped modal systems. The effectiveness of the proposed method is demonstrated through experimental validation on a prototype wafer-stage system, which features a large number of spatially distributed actuators and sensors and exhibits complex flexible dynamics.
comment: 20 pages, 12 figures
Two-Layer Voronoi Coverage Control for Hybrid Aerial-Ground Robot Teams in Emergency Response: Implementation and Analysis
We present a comprehensive two-layer Voronoi coverage control approach for coordinating hybrid aerial-ground robot teams in hazardous material emergency response scenarios. Traditional Voronoi coverage control methods face three critical limitations in emergency contexts: heterogeneous agent capabilities with vastly different velocities, clustered initial deployment configurations, and urgent time constraints requiring rapid response rather than eventual convergence. Our method addresses these challenges through a decoupled two-layer architecture that separately optimizes aerial and ground robot positioning, with aerial agents delivering ground sensors via airdrop to high-priority locations. We provide detailed implementation of bounded Voronoi cell computation, efficient numerical integration techniques for importance-weighted centroids, and robust control strategies that prevent agent trapping. Simulation results demonstrate an 88% reduction in response time, achieving target sensor coverage (18.5% of initial sensor loss) in 25 seconds compared to 220 seconds for ground-only deployment. Complete implementation code is available at https://github.com/dHutchings/ME292B.
comment: 23 pages, 7 figures. Technical report with complete implementation details and open-source code
GPS Spoofing Attack Detection in Autonomous Vehicles Using Adaptive DBSCAN
As autonomous vehicles become an essential component of modern transportation, they are increasingly vulnerable to threats such as GPS spoofing attacks. This study presents an adaptive detection approach utilizing a dynamically tuned Density Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm, designed to adjust the detection threshold ({\epsilon}) in real-time. The threshold is updated based on the recursive mean and standard deviation of displacement errors between GPS and in-vehicle sensors data, but only at instances classified as non-anomalous. Furthermore, an initial threshold, determined from 120,000 clean data samples, ensures the capability to identify even subtle and gradual GPS spoofing attempts from the beginning. To assess the performance of the proposed method, five different subsets from the real-world Honda Research Institute Driving Dataset (HDD) are selected to simulate both large and small magnitude GPS spoofing attacks. The modified algorithm effectively identifies turn-by-turn, stop, overshoot, and multiple small biased spoofing attacks, achieving detection accuracies of 98.621%, 99.960.1%, 99.880.1%, and 98.380.1%, respectively. This work provides a substantial advancement in enhancing the security and safety of AVs against GPS spoofing threats.
Aggregate Modeling of Air-Conditioner Loads Under Packet-based Control with Both On and Off Grid Access Requests
Coordination of distributed energy resources (DERs) can engender flexibility necessary to improve grid reliability. Packetized Energy Management (PEM) is a method for coordinating DERs, such as thermostatically controlled loads (TCLs) and electric vehicles, within customer quality-of-service (QoS) limits. In PEM, a DER uses local information to offer flexibility by sending a request to the DER coordinator to turn-ON or turn-OFF. Much work has focused on modeling and analyzing aggregations of DERs under PEM with fixed packet durations and only turn-ON requests. Different recent efforts to enable variable packet lengths have shown an increase in available flexibility and ramping capability, but have not been modeled in aggregate, which limits systematic analyses. To address this issue, this paper presents a new aggregate bin-based (macro) model of PEM loads that incorporates both turn-ON and turn-OFF request features, enabling the model to accurately characterize the capability of the fleet of DERs to track a power reference signal, population temperature dynamics, aggregate request rates, and variable packet lengths. Simulation-based validation is performed against an agent-based (micro) model to evaluate robustness and quantify model accuracy. Finally, the distribution of variable packet lengths from macro-model simulations are applied to inform past work on PEM with randomized packet lengths
Transforming Tarlac State University (TSU) Gymnasium to a Nearly Zero-Energy Building through Integration of a Solar Photovoltaic (PV) System
The study is anchored to the principles of Nearly-Zero Energy Building (NZEB). It aimed to transform the Tarlac State University Gymnasium into a facility with energy-efficient equipment to contribute to reducing carbon footprints by integrating a solar PV system as its renewable energy source. The researchers found out that the electrical infrastructure of the Gym was outdated, and the lighting was not energy efficient, and there were too few convenience or power outlets. There was also insufficient cooling equipment to maintain a comfortable temperature. Analysis shows that the payback period is within the average range, making it a cost-effective investment for the University. Aside from the cost of the PV System, adherence to engineering design standards will mean additional costs to replace the metal halides with LED high bay lamps, installation of additional air conditioning units, and provision of additional convenience outlets. These additional costs should be considered when evaluating the feasibility of the project. It is recommended that the integrity of the existing roof system of the Gymnasium be considered. The total cost of putting up the whole electrical system, including new lighting, cooling, and convenience loads, must be calculated to determine the total cost of implementing the whole NZEB project. Other factors in the economic evaluation may be considered to determine a more stringent result.
Decoupled Scaling 4ch Bilateral Control on the Cartesian coordinate by 6-DoF Manipulator using Rotation Matrix
Four-channel bilateral control is a method for achieving remote control with force feedback and adjustment operability by synchronizing the positions and forces of two manipulators. This is expected to significantly improve the operability of the remote control in contact-rich tasks. Among these, 4-channel bilateral control on the Cartesian coordinate system is advantageous owing to its suitability for manipulators with different structures and because it allows the dynamics in the Cartesian coordinate system to be adjusted by adjusting the control parameters, thus achieving intuitive operability for humans. This paper proposes a 4-channel bilateral control method that achieves the desired dynamics by decoupling each dimension in the Cartesian coordinate system regardless of the scaling factor.
comment: 6 pages, 4 figures, Accepted at SAMCON 2025
A Verified High-Performance Composable Object Library for Remote Direct Memory Access (Extended Version)
Remote Direct Memory Access (RDMA) is a memory technology that allows remote devices to directly write to and read from each other's memory, bypassing components such as the CPU and operating system. This enables low-latency high-throughput networking, as required for many modern data centres, HPC applications and AI/ML workloads. However, baseline RDMA comprises a highly permissive weak memory model that is difficult to use in practice and has only recently been formalised. In this paper, we introduce the Library of Composable Objects (LOCO), a formally verified library for building multi-node objects on RDMA, filling the gap between shared memory and distributed system programming. LOCO objects are well-encapsulated and take advantage of the strong locality and the weak consistency characteristics of RDMA. They have performance comparable to custom RDMA systems (e.g. distributed maps), but with a far simpler programming model amenable to formal proofs of correctness. To support verification, we develop a novel modular declarative verification framework, called Mowgli, that is flexible enough to model multinode objects and is independent of a memory consistency model. We instantiate Mowgli with the RDMA memory model, and use it to verify correctness of LOCO libraries.
Galilean Symmetry in Robotics
Galilean symmetry is the natural symmetry of inertial motion that underpins Newtonian physics. Although rigid-body symmetry is one of the most established and fundamental tools in robotics, there appears to be no comparable treatment of Galilean symmetry for a robotics audience. In this paper, we present a robotics-tailored exposition of Galilean symmetry that leverages the community's familiarity with and understanding of rigid-body transformations and pose representations. Our approach contrasts with common treatments in the physics literature that introduce Galilean symmetry as a stepping stone to Einstein's relativity. A key insight is that the Galilean matrix Lie group can be used to describe two different pose representations, Galilean frames, that use inertial velocity in the state definition, and extended poses, that use coordinate velocity. We provide three examples where applying the Galilean matrix Lie-group algebra to robotics problems is straightforward and yields significant insights: inertial navigation above the rotating Earth, manipulator kinematics, and sensor data fusion under temporal uncertainty. We believe that the time is right for the robotics community to benefit from rediscovering and extending this classical material and applying it to modern problems.
comment: Under Review
Towards Dynamic Quadrupedal Gaits: A Symmetry-Guided RL Hierarchy Enables Free Gait Transitions at Varying Speeds
Quadrupedal robots exhibit a wide range of viable gaits, but generating specific footfall sequences often requires laborious expert tuning of numerous variables, such as touch-down and lift-off events and holonomic constraints for each leg. This paper presents a unified reinforcement learning framework for generating versatile quadrupedal gaits by leveraging the intrinsic symmetries and velocity-period relationship of dynamic legged systems. We propose a symmetry-guided reward function design that incorporates temporal, morphological, and time-reversal symmetries. By focusing on preserved symmetries and natural dynamics, our approach eliminates the need for predefined trajectories, enabling smooth transitions between diverse locomotion patterns such as trotting, bounding, half-bounding, and galloping. Implemented on the Unitree Go2 robot, our method demonstrates robust performance across a range of speeds in both simulations and hardware tests, significantly improving gait adaptability without extensive reward tuning or explicit foot placement control. This work provides insights into dynamic locomotion strategies and underscores the crucial role of symmetries in robotic gait design.
Controller for Incremental Input-to-State Practical Stabilization of Partially Unknown systems with Invariance Guarantees
Incremental stability is a property of dynamical systems that ensures the convergence of trajectories with respect to each other rather than a fixed equilibrium point or a fixed trajectory. In this paper, we introduce a related stability notion called incremental input-to-state practical stability ({\delta}-ISpS), ensuring safety guarantees. We also present a feedback linearization based control design scheme that renders a partially unknown system incrementally input-to-state practically stable and safe with formal guarantees. To deal with the unknown dynamics, we utilize Gaussian process regression to approximate the model. Finally, we implement the controller synthesized by the proposed scheme on a manipulator example
comment: 2 figures,9 pages
Risk-Budgeted Control Framework for Balanced Performance and Safety in Autonomous Vehicles
This paper presents a risk-budgeted monitor with a control framework that certifies safety for autonomous driving. In this process, a sliding window is proposed to monitor for insufficient barrier residuals or nonzero tail risk, ensuring system safety. When the safety margin deteriorates, it triggers switching the safety constraint from a performance-based relaxed-control barrier function (R-CBF) to a conservative conditional value at risk (CVaR-CBF) to address the safety concern. This switching is governed by two real-time triggers: Feasibility-Triggered (FT) and Quality-Triggered (QT) conditions. In the FT condition, if the R-CBF constraint becomes infeasible or yields a suboptimal solution, the risk monitor triggers the use of the CVaR constraints for the controller. In the QT condition, the risk monitor observes the safety margin of the R-CBF solution at every step, regardless of feasibility. If it falls below the safety margin, the safety filter switches to the CVaR-CBF constraints. The proposed framework is evaluated using a model predictive controller (MPC) for autonomous driving in the presence of autonomous vehicle (AV) localization noise and obstacle position uncertainties. Multiple AV-pedestrian interaction scenarios are considered, with 1,500 Monte Carlo runs conducted for all scenarios. In the most challenging setting with pedestrian detection uncertainty of 5 m, the proposed framework achieves a 94-96% success rate of not colliding with the pedestrians over 300 trials while maintaining the lowest mean cross-track error (CTE = 3.2-3.6 m) to the reference path. The reduced CTE indicates faster trajectory recovery after obstacle avoidance, demonstrating a balance between safety and performance.
Discovering interpretable piecewise nonlinear model predictive control laws via symbolic decision trees
In this paper, we propose symbolic decision trees as surrogate models for approximating model predictive control laws. The proposed approach learns simultaneously the partition of the input domain (splitting logic) as well as local nonlinear expressions for predicting the control action leading to interpretable piecewise nonlinear control laws. The local nonlinear expressions are determined by the learning problem and are modeled using a set of basis functions. The learning task is posed as a mixed integer optimization, which is solved to global optimality with state-of-the-art global optimization solvers. We apply the proposed approach to a case study regarding the control of an isothermal reactor. The results show that the proposed approach can learn the control law accurately, leading to closed-loop performance comparable to that of a standard model predictive controller. Finally, comparison with existing interpretable models shows that the symbolic trees achieve both lower prediction error and superior closed-loop performance.
MicroRoboScope: A Portable and Integrated Mechatronic Platform for Magnetic and Acoustic Microrobotic Experimentation
This paper presents MicroRoboScope, a portable, compact, and versatile microrobotic experimentation platform designed for real-time, closed-loop control of both magnetic and acoustic microrobots. The system integrates an embedded computer, microscope, power supplies, and control circuitry into a single, low-cost and fully integrated apparatus. Custom control software developed in Python and Arduino C++ handles live video acquisition, microrobot tracking, and generation of control signals for electromagnetic coils and acoustic transducers. The platform's multi-modal actuation, accessibility, and portability make it suitable not only for specialized research laboratories but also for educational and outreach settings. By lowering the barrier to entry for microrobotic experimentation, this system enables new opportunities for research, education, and translational applications in biomedicine, tissue engineering, and robotics.
Optimal Voltage Control Using Online Exponential Barrier Method
This paper address the optimal voltage control problem of distribution systems with high penetration of inverter-based renewable energy resources, under inaccurate model information. We propose the online exponential barrier method that explicitly leverages the online feedback from grids to enhance the robustness to model inaccuracy and incorporates the voltage constraints to maintain the safety requirements. We provide analytical results on the optimal barrier parameter selection and sufficient conditions for the safety guarantee of converged voltages. We also establish theoretical results on the exponential convergence rate with proper step-size. The effectiveness of the proposed framework is validated on a 56-bus radial network, where we significantly improve the robustness against model inaccuracy compared to existing methods.
comment: Restate the theorem for readability
DUST: A Framework for Data-Driven Density Steering
We consider the problem of data-driven stochastic optimal control of an unknown LTI dynamical system. Assuming the process noise is normally distributed, we pose the problem of steering the state's mean and covariance to a target normal distribution, under noisy data collected from the underlying system, a problem commonly referred to as covariance steering (CS). A novel framework for Data-driven Uncertainty quantification and density STeering (DUST) is presented that simultaneously characterizes the noise affecting the measured data and designs an optimal affine-feedback controller to steer the density of the state to a prescribed terminal value. We use both indirect and direct data-driven design approaches based on the notions of persistency of excitation and subspace identification to exactly represent the mean and covariance dynamics of the state in terms of the data and noise realizations. Since both the mean and the covariance steering sub-problems are plagued with stochastic uncertainty arising from noisy data collection, we first estimate the noise realization from this dataset and subsequently compute tractable upper bounds on the estimation errors. The first and second moment steering problems are then solved to optimality using techniques from robust control and robust optimization. Lastly, we present an alternative control design approach based on the certainty equivalence principle and interpret the problem as one of CS under multiplicative uncertainty. We analyze the performance and efficacy of each of these data-driven approaches using a case study and compare them with their model-based counterparts.
comment: submitted to Automatica
Global Attitude Synchronization for Heterogeneous Multi-agent Systems on SO(3)
In this paper, we address the problem of attitude synchronization for a group of rigid body systems evolving on SO(3). The interaction among these systems is modeled through an undirected, connected, and acyclic graph topology. First, we present an almost global continuous distributed attitude synchronization scheme with rigorously proven stability guarantees. Thereafter, we propose two global distributed hybrid attitude synchronization schemes on SO(3). The first scheme is a hybrid control law that leverages angular velocities and relative orientations to achieve global alignment to a common orientation. The second scheme eliminates the dependence on angular velocities by introducing dynamic auxiliary variables, while ensuring global asymptotic attitude synchronization. This velocity-free control scheme relies exclusively on attitude information. The proposed schemes are applicable to heterogeneous multi-agent systems, where agents may have distinct inertia matrices. Simulation results are provided to illustrate the effectiveness of the proposed distributed attitude synchronization schemes.
Gait Transitions in Load-Pulling Quadrupeds: Insights from Sled Dogs and a Minimal SLIP Model
Quadrupedal animals employ diverse galloping strategies to optimize speed, stability, and energy efficiency. However, the biomechanical mechanisms that enable adaptive gait transitions during high-speed locomotion under load remain poorly understood. In this study, we present new empirical and modeling insights into the biomechanics of load-pulling quadrupeds, using sprint sled dogs as a model system. High-speed video and force recordings reveal that sled dogs often switch between rotary and transverse galloping gaits within just a few strides and without any observable changes in speed, stride duration, or terrain, providing clear evidence of locomotor multistability during high-speed load-pulling. To investigate the mechanical basis of these transitions, a physics-based quadrupedal Spring-Loaded Inverted Pendulum model with hybrid dynamics and prescribed footfall sequences to reproduce the asymmetric galloping patterns observed in racing sled dogs. Through trajectory optimization, we replicate experimentally observed gait sequences and identify swing-leg stiffness modulation as a key control mechanism for inducing transitions. This work provides a much-needed biomechanical perspective on high-speed animal draft and establishes a modeling framework for studying locomotion in pulling quadrupeds, with implications for both biological understanding and the design of adaptive legged systems.
Adaptive Network Security Policies via Belief Aggregation and Rollout
Evolving security vulnerabilities and shifting operational conditions require frequent updates to network security policies. These updates include adjustments to incident response procedures and modifications to access controls, among others. Reinforcement learning methods have been proposed for automating such policy adaptations, but most of the methods in the research literature lack performance guarantees and adapt slowly to changes. In this paper, we address these limitations and present a method for computing security policies that is scalable, offers theoretical guarantees, and adapts quickly to changes. It assumes a model or simulator of the system and comprises three components: belief estimation through particle filtering, offline policy computation through aggregation, and online policy adaptation through rollout. Central to our method is a new feature-based aggregation technique, which improves scalability and flexibility. We analyze the approximation error of aggregation and show that rollout efficiently adapts policies to changes under certain conditions. Simulations and testbed results demonstrate that our method outperforms state-of-the-art methods on several benchmarks, including CAGE-2.
Systems and Control (EESS)
Storage Participation in Electricity Markets: Arbitrage and Ancillary Services
Electricity storage is used for intertemporal price arbitrage and for ancillary services that balance unforeseen supply and demand fluctuations via frequency regulation. We present an optimization model that computes bids for both arbitrage and frequency regulation and ensures that storage operators can honor their market commitments at all times for all fluctuation signals in an uncertainty set inspired by market rules. This requirement, initially expressed by an infinite number of nonconvex functional constraints, is shown to be equivalent to a finite number of deterministic constraints. The resulting formulation is a mixed-integer bilinear program that admits mixed-integer linear relaxations and restrictions. Empirical tests on European electricity markets show a negligible optimality gap between the relaxation and the restriction. The model can account for intraday trading and, with a solution time of under 5 seconds, may serve as a building block for more complex trading strategies. Such strategies become necessary as battery capacity exceeds the demand for ancillary services. In a backtest from 1 July 2020 through 30 June 2024 joint market participation more than doubles profits and almost halves energy storage output compared to arbitrage alone.
Structured identification of multivariable modal systems
Physically interpretable models are essential for next-generation industrial systems, as these representations enable effective control, support design validation, and provide a foundation for monitoring strategies. The aim of this paper is to develop a system identification framework for estimating modal models of complex multivariable mechanical systems from frequency response data. To achieve this, a two-step structured identification algorithm is presented, where an additive model is first estimated using a refined instrumental variable method and subsequently projected onto a modal form. The developed identification method provides accurate, physically-relevant, minimal-order models, for both generally-damped and proportionally damped modal systems. The effectiveness of the proposed method is demonstrated through experimental validation on a prototype wafer-stage system, which features a large number of spatially distributed actuators and sensors and exhibits complex flexible dynamics.
comment: 20 pages, 12 figures
Two-Layer Voronoi Coverage Control for Hybrid Aerial-Ground Robot Teams in Emergency Response: Implementation and Analysis
We present a comprehensive two-layer Voronoi coverage control approach for coordinating hybrid aerial-ground robot teams in hazardous material emergency response scenarios. Traditional Voronoi coverage control methods face three critical limitations in emergency contexts: heterogeneous agent capabilities with vastly different velocities, clustered initial deployment configurations, and urgent time constraints requiring rapid response rather than eventual convergence. Our method addresses these challenges through a decoupled two-layer architecture that separately optimizes aerial and ground robot positioning, with aerial agents delivering ground sensors via airdrop to high-priority locations. We provide detailed implementation of bounded Voronoi cell computation, efficient numerical integration techniques for importance-weighted centroids, and robust control strategies that prevent agent trapping. Simulation results demonstrate an 88% reduction in response time, achieving target sensor coverage (18.5% of initial sensor loss) in 25 seconds compared to 220 seconds for ground-only deployment. Complete implementation code is available at https://github.com/dHutchings/ME292B.
comment: 23 pages, 7 figures. Technical report with complete implementation details and open-source code
GPS Spoofing Attack Detection in Autonomous Vehicles Using Adaptive DBSCAN
As autonomous vehicles become an essential component of modern transportation, they are increasingly vulnerable to threats such as GPS spoofing attacks. This study presents an adaptive detection approach utilizing a dynamically tuned Density Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm, designed to adjust the detection threshold ({\epsilon}) in real-time. The threshold is updated based on the recursive mean and standard deviation of displacement errors between GPS and in-vehicle sensors data, but only at instances classified as non-anomalous. Furthermore, an initial threshold, determined from 120,000 clean data samples, ensures the capability to identify even subtle and gradual GPS spoofing attempts from the beginning. To assess the performance of the proposed method, five different subsets from the real-world Honda Research Institute Driving Dataset (HDD) are selected to simulate both large and small magnitude GPS spoofing attacks. The modified algorithm effectively identifies turn-by-turn, stop, overshoot, and multiple small biased spoofing attacks, achieving detection accuracies of 98.621%, 99.960.1%, 99.880.1%, and 98.380.1%, respectively. This work provides a substantial advancement in enhancing the security and safety of AVs against GPS spoofing threats.
Aggregate Modeling of Air-Conditioner Loads Under Packet-based Control with Both On and Off Grid Access Requests
Coordination of distributed energy resources (DERs) can engender flexibility necessary to improve grid reliability. Packetized Energy Management (PEM) is a method for coordinating DERs, such as thermostatically controlled loads (TCLs) and electric vehicles, within customer quality-of-service (QoS) limits. In PEM, a DER uses local information to offer flexibility by sending a request to the DER coordinator to turn-ON or turn-OFF. Much work has focused on modeling and analyzing aggregations of DERs under PEM with fixed packet durations and only turn-ON requests. Different recent efforts to enable variable packet lengths have shown an increase in available flexibility and ramping capability, but have not been modeled in aggregate, which limits systematic analyses. To address this issue, this paper presents a new aggregate bin-based (macro) model of PEM loads that incorporates both turn-ON and turn-OFF request features, enabling the model to accurately characterize the capability of the fleet of DERs to track a power reference signal, population temperature dynamics, aggregate request rates, and variable packet lengths. Simulation-based validation is performed against an agent-based (micro) model to evaluate robustness and quantify model accuracy. Finally, the distribution of variable packet lengths from macro-model simulations are applied to inform past work on PEM with randomized packet lengths
Transforming Tarlac State University (TSU) Gymnasium to a Nearly Zero-Energy Building through Integration of a Solar Photovoltaic (PV) System
The study is anchored to the principles of Nearly-Zero Energy Building (NZEB). It aimed to transform the Tarlac State University Gymnasium into a facility with energy-efficient equipment to contribute to reducing carbon footprints by integrating a solar PV system as its renewable energy source. The researchers found out that the electrical infrastructure of the Gym was outdated, and the lighting was not energy efficient, and there were too few convenience or power outlets. There was also insufficient cooling equipment to maintain a comfortable temperature. Analysis shows that the payback period is within the average range, making it a cost-effective investment for the University. Aside from the cost of the PV System, adherence to engineering design standards will mean additional costs to replace the metal halides with LED high bay lamps, installation of additional air conditioning units, and provision of additional convenience outlets. These additional costs should be considered when evaluating the feasibility of the project. It is recommended that the integrity of the existing roof system of the Gymnasium be considered. The total cost of putting up the whole electrical system, including new lighting, cooling, and convenience loads, must be calculated to determine the total cost of implementing the whole NZEB project. Other factors in the economic evaluation may be considered to determine a more stringent result.
Decoupled Scaling 4ch Bilateral Control on the Cartesian coordinate by 6-DoF Manipulator using Rotation Matrix
Four-channel bilateral control is a method for achieving remote control with force feedback and adjustment operability by synchronizing the positions and forces of two manipulators. This is expected to significantly improve the operability of the remote control in contact-rich tasks. Among these, 4-channel bilateral control on the Cartesian coordinate system is advantageous owing to its suitability for manipulators with different structures and because it allows the dynamics in the Cartesian coordinate system to be adjusted by adjusting the control parameters, thus achieving intuitive operability for humans. This paper proposes a 4-channel bilateral control method that achieves the desired dynamics by decoupling each dimension in the Cartesian coordinate system regardless of the scaling factor.
comment: 6 pages, 4 figures, Accepted at SAMCON 2025
A Verified High-Performance Composable Object Library for Remote Direct Memory Access (Extended Version)
Remote Direct Memory Access (RDMA) is a memory technology that allows remote devices to directly write to and read from each other's memory, bypassing components such as the CPU and operating system. This enables low-latency high-throughput networking, as required for many modern data centres, HPC applications and AI/ML workloads. However, baseline RDMA comprises a highly permissive weak memory model that is difficult to use in practice and has only recently been formalised. In this paper, we introduce the Library of Composable Objects (LOCO), a formally verified library for building multi-node objects on RDMA, filling the gap between shared memory and distributed system programming. LOCO objects are well-encapsulated and take advantage of the strong locality and the weak consistency characteristics of RDMA. They have performance comparable to custom RDMA systems (e.g. distributed maps), but with a far simpler programming model amenable to formal proofs of correctness. To support verification, we develop a novel modular declarative verification framework, called Mowgli, that is flexible enough to model multinode objects and is independent of a memory consistency model. We instantiate Mowgli with the RDMA memory model, and use it to verify correctness of LOCO libraries.
Galilean Symmetry in Robotics
Galilean symmetry is the natural symmetry of inertial motion that underpins Newtonian physics. Although rigid-body symmetry is one of the most established and fundamental tools in robotics, there appears to be no comparable treatment of Galilean symmetry for a robotics audience. In this paper, we present a robotics-tailored exposition of Galilean symmetry that leverages the community's familiarity with and understanding of rigid-body transformations and pose representations. Our approach contrasts with common treatments in the physics literature that introduce Galilean symmetry as a stepping stone to Einstein's relativity. A key insight is that the Galilean matrix Lie group can be used to describe two different pose representations, Galilean frames, that use inertial velocity in the state definition, and extended poses, that use coordinate velocity. We provide three examples where applying the Galilean matrix Lie-group algebra to robotics problems is straightforward and yields significant insights: inertial navigation above the rotating Earth, manipulator kinematics, and sensor data fusion under temporal uncertainty. We believe that the time is right for the robotics community to benefit from rediscovering and extending this classical material and applying it to modern problems.
comment: Under Review
Towards Dynamic Quadrupedal Gaits: A Symmetry-Guided RL Hierarchy Enables Free Gait Transitions at Varying Speeds
Quadrupedal robots exhibit a wide range of viable gaits, but generating specific footfall sequences often requires laborious expert tuning of numerous variables, such as touch-down and lift-off events and holonomic constraints for each leg. This paper presents a unified reinforcement learning framework for generating versatile quadrupedal gaits by leveraging the intrinsic symmetries and velocity-period relationship of dynamic legged systems. We propose a symmetry-guided reward function design that incorporates temporal, morphological, and time-reversal symmetries. By focusing on preserved symmetries and natural dynamics, our approach eliminates the need for predefined trajectories, enabling smooth transitions between diverse locomotion patterns such as trotting, bounding, half-bounding, and galloping. Implemented on the Unitree Go2 robot, our method demonstrates robust performance across a range of speeds in both simulations and hardware tests, significantly improving gait adaptability without extensive reward tuning or explicit foot placement control. This work provides insights into dynamic locomotion strategies and underscores the crucial role of symmetries in robotic gait design.
Controller for Incremental Input-to-State Practical Stabilization of Partially Unknown systems with Invariance Guarantees
Incremental stability is a property of dynamical systems that ensures the convergence of trajectories with respect to each other rather than a fixed equilibrium point or a fixed trajectory. In this paper, we introduce a related stability notion called incremental input-to-state practical stability ({\delta}-ISpS), ensuring safety guarantees. We also present a feedback linearization based control design scheme that renders a partially unknown system incrementally input-to-state practically stable and safe with formal guarantees. To deal with the unknown dynamics, we utilize Gaussian process regression to approximate the model. Finally, we implement the controller synthesized by the proposed scheme on a manipulator example
comment: 2 figures,9 pages
Risk-Budgeted Control Framework for Balanced Performance and Safety in Autonomous Vehicles
This paper presents a risk-budgeted monitor with a control framework that certifies safety for autonomous driving. In this process, a sliding window is proposed to monitor for insufficient barrier residuals or nonzero tail risk, ensuring system safety. When the safety margin deteriorates, it triggers switching the safety constraint from a performance-based relaxed-control barrier function (R-CBF) to a conservative conditional value at risk (CVaR-CBF) to address the safety concern. This switching is governed by two real-time triggers: Feasibility-Triggered (FT) and Quality-Triggered (QT) conditions. In the FT condition, if the R-CBF constraint becomes infeasible or yields a suboptimal solution, the risk monitor triggers the use of the CVaR constraints for the controller. In the QT condition, the risk monitor observes the safety margin of the R-CBF solution at every step, regardless of feasibility. If it falls below the safety margin, the safety filter switches to the CVaR-CBF constraints. The proposed framework is evaluated using a model predictive controller (MPC) for autonomous driving in the presence of autonomous vehicle (AV) localization noise and obstacle position uncertainties. Multiple AV-pedestrian interaction scenarios are considered, with 1,500 Monte Carlo runs conducted for all scenarios. In the most challenging setting with pedestrian detection uncertainty of 5 m, the proposed framework achieves a 94-96% success rate of not colliding with the pedestrians over 300 trials while maintaining the lowest mean cross-track error (CTE = 3.2-3.6 m) to the reference path. The reduced CTE indicates faster trajectory recovery after obstacle avoidance, demonstrating a balance between safety and performance.
Discovering interpretable piecewise nonlinear model predictive control laws via symbolic decision trees
In this paper, we propose symbolic decision trees as surrogate models for approximating model predictive control laws. The proposed approach learns simultaneously the partition of the input domain (splitting logic) as well as local nonlinear expressions for predicting the control action leading to interpretable piecewise nonlinear control laws. The local nonlinear expressions are determined by the learning problem and are modeled using a set of basis functions. The learning task is posed as a mixed integer optimization, which is solved to global optimality with state-of-the-art global optimization solvers. We apply the proposed approach to a case study regarding the control of an isothermal reactor. The results show that the proposed approach can learn the control law accurately, leading to closed-loop performance comparable to that of a standard model predictive controller. Finally, comparison with existing interpretable models shows that the symbolic trees achieve both lower prediction error and superior closed-loop performance.
MicroRoboScope: A Portable and Integrated Mechatronic Platform for Magnetic and Acoustic Microrobotic Experimentation
This paper presents MicroRoboScope, a portable, compact, and versatile microrobotic experimentation platform designed for real-time, closed-loop control of both magnetic and acoustic microrobots. The system integrates an embedded computer, microscope, power supplies, and control circuitry into a single, low-cost and fully integrated apparatus. Custom control software developed in Python and Arduino C++ handles live video acquisition, microrobot tracking, and generation of control signals for electromagnetic coils and acoustic transducers. The platform's multi-modal actuation, accessibility, and portability make it suitable not only for specialized research laboratories but also for educational and outreach settings. By lowering the barrier to entry for microrobotic experimentation, this system enables new opportunities for research, education, and translational applications in biomedicine, tissue engineering, and robotics.
Optimal Voltage Control Using Online Exponential Barrier Method
This paper address the optimal voltage control problem of distribution systems with high penetration of inverter-based renewable energy resources, under inaccurate model information. We propose the online exponential barrier method that explicitly leverages the online feedback from grids to enhance the robustness to model inaccuracy and incorporates the voltage constraints to maintain the safety requirements. We provide analytical results on the optimal barrier parameter selection and sufficient conditions for the safety guarantee of converged voltages. We also establish theoretical results on the exponential convergence rate with proper step-size. The effectiveness of the proposed framework is validated on a 56-bus radial network, where we significantly improve the robustness against model inaccuracy compared to existing methods.
comment: Restate the theorem for readability
DUST: A Framework for Data-Driven Density Steering
We consider the problem of data-driven stochastic optimal control of an unknown LTI dynamical system. Assuming the process noise is normally distributed, we pose the problem of steering the state's mean and covariance to a target normal distribution, under noisy data collected from the underlying system, a problem commonly referred to as covariance steering (CS). A novel framework for Data-driven Uncertainty quantification and density STeering (DUST) is presented that simultaneously characterizes the noise affecting the measured data and designs an optimal affine-feedback controller to steer the density of the state to a prescribed terminal value. We use both indirect and direct data-driven design approaches based on the notions of persistency of excitation and subspace identification to exactly represent the mean and covariance dynamics of the state in terms of the data and noise realizations. Since both the mean and the covariance steering sub-problems are plagued with stochastic uncertainty arising from noisy data collection, we first estimate the noise realization from this dataset and subsequently compute tractable upper bounds on the estimation errors. The first and second moment steering problems are then solved to optimality using techniques from robust control and robust optimization. Lastly, we present an alternative control design approach based on the certainty equivalence principle and interpret the problem as one of CS under multiplicative uncertainty. We analyze the performance and efficacy of each of these data-driven approaches using a case study and compare them with their model-based counterparts.
comment: submitted to Automatica
Global Attitude Synchronization for Heterogeneous Multi-agent Systems on SO(3)
In this paper, we address the problem of attitude synchronization for a group of rigid body systems evolving on SO(3). The interaction among these systems is modeled through an undirected, connected, and acyclic graph topology. First, we present an almost global continuous distributed attitude synchronization scheme with rigorously proven stability guarantees. Thereafter, we propose two global distributed hybrid attitude synchronization schemes on SO(3). The first scheme is a hybrid control law that leverages angular velocities and relative orientations to achieve global alignment to a common orientation. The second scheme eliminates the dependence on angular velocities by introducing dynamic auxiliary variables, while ensuring global asymptotic attitude synchronization. This velocity-free control scheme relies exclusively on attitude information. The proposed schemes are applicable to heterogeneous multi-agent systems, where agents may have distinct inertia matrices. Simulation results are provided to illustrate the effectiveness of the proposed distributed attitude synchronization schemes.
Gait Transitions in Load-Pulling Quadrupeds: Insights from Sled Dogs and a Minimal SLIP Model
Quadrupedal animals employ diverse galloping strategies to optimize speed, stability, and energy efficiency. However, the biomechanical mechanisms that enable adaptive gait transitions during high-speed locomotion under load remain poorly understood. In this study, we present new empirical and modeling insights into the biomechanics of load-pulling quadrupeds, using sprint sled dogs as a model system. High-speed video and force recordings reveal that sled dogs often switch between rotary and transverse galloping gaits within just a few strides and without any observable changes in speed, stride duration, or terrain, providing clear evidence of locomotor multistability during high-speed load-pulling. To investigate the mechanical basis of these transitions, a physics-based quadrupedal Spring-Loaded Inverted Pendulum model with hybrid dynamics and prescribed footfall sequences to reproduce the asymmetric galloping patterns observed in racing sled dogs. Through trajectory optimization, we replicate experimentally observed gait sequences and identify swing-leg stiffness modulation as a key control mechanism for inducing transitions. This work provides a much-needed biomechanical perspective on high-speed animal draft and establishes a modeling framework for studying locomotion in pulling quadrupeds, with implications for both biological understanding and the design of adaptive legged systems.
Adaptive Network Security Policies via Belief Aggregation and Rollout
Evolving security vulnerabilities and shifting operational conditions require frequent updates to network security policies. These updates include adjustments to incident response procedures and modifications to access controls, among others. Reinforcement learning methods have been proposed for automating such policy adaptations, but most of the methods in the research literature lack performance guarantees and adapt slowly to changes. In this paper, we address these limitations and present a method for computing security policies that is scalable, offers theoretical guarantees, and adapts quickly to changes. It assumes a model or simulator of the system and comprises three components: belief estimation through particle filtering, offline policy computation through aggregation, and online policy adaptation through rollout. Central to our method is a new feature-based aggregation technique, which improves scalability and flexibility. We analyze the approximation error of aggregation and show that rollout efficiently adapts policies to changes under certain conditions. Simulations and testbed results demonstrate that our method outperforms state-of-the-art methods on several benchmarks, including CAGE-2.
Robotics
Preference-Conditioned Multi-Objective RL for Integrated Command Tracking and Force Compliance in Humanoid Locomotion
Humanoid locomotion requires not only accurate command tracking for navigation but also compliant responses to external forces during human interaction. Despite significant progress, existing RL approaches mainly emphasize robustness, yielding policies that resist external forces but lack compliance-particularly challenging for inherently unstable humanoids. In this work, we address this by formulating humanoid locomotion as a multi-objective optimization problem that balances command tracking and external force compliance. We introduce a preference-conditioned multi-objective RL (MORL) framework that integrates rigid command following and compliant behaviors within a single omnidirectional locomotion policy. External forces are modeled via velocity-resistance factor for consistent reward design, and training leverages an encoder-decoder structure that infers task-relevant privileged features from deployable observations. We validate our approach in both simulation and real-world experiments on a humanoid robot. Experimental results indicate that our framework not only improves adaptability and convergence over standard pipelines, but also realizes deployable preference-conditioned humanoid locomotion.
Contact Sensing via Joint Torque Sensors and a Force/Torque Sensor for Legged Robots
This paper presents a method for detecting and localizing contact along robot legs using distributed joint torque sensors and a single hip-mounted force-torque (FT) sensor using a generalized momentum-based observer framework. We designed a low-cost strain-gauge-based joint torque sensor that can be installed on every joint to provide direct torque measurements, eliminating the need for complex friction models and providing more accurate torque readings than estimation based on motor current. Simulation studies on a floating-based 2-DoF robot leg verified that the proposed framework accurately recovers contact force and location along the thigh and shin links. Through a calibration procedure, our torque sensor achieved an average 96.4% accuracy relative to ground truth measurements. Building upon the torque sensor, we performed hardware experiments on a 2-DoF manipulator, which showed sub-centimeter contact localization accuracy and force errors below 0.2 N.
comment: Proc. IEEE 21st International Conference on Automation Science and Engineering (CASE), Los Angeles, CA, USA, Aug. 17-21, 2025, pp. 1-7, doi:10.1109/CASE58245.2025.11164031
The Irrational Machine: Neurosis and the Limits of Algorithmic Safety
We present a framework for characterizing neurosis in embodied AI: behaviors that are internally coherent yet misaligned with reality, arising from interactions among planning, uncertainty handling, and aversive memory. In a grid navigation stack we catalogue recurrent modalities including flip-flop, plan churn, perseveration loops, paralysis and hypervigilance, futile search, belief incoherence, tie break thrashing, corridor thrashing, optimality compulsion, metric mismatch, policy oscillation, and limited-visibility variants. For each we give lightweight online detectors and reusable escape policies (short commitments, a margin to switch, smoothing, principled arbitration). We then show that durable phobic avoidance can persist even under full visibility when learned aversive costs dominate local choice, producing long detours despite globally safe routes. Using First/Second/Third Law as engineering shorthand for safety latency, command compliance, and resource efficiency, we argue that local fixes are insufficient; global failures can remain. To surface them, we propose genetic-programming based destructive testing that evolves worlds and perturbations to maximize law pressure and neurosis scores, yielding adversarial curricula and counterfactual traces that expose where architectural revision, not merely symptom-level patches, is required.
comment: 41 pages, 17 figures, 5 tables
Representing Data in Robotic Tactile Perception -- A Review
Robotic tactile perception is a complex process involving several computational steps performed at different levels. Tactile information is shaped by the interplay of robot actions, the mechanical properties of its body, and the software that processes the data. In this respect, high-level computation, required to process and extract information, is commonly performed by adapting existing techniques from other domains, such as computer vision, which expects input data to be properly structured. Therefore, it is necessary to transform tactile sensor data to match a specific data structure. This operation directly affects the tactile information encoded and, as a consequence, the task execution. This survey aims to address this specific aspect of the tactile perception pipeline, namely Data Representation. The paper first clearly defines its contributions to the perception pipeline and then reviews how previous studies have dealt with the problem of representing tactile information, investigating the relationships among hardware, representations, and high-level computation methods. The analysis has led to the identification of six structures commonly used in the literature to represent data. The manuscript provides discussions and guidelines for properly selecting a representation depending on operating conditions, including the available hardware, the tactile information required to be encoded, and the task at hand.
Two-Layer Voronoi Coverage Control for Hybrid Aerial-Ground Robot Teams in Emergency Response: Implementation and Analysis
We present a comprehensive two-layer Voronoi coverage control approach for coordinating hybrid aerial-ground robot teams in hazardous material emergency response scenarios. Traditional Voronoi coverage control methods face three critical limitations in emergency contexts: heterogeneous agent capabilities with vastly different velocities, clustered initial deployment configurations, and urgent time constraints requiring rapid response rather than eventual convergence. Our method addresses these challenges through a decoupled two-layer architecture that separately optimizes aerial and ground robot positioning, with aerial agents delivering ground sensors via airdrop to high-priority locations. We provide detailed implementation of bounded Voronoi cell computation, efficient numerical integration techniques for importance-weighted centroids, and robust control strategies that prevent agent trapping. Simulation results demonstrate an 88% reduction in response time, achieving target sensor coverage (18.5% of initial sensor loss) in 25 seconds compared to 220 seconds for ground-only deployment. Complete implementation code is available at https://github.com/dHutchings/ME292B.
comment: 23 pages, 7 figures. Technical report with complete implementation details and open-source code
Real2USD: Scene Representations in Universal Scene Description Language
Large Language Models (LLMs) can help robots reason about abstract task specifications. This requires augmenting classical representations of the environment used by robots with natural language-based priors. There are a number of existing approaches to doing so, but they are tailored to specific tasks, e.g., visual-language models for navigation, language-guided neural radiance fields for mapping, etc. This paper argues that the Universal Scene Description (USD) language is an effective and general representation of geometric, photometric and semantic information in the environment for LLM-based robotics tasks. Our argument is simple: a USD is an XML-based scene graph, readable by LLMs and humans alike, and rich enough to support essentially any task -- Pixar developed this language to store assets, scenes and even movies. We demonstrate a ``Real to USD'' system using a Unitree Go2 quadruped robot carrying LiDAR and a RGB camera that (i) builds an explicit USD representation of indoor environments with diverse objects and challenging settings with lots of glass, and (ii) parses the USD using Google's Gemini to demonstrate scene understanding, complex inferences, and planning. We also study different aspects of this system in simulated warehouse and hospital settings using Nvidia's Issac Sim. Code is available at https://github.com/grasp-lyrl/Real2USD .
comment: 8 pages, 10 figures, 1 table
Gain Tuning Is Not What You Need: Reward Gain Adaptation for Constrained Locomotion Learning
Existing robot locomotion learning techniques rely heavily on the offline selection of proper reward weighting gains and cannot guarantee constraint satisfaction (i.e., constraint violation) during training. Thus, this work aims to address both issues by proposing Reward-Oriented Gains via Embodied Regulation (ROGER), which adapts reward-weighting gains online based on penalties received throughout the embodied interaction process. The ratio between the positive reward (primary reward) and negative reward (penalty) gains is automatically reduced as the learning approaches the constraint thresholds to avoid violation. Conversely, the ratio is increased when learning is in safe states to prioritize performance. With a 60-kg quadruped robot, ROGER achieved near-zero constraint violation throughout multiple learning trials. It also achieved up to 50% more primary reward than the equivalent state-of-the-art techniques. In MuJoCo continuous locomotion benchmarks, including a single-leg hopper, ROGER exhibited comparable or up to 100% higher performance and 60% less torque usage and orientation deviation compared to those trained with the default reward function. Finally, real-world locomotion learning of a physical quadruped robot was achieved from scratch within one hour without any falls. Therefore, this work contributes to constraint-satisfying real-world continual robot locomotion learning and simplifies reward weighting gain tuning, potentially facilitating the development of physical robots and those that learn in the real world.
comment: RSS 2025
Controllable Generative Trajectory Prediction via Weak Preference Alignment
Deep generative models such as conditional variational autoencoders (CVAEs) have shown great promise for predicting trajectories of surrounding agents in autonomous vehicle planning. State-of-the-art models have achieved remarkable accuracy in such prediction tasks. Besides accuracy, diversity is also crucial for safe planning because human behaviors are inherently uncertain and multimodal. However, existing methods generally lack a scheme to generate controllably diverse trajectories, which is arguably more useful than randomly diversified trajectories, to the end of safe planning. To address this, we propose PrefCVAE, an augmented CVAE framework that uses weakly labeled preference pairs to imbue latent variables with semantic attributes. Using average velocity as an example attribute, we demonstrate that PrefCVAE enables controllable, semantically meaningful predictions without degrading baseline accuracy. Our results show the effectiveness of preference supervision as a cost-effective way to enhance sampling-based generative models.
Deployment and Development of a Cognitive Teleoreactive Framework for Deep Sea Autonomy
A new AUV mission planning and execution software has been tested on AUV Sentry. Dubbed DINOS-R, it draws inspiration from cognitive architectures and AUV control systems to replace the legacy MC architecture. Unlike these existing architectures, however, DINOS-R is built from the ground-up to unify symbolic decision making (for understandable, repeatable, provable behavior) with machine learning techniques and reactive behaviors, for field-readiness across oceanographic platforms. Implemented primarily in Python3, DINOS-R is extensible, modular, and reusable, with an emphasis on non-expert use as well as growth for future research in oceanography and robot algorithms. Mission specification is flexible, and can be specified declaratively. Behavior specification is similarly flexible, supporting simultaneous use of real-time task planning and hard-coded user specified plans. These features were demonstrated in the field on Sentry, in addition to a variety of simulated cases. These results are discussed, and future work is outlined.
Bhasha-Rupantarika: Algorithm-Hardware Co-design approach for Multilingual Neural Machine Translation
This paper introduces Bhasha-Rupantarika, a light and efficient multilingual translation system tailored through algorithm-hardware codesign for resource-limited settings. The method investigates model deployment at sub-octet precision levels (FP8, INT8, INT4, and FP4), with experimental results indicating a 4.1x reduction in model size (FP4) and a 4.2x speedup in inference speed, which correlates with an increased throughput of 66 tokens/s (improvement by 4.8x). This underscores the importance of ultra-low precision quantization for real-time deployment in IoT devices using FPGA accelerators, achieving performance on par with expectations. Our evaluation covers bidirectional translation between Indian and international languages, showcasing its adaptability in low-resource linguistic contexts. The FPGA deployment demonstrated a 1.96x reduction in LUTs and a 1.65x decrease in FFs, resulting in a 2.2x enhancement in throughput compared to OPU and a 4.6x enhancement compared to HPTA. Overall, the evaluation provides a viable solution based on quantisation-aware translation along with hardware efficiency suitable for deployable multilingual AI systems. The entire codes [https://github.com/mukullokhande99/Bhasha-Rupantarika/] and dataset for reproducibility are publicly available, facilitating rapid integration and further development by researchers.
UniCoD: Enhancing Robot Policy via Unified Continuous and Discrete Representation Learning
Building generalist robot policies that can handle diverse tasks in open-ended environments is a central challenge in robotics. To leverage knowledge from large-scale pretraining, prior work has typically built generalist policies either on top of vision-language understanding models (VLMs) or generative models. However, both semantic understanding from vision-language pretraining and visual dynamics modeling from visual-generation pretraining are crucial for embodied robots. Recent unified models of generation and understanding have demonstrated strong capabilities in both comprehension and generation through large-scale pretraining. We posit that robotic policy learning can likewise benefit from the combined strengths of understanding, planning and continuous future representation learning. Building on this insight, we introduce UniCoD, which acquires the ability to dynamically model high-dimensional visual features through pretraining on over 1M internet-scale instructional manipulation videos. Subsequently, UniCoD is fine-tuned on data collected from the robot embodiment, enabling the learning of mappings from predictive representations to action tokens. Extensive experiments show our approach consistently outperforms baseline methods in terms of 9\% and 12\% across simulation environments and real-world out-of-distribution tasks.
High-Fidelity Simulated Data Generation for Real-World Zero-Shot Robotic Manipulation Learning with Gaussian Splatting
The scalability of robotic learning is fundamentally bottlenecked by the significant cost and labor of real-world data collection. While simulated data offers a scalable alternative, it often fails to generalize to the real world due to significant gaps in visual appearance, physical properties, and object interactions. To address this, we propose RoboSimGS, a novel Real2Sim2Real framework that converts multi-view real-world images into scalable, high-fidelity, and physically interactive simulation environments for robotic manipulation. Our approach reconstructs scenes using a hybrid representation: 3D Gaussian Splatting (3DGS) captures the photorealistic appearance of the environment, while mesh primitives for interactive objects ensure accurate physics simulation. Crucially, we pioneer the use of a Multi-modal Large Language Model (MLLM) to automate the creation of physically plausible, articulated assets. The MLLM analyzes visual data to infer not only physical properties (e.g., density, stiffness) but also complex kinematic structures (e.g., hinges, sliding rails) of objects. We demonstrate that policies trained entirely on data generated by RoboSimGS achieve successful zero-shot sim-to-real transfer across a diverse set of real-world manipulation tasks. Furthermore, data from RoboSimGS significantly enhances the performance and generalization capabilities of SOTA methods. Our results validate RoboSimGS as a powerful and scalable solution for bridging the sim-to-real gap.
comment: 13 pages, 6 figures
SpikeGrasp: A Benchmark for 6-DoF Grasp Pose Detection from Stereo Spike Streams
Most robotic grasping systems rely on converting sensor data into explicit 3D point clouds, which is a computational step not found in biological intelligence. This paper explores a fundamentally different, neuro-inspired paradigm for 6-DoF grasp detection. We introduce SpikeGrasp, a framework that mimics the biological visuomotor pathway, processing raw, asynchronous events from stereo spike cameras, similarly to retinas, to directly infer grasp poses. Our model fuses these stereo spike streams and uses a recurrent spiking neural network, analogous to high-level visual processing, to iteratively refine grasp hypotheses without ever reconstructing a point cloud. To validate this approach, we built a large-scale synthetic benchmark dataset. Experiments show that SpikeGrasp surpasses traditional point-cloud-based baselines, especially in cluttered and textureless scenes, and demonstrates remarkable data efficiency. By establishing the viability of this end-to-end, neuro-inspired approach, SpikeGrasp paves the way for future systems capable of the fluid and efficient manipulation seen in nature, particularly for dynamic objects.
Fast Vision in the Dark: A Case for Single-Photon Imaging in Planetary Navigation
Improving robotic navigation is critical for extending exploration range and enhancing operational efficiency. Vision-based navigation relying on traditional CCD or CMOS cameras faces major challenges when complex illumination conditions are paired with motion, limiting the range and accessibility of mobile planetary robots. In this study, we propose a novel approach to planetary navigation that leverages the unique imaging capabilities of Single-Photon Avalanche Diode (SPAD) cameras. We present the first comprehensive evaluation of single-photon imaging as an alternative passive sensing technology for robotic exploration missions targeting perceptually challenging locations, with a special emphasis on high-latitude lunar regions. We detail the operating principles and performance characteristics of SPAD cameras, assess their advantages and limitations in addressing key perception challenges of upcoming exploration missions to the Moon, and benchmark their performance under representative illumination conditions.
comment: 9 pages, 6 figures, conference paper
Reinforcement Learning-based Dynamic Adaptation for Sampling-Based Motion Planning in Agile Autonomous Driving ICRA 2026
Sampling-based trajectory planners are widely used for agile autonomous driving due to their ability to generate fast, smooth, and kinodynamically feasible trajectories. However, their behavior is often governed by a cost function with manually tuned, static weights, which forces a tactical compromise that is suboptimal across the wide range of scenarios encountered in a race. To address this shortcoming, we propose using a Reinforcement Learning (RL) agent as a high-level behavioral selector that dynamically switches the cost function parameters of an analytical, low-level trajectory planner during runtime. We show the effectiveness of our approach in simulation in an autonomous racing environment where our RL-based planner achieved 0% collision rate while reducing overtaking time by up to 60% compared to state-of-the-art static planners. Our new agent now dynamically switches between aggressive and conservative behaviors, enabling interactive maneuvers unattainable with static configurations. These results demonstrate that integrating reinforcement learning as a high-level selector resolves the inherent trade-off between safety and competitiveness in autonomous racing planners. The proposed methodology offers a pathway toward adaptive yet interpretable motion planning for broader autonomous driving applications.
comment: 8 pages, submitted to the IEEE ICRA 2026, Vienna, Austria
Decoupled Scaling 4ch Bilateral Control on the Cartesian coordinate by 6-DoF Manipulator using Rotation Matrix
Four-channel bilateral control is a method for achieving remote control with force feedback and adjustment operability by synchronizing the positions and forces of two manipulators. This is expected to significantly improve the operability of the remote control in contact-rich tasks. Among these, 4-channel bilateral control on the Cartesian coordinate system is advantageous owing to its suitability for manipulators with different structures and because it allows the dynamics in the Cartesian coordinate system to be adjusted by adjusting the control parameters, thus achieving intuitive operability for humans. This paper proposes a 4-channel bilateral control method that achieves the desired dynamics by decoupling each dimension in the Cartesian coordinate system regardless of the scaling factor.
comment: 6 pages, 4 figures, Accepted at SAMCON 2025
AI-Agents for Culturally Diverse Online Higher Education Environments
As the global reach of online higher education continues to grow, universities are increasingly accommodating students from diverse cultural backgrounds \parencite{tereshko2024culturally}. This can present a number of challenges including linguistic barriers \parencite{ullah2021linguistic}, cultural differences in learning style \parencite{omidvar2012cultural}, cultural sensitivity in course design \parencite{nguyen2022cultural} and perceived isolation when students feel their perspectives or experiences are not reflected or valued in the learning environment \parencite{hansen2022belonging}. Ensuring active engagement and reasonable learning outcomes in such a environments requires distance educational systems that are not only adaptive but also culturally resonant \parencite{dalle2024cultural}. Both embodied and virtual AI-Agents have great potential in this regard as they can facilitate personalized learning and adapt their interactions and content delivery to align with students' cultural context. In addition Generative AI (GAI), such as, Large Language Models (LLMs) can amplify the potential for these culturally aware AI agents to address educational challenges due to their advanced capacity for understanding and generating contextually relevant content \parencite{wang2024large}. This chapter reviews existing research and suggests the usage of culturally aware AI-Agents, powered by GAI, to foster engagement and improve learning outcomes in culturally diverse online higher education environments.
Population-Coded Spiking Neural Networks for High-Dimensional Robotic Control
Energy-efficient and high-performance motor control remains a critical challenge in robotics, particularly for high-dimensional continuous control tasks with limited onboard resources. While Deep Reinforcement Learning (DRL) has achieved remarkable results, its computational demands and energy consumption limit deployment in resource-constrained environments. This paper introduces a novel framework combining population-coded Spiking Neural Networks (SNNs) with DRL to address these challenges. Our approach leverages the event-driven, asynchronous computation of SNNs alongside the robust policy optimization capabilities of DRL, achieving a balance between energy efficiency and control performance. Central to this framework is the Population-coded Spiking Actor Network (PopSAN), which encodes high-dimensional observations into neuronal population activities and enables optimal policy learning through gradient-based updates. We evaluate our method on the Isaac Gym platform using the PixMC benchmark with complex robotic manipulation tasks. Experimental results on the Franka robotic arm demonstrate that our approach achieves energy savings of up to 96.10% compared to traditional Artificial Neural Networks (ANNs) while maintaining comparable control performance. The trained SNN policies exhibit robust finger position tracking with minimal deviation from commanded trajectories and stable target height maintenance during pick-and-place operations. These results position population-coded SNNs as a promising solution for energy-efficient, high-performance robotic control in resource-constrained applications, paving the way for scalable deployment in real-world robotics systems.
SuperEx: Enhancing Indoor Mapping and Exploration using Non-Line-of-Sight Perception
Efficient exploration and mapping in unknown indoor environments is a fundamental challenge, with high stakes in time-critical settings. In current systems, robot perception remains confined to line-of-sight; occluded regions remain unknown until physically traversed, leading to inefficient exploration when layouts deviate from prior assumptions. In this work, we bring non-line-of-sight (NLOS) sensing to robotic exploration. We leverage single-photon LiDARs, which capture time-of-flight histograms that encode the presence of hidden objects - allowing robots to look around blind corners. Recent single-photon LiDARs have become practical and portable, enabling deployment beyond controlled lab settings. Prior NLOS works target 3D reconstruction in static, lab-based scenarios, and initial efforts toward NLOS-aided navigation consider simplified geometries. We introduce SuperEx, a framework that integrates NLOS sensing directly into the mapping-exploration loop. SuperEx augments global map prediction with beyond-line-of-sight cues by (i) carving empty NLOS regions from timing histograms and (ii) reconstructing occupied structure via a two-step physics-based and data-driven approach that leverages structural regularities. Evaluations on complex simulated maps and the real-world KTH Floorplan dataset show a 12% gain in mapping accuracy under < 30% coverage and improved exploration efficiency compared to line-of-sight baselines, opening a path to reliable mapping beyond direct visibility.
comment: 8 pages, 9 Figures , Project webpage: https://super-ex.github.io/
Align2Act: Instruction-Tuned Models for Human-Aligned Autonomous Driving
Motion planning in complex scenarios is a core challenge in autonomous driving. Conventional methods apply predefined rules or learn from driving data to generate trajectories, while recent approaches leverage large language models (LLMs) for decision-making. However, it remains unclear whether LLMs truly capture human driving logic. We propose Align2Act, a motion planning framework that transforms instruction-tuned LLMs into interpretable planners aligned with human behavior. We derive structured driving instructions based on human reasoning patterns (e.g., anticipate hazards, yield at intersections) and traffic rules (e.g., stop at red lights, maintain lane boundaries). Our Align2ActChain module guides step-by-step reasoning to produce both an interpretable rationale and a safe trajectory. By fine-tuning LLaMA-2-7B with LoRA on one million scenarios from the nuPlan dataset, our method achieves an open-loop score of 85.17 and closed-loop scores of 70.31 (non-reactive) and 66.96 (reactive) on Test14-random. Unlike prior work focused on synthetic or open-loop settings, we demonstrate improved planning quality and human-likeness on the real-world nuPlan closed-loop benchmark. Ablation studies confirm that structured reasoning significantly improves performance over baseline LLM planners.
Galilean Symmetry in Robotics
Galilean symmetry is the natural symmetry of inertial motion that underpins Newtonian physics. Although rigid-body symmetry is one of the most established and fundamental tools in robotics, there appears to be no comparable treatment of Galilean symmetry for a robotics audience. In this paper, we present a robotics-tailored exposition of Galilean symmetry that leverages the community's familiarity with and understanding of rigid-body transformations and pose representations. Our approach contrasts with common treatments in the physics literature that introduce Galilean symmetry as a stepping stone to Einstein's relativity. A key insight is that the Galilean matrix Lie group can be used to describe two different pose representations, Galilean frames, that use inertial velocity in the state definition, and extended poses, that use coordinate velocity. We provide three examples where applying the Galilean matrix Lie-group algebra to robotics problems is straightforward and yields significant insights: inertial navigation above the rotating Earth, manipulator kinematics, and sensor data fusion under temporal uncertainty. We believe that the time is right for the robotics community to benefit from rediscovering and extending this classical material and applying it to modern problems.
comment: Under Review
Towards Dynamic Quadrupedal Gaits: A Symmetry-Guided RL Hierarchy Enables Free Gait Transitions at Varying Speeds
Quadrupedal robots exhibit a wide range of viable gaits, but generating specific footfall sequences often requires laborious expert tuning of numerous variables, such as touch-down and lift-off events and holonomic constraints for each leg. This paper presents a unified reinforcement learning framework for generating versatile quadrupedal gaits by leveraging the intrinsic symmetries and velocity-period relationship of dynamic legged systems. We propose a symmetry-guided reward function design that incorporates temporal, morphological, and time-reversal symmetries. By focusing on preserved symmetries and natural dynamics, our approach eliminates the need for predefined trajectories, enabling smooth transitions between diverse locomotion patterns such as trotting, bounding, half-bounding, and galloping. Implemented on the Unitree Go2 robot, our method demonstrates robust performance across a range of speeds in both simulations and hardware tests, significantly improving gait adaptability without extensive reward tuning or explicit foot placement control. This work provides insights into dynamic locomotion strategies and underscores the crucial role of symmetries in robotic gait design.
MonoSE(3)-Diffusion: A Monocular SE(3) Diffusion Framework for Robust Camera-to-Robot Pose Estimation
We propose MonoSE(3)-Diffusion, a monocular SE(3) diffusion framework that formulates markerless, image-based robot pose estimation as a conditional denoising diffusion process. The framework consists of two processes: a visibility-constrained diffusion process for diverse pose augmentation and a timestep-aware reverse process for progressive pose refinement. The diffusion process progressively perturbs ground-truth poses to noisy transformations for training a pose denoising network. Importantly, we integrate visibility constraints into the process, ensuring the transformations remain within the camera field of view. Compared to the fixed-scale perturbations used in current methods, the diffusion process generates in-view and diverse training poses, thereby improving the network generalization capability. Furthermore, the reverse process iteratively predicts the poses by the denoising network and refines pose estimates by sampling from the diffusion posterior of current timestep, following a scheduled coarse-to-fine procedure. Moreover, the timestep indicates the transformation scales, which guide the denoising network to achieve more accurate pose predictions. The reverse process demonstrates higher robustness than direct prediction, benefiting from its timestep-aware refinement scheme. Our approach demonstrates improvements across two benchmarks (DREAM and RoboKeyGen), achieving a notable AUC of 66.75 on the most challenging dataset, representing a 32.3% gain over the state-of-the-art.
Hierarchical Planning for Long-Horizon Multi-Target Tracking Under Target Motion Uncertainty
Achieving persistent tracking of multiple dynamic targets over a large spatial area poses significant challenges for a single-robot system with constrained sensing capabilities. As the robot moves to track different targets, the ones outside the field of view accumulate uncertainty, making them progressively harder to track. An effective path planning algorithm must manage uncertainty over a long horizon and account for the risk of permanently losing track of targets that remain unseen for too long. However, most existing approaches rely on short planning horizons and assume small, bounded environments, resulting in poor tracking performance and target loss in large-scale scenarios. In this paper, we present a hierarchical planner for tracking multiple moving targets with an aerial vehicle. To address the challenge of tracking non-static targets, our method incorporates motion models and uncertainty propagation during path execution, allowing for more informed decision-making. We decompose the multi-target tracking task into sub-tasks of single target search and detection, and our proposed pipeline consists a novel low-level coverage planner that enables searching for a target in an evolving belief area, and an estimation method to assess the likelihood of success for each sub-task, making it possible to convert the active target tracking task to a Markov decision process (MDP) that we solve with a tree-based algorithm to determine the sequence of sub-tasks. We validate our approach in simulation, demonstrating its effectiveness compared to existing planners for active target tracking tasks, and our proposed planner outperforms existing approaches, achieving a reduction of 11-70% in final uncertainty across different environments.
comment: 8 pages, 7 figures. Accepted to IEEE Robotics and Automation Letters (RAL), 2025
MicroRoboScope: A Portable and Integrated Mechatronic Platform for Magnetic and Acoustic Microrobotic Experimentation
This paper presents MicroRoboScope, a portable, compact, and versatile microrobotic experimentation platform designed for real-time, closed-loop control of both magnetic and acoustic microrobots. The system integrates an embedded computer, microscope, power supplies, and control circuitry into a single, low-cost and fully integrated apparatus. Custom control software developed in Python and Arduino C++ handles live video acquisition, microrobot tracking, and generation of control signals for electromagnetic coils and acoustic transducers. The platform's multi-modal actuation, accessibility, and portability make it suitable not only for specialized research laboratories but also for educational and outreach settings. By lowering the barrier to entry for microrobotic experimentation, this system enables new opportunities for research, education, and translational applications in biomedicine, tissue engineering, and robotics.
RobotFleet: An Open-Source Framework for Centralized Multi-Robot Task Planning
Coordinating heterogeneous robot fleets to achieve multiple goals is challenging in multi-robot systems. We introduce an open-source and extensible framework for centralized multi-robot task planning and scheduling that leverages LLMs to enable fleets of heterogeneous robots to accomplish multiple tasks. RobotFleet provides abstractions for planning, scheduling, and execution across robots deployed as containerized services to simplify fleet scaling and management. The framework maintains a shared declarative world state and two-way communication for task execution and replanning. By modularizing each layer of the autonomy stack and using LLMs for open-world reasoning, RobotFleet lowers the barrier to building scalable multi-robot systems. The code can be found here: https://github.com/therohangupta/robot-fleet.
Zero-Shot Large Language Model Agents for Fully Automated Radiotherapy Treatment Planning NeurIPS 2025
Radiation therapy treatment planning is an iterative, expertise-dependent process, and the growing burden of cancer cases has made reliance on manual planning increasingly unsustainable, underscoring the need for automation. In this study, we propose a workflow that leverages a large language model (LLM)-based agent to navigate inverse treatment planning for intensity-modulated radiation therapy (IMRT). The LLM agent was implemented to directly interact with a clinical treatment planning system (TPS) to iteratively extract intermediate plan states and propose new constraint values to guide inverse optimization. The agent's decision-making process is informed by current observations and previous optimization attempts and evaluations, allowing for dynamic strategy refinement. The planning process was performed in a zero-shot inference setting, where the LLM operated without prior exposure to manually generated treatment plans and was utilized without any fine-tuning or task-specific training. The LLM-generated plans were evaluated on twenty head-and-neck cancer cases against clinical manual plans, with key dosimetric endpoints analyzed and reported. The LLM-generated plans achieved comparable organ-at-risk (OAR) sparing relative to clinical plans while demonstrating improved hot spot control (Dmax: 106.5% vs. 108.8%) and superior conformity (conformity index: 1.18 vs. 1.39 for boost PTV; 1.82 vs. 1.88 for primary PTV). This study demonstrates the feasibility of a zero-shot, LLM-driven workflow for automated IMRT treatment planning in a commercial TPS. The proposed approach provides a generalizable and clinically applicable solution that could reduce planning variability and support broader adoption of AI-based planning strategies.
comment: Accepted for poster presentation at the NeurIPS 2025 Workshop on GenAI for Health: Potential, Trust, and Policy Compliance
VendiRL: A Framework for Self-Supervised Reinforcement Learning of Diversely Diverse Skills NeurIPS 2025
In self-supervised reinforcement learning (RL), one of the key challenges is learning a diverse set of skills to prepare agents for unknown future tasks. Despite impressive advances, scalability and evaluation remain prevalent issues. Regarding scalability, the search for meaningful skills can be obscured by high-dimensional feature spaces, where relevant features may vary across downstream task domains. For evaluating skill diversity, defining what constitutes "diversity" typically requires a hard commitment to a specific notion of what it means for skills to be diverse, potentially leading to inconsistencies in how skill diversity is understood, making results across different approaches hard to compare, and leaving many forms of diversity unexplored. To address these issues, we adopt a measure of sample diversity that translates ideas from ecology to machine learning -- the Vendi Score -- allowing the user to specify and evaluate any desired form of diversity. We demonstrate how this metric facilitates skill evaluation and introduce VendiRL, a unified framework for learning diversely diverse sets of skills. Given distinct similarity functions, VendiRL motivates distinct forms of diversity, which could support skill-diversity pretraining in new and richly interactive environments where optimising for various forms of diversity may be desirable.
comment: 17 pages including appendices, full paper at the Scaling Environments for Agents workshop at NeurIPS 2025
SVN-ICP: Uncertainty Estimation of ICP-based LiDAR Odometry using Stein Variational Newton
This letter introduces SVN-ICP, a novel Iterative Closest Point (ICP) algorithm with uncertainty estimation that leverages Stein Variational Newton (SVN) on manifold. Designed specifically for fusing LiDAR odometry in multisensor systems, the proposed method ensures accurate pose estimation and consistent noise parameter inference, even in LiDAR-degraded environments. By approximating the posterior distribution using particles within the Stein Variational Inference framework, SVN-ICP eliminates the need for explicit noise modeling or manual parameter tuning. To evaluate its effectiveness, we integrate SVN-ICP into a simple error-state Kalman filter alongside an IMU and test it across multiple datasets spanning diverse environments and robot types. Extensive experimental results demonstrate that our approach outperforms best-in-class methods on challenging scenarios while providing reliable uncertainty estimates.
Grasping Deformable Objects via Reinforcement Learning with Cross-Modal Attention to Visuo-Tactile Inputs
We consider the problem of grasping deformable objects with soft shells using a robotic gripper. Such objects have a center-of-mass that changes dynamically and are fragile so prone to burst. Thus, it is difficult for robots to generate appropriate control inputs not to drop or break the object while performing manipulation tasks. Multi-modal sensing data could help understand the grasping state through global information (e.g., shapes, pose) from visual data and local information around the contact (e.g., pressure) from tactile data. Although they have complementary information that can be beneficial to use together, fusing them is difficult owing to their different properties. We propose a method based on deep reinforcement learning (DRL) that generates control inputs of a simple gripper from visuo-tactile sensing information. Our method employs a cross-modal attention module in the encoder network and trains it in a self-supervised manner using the loss function of the RL agent. With the multi-modal fusion, the proposed method can learn the representation for the DRL agent from the visuo-tactile sensory data. The experimental result shows that cross-modal attention is effective to outperform other early and late data fusion methods across different environments including unseen robot motions and objects.
DexGarmentLab: Dexterous Garment Manipulation Environment with Generalizable Policy NeurIPS2025
Garment manipulation is a critical challenge due to the diversity in garment categories, geometries, and deformations. Despite this, humans can effortlessly handle garments, thanks to the dexterity of our hands. However, existing research in the field has struggled to replicate this level of dexterity, primarily hindered by the lack of realistic simulations of dexterous garment manipulation. Therefore, we propose DexGarmentLab, the first environment specifically designed for dexterous (especially bimanual) garment manipulation, which features large-scale high-quality 3D assets for 15 task scenarios, and refines simulation techniques tailored for garment modeling to reduce the sim-to-real gap. Previous data collection typically relies on teleoperation or training expert reinforcement learning (RL) policies, which are labor-intensive and inefficient. In this paper, we leverage garment structural correspondence to automatically generate a dataset with diverse trajectories using only a single expert demonstration, significantly reducing manual intervention. However, even extensive demonstrations cannot cover the infinite states of garments, which necessitates the exploration of new algorithms. To improve generalization across diverse garment shapes and deformations, we propose a Hierarchical gArment-manipuLation pOlicy (HALO). It first identifies transferable affordance points to accurately locate the manipulation area, then generates generalizable trajectories to complete the task. Through extensive experiments and detailed analysis of our method and baseline, we demonstrate that HALO consistently outperforms existing methods, successfully generalizing to previously unseen instances even with significant variations in shape and deformation where others fail. Our project page is available at: https://wayrise.github.io/DexGarmentLab/.
comment: NeurIPS2025 Spotlight
Humanoid Robots and Humanoid AI: Review, Perspectives and Directions
In the approximately century-long journey of robotics, humanoid robots made their debut around six decades ago. While current humanoids bear human-like appearances, none have embodied true humaneness, remaining distant from achieving human-like to human-level intelligence. The rapid recent advancements in generative AI and (multimodal) large language models have further reignited and escalated interest in humanoids towards real-time, interactive, and multimodal designs and applications, such as fostering humanoid workers, advisers, educators, medical professionals, caregivers, and receptionists. These unveil boundless opportunities of transforming 1) AI robotics into a research era of humanoid AI, and 2) AI robots into new-generation humanoid AI robots (AI humanoids). Our unique and comprehensive review of about 30 reported humanoids discloses a systematic terminology and a paradigmatic landscape of human-looking to human-like and human-level humanoids. It inspires comprehensive new perspectives and directions of humanoid AI as an area: transitioning from human-looking to humane humanoids, humanizing humanoids with functional and nonfunctional specifications, and cultivating technical and actionable advances of AI humanoids. Humanoid AI and AI humanoids nurture symbiotic advancements and future opportunities of synthesizing and transforming humanity modeling and conventional, generative to human-level AI into humanoid robotics.
comment: 35 pages, 3 figures
CHD: Coupled Hierarchical Diffusion for Long-Horizon Tasks
Diffusion-based planners have shown strong performance in short-horizon tasks but often fail in complex, long-horizon settings. We trace the failure to loose coupling between high-level (HL) sub-goal selection and low-level (LL) trajectory generation, which leads to incoherent plans and degraded performance. We propose Coupled Hierarchical Diffusion (CHD), a framework that models HL sub-goals and LL trajectories jointly within a unified diffusion process. A shared classifier passes LL feedback upstream so that sub-goals self-correct while sampling proceeds. This tight HL-LL coupling improves trajectory coherence and enables scalable long-horizon diffusion planning. Experiments across maze navigation, tabletop manipulation, and household environments show that CHD consistently outperforms both flat and hierarchical diffusion baselines. Our website is: https://sites.google.com/view/chd2025/home
Robotics
Learning to Throw-Flip IROS 2025
Dynamic manipulation, such as robot tossing or throwing objects, has recently gained attention as a novel paradigm to speed up logistic operations. However, the focus has predominantly been on the object's landing location, irrespective of its final orientation. In this work, we present a method enabling a robot to accurately "throw-flip" objects to a desired landing pose (position and orientation). Conventionally, objects thrown by revolute robots suffer from parasitic rotation, resulting in highly restricted and uncontrollable landing poses. Our approach is based on two key design choices: first, leveraging the impulse-momentum principle, we design a family of throwing motions that effectively decouple the parasitic rotation, significantly expanding the feasible set of landing poses. Second, we combine a physics-based model of free flight with regression-based learning methods to account for unmodeled effects. Real robot experiments demonstrate that our framework can learn to throw-flip objects to a pose target within ($\pm$5 cm, $\pm$45 degrees) threshold in dozens of trials. Thanks to data assimilation, incorporating projectile dynamics reduces sample complexity by an average of 40% when throw-flipping to unseen poses compared to end-to-end learning methods. Additionally, we show that past knowledge on in-hand object spinning can be effectively reused, accelerating learning by 70% when throwing a new object with a Center of Mass (CoM) shift. A video summarizing the proposed method and the hardware experiments is available at https://youtu.be/txYc9b1oflU.
comment: Accepted to IROS 2025. Video Summary: https://youtu.be/txYc9b1oflU
sqrtVINS: Robust and Ultrafast Square-Root Filter-based 3D Motion Tracking
In this paper, we develop and open-source, for the first time, a square-root filter (SRF)-based visual-inertial navigation system (VINS), termed sqrtVINS, which is ultra-fast, numerically stable, and capable of dynamic initialization even under extreme conditions (i.e., extremely small time window). Despite recent advancements in VINS, resource constraints and numerical instability on embedded (robotic) systems with limited precision remain critical challenges. A square-root covariance-based filter offers a promising solution by providing numerical stability, efficient memory usage, and guaranteed positive semi-definiteness. However, canonical SRFs suffer from inefficiencies caused by disruptions in the triangular structure of the covariance matrix during updates. The proposed method significantly improves VINS efficiency with a novel Cholesky decomposition (LLT)-based SRF update, by fully exploiting the system structure to preserve the structure. Moreover, we design a fast, robust, dynamic initialization method, which first recovers the minimal states without triangulating 3D features and then efficiently performs iterative SRF update to refine the full states, enabling seamless VINS operation. The proposed LLT-based SRF is extensively verified through numerical studies, demonstrating superior numerical stability and achieving robust efficient performance on 32-bit single-precision floats, operating at twice the speed of state-of-the-art (SOTA) methods. Our initialization method, tested on both mobile workstations and Jetson Nano computers, achieving a high success rate of initialization even within a 100 ms window under minimal conditions. Finally, the proposed sqrtVINS is extensively validated across diverse scenarios, demonstrating strong efficiency, robustness, and reliability. The full open-source implementation is released to support future research and applications.
Rise of the Robochemist
Chemistry, a long-standing discipline, has historically relied on manual and often time-consuming processes. While some automation exists, the field is now on the cusp of a significant evolution driven by the integration of robotics and artificial intelligence (AI), giving rise to the concept of the robochemist: a new paradigm where autonomous systems assist in designing, executing, and analyzing experiments. Robochemists integrate mobile manipulators, advanced perception, teleoperation, and data-driven protocols to execute experiments with greater adaptability, reproducibility, and safety. Rather than a fully automated replacement for human chemists, we envisioned the robochemist as a complementary partner that works collaboratively to enhance discovery, enabling a more efficient exploration of chemical space and accelerating innovation in pharmaceuticals, materials science, and sustainable manufacturing. This article traces the technologies, applications, and challenges that define this transformation, highlighting both the opportunities and the responsibilities that accompany the emergence of the robochemist. Ultimately, the future of chemistry is argued to lie in a symbiotic partnership where human intuition and expertise is amplified by robotic precision and AI-driven insight.
comment: This article was originally published in the IEEE Systems, Man, and Cybernetics Society eNewsletter, September 2025 issue: https://www.ieeesmc.org/wp-content/uploads/2024/10/FeatureArticle_Sept25.pdf
KG-MAS: Knowledge Graph-Enhanced Multi-Agent Infrastructure for coupling physical and digital robotic environments
The seamless integration of physical and digital environments in Cyber-Physical Systems(CPS), particularly within Industry 4.0, presents significant challenges stemming from system heterogeneity and complexity. Traditional approaches often rely on rigid, data-centric solutions like co-simulation frameworks or brittle point-to-point middleware bridges, which lack the semantic richness and flexibility required for intelligent, autonomous coordination. This report introduces the Knowledge Graph-Enhanced Multi-Agent Infrastructure(KG-MAS), as resolution in addressing such limitations. KG-MAS leverages a centralized Knowledge Graph (KG) as a dynamic, shared world model, providing a common semantic foundation for a Multi-Agent System(MAS). Autonomous agents, representing both physical and digital components, query this KG for decision-making and update it with real-time state information. The infrastructure features a model-driven architecture which facilitates the automatic generation of agents from semantic descriptions, thereby simplifying system extension and maintenance. By abstracting away underlying communication protocols and providing a unified, intelligent coordination mechanism, KG-MAS offers a robust, scalable, and flexible solution for coupling heterogeneous physical and digital robotic environments.
Bridging Perspectives: Foundation Model Guided BEV Maps for 3D Object Detection and Tracking
Camera-based 3D object detection and tracking are essential for perception in autonomous driving. Current state-of-the-art approaches often rely exclusively on either perspective-view (PV) or bird's-eye-view (BEV) features, limiting their ability to leverage both fine-grained object details and spatially structured scene representations. In this work, we propose DualViewDistill, a hybrid detection and tracking framework that incorporates both PV and BEV camera image features to leverage their complementary strengths. Our approach introduces BEV maps guided by foundation models, leveraging descriptive DINOv2 features that are distilled into BEV representations through a novel distillation process. By integrating PV features with BEV maps enriched with semantic and geometric features from DINOv2, our model leverages this hybrid representation via deformable aggregation to enhance 3D object detection and tracking. Extensive experiments on the nuScenes and Argoverse 2 benchmarks demonstrate that DualViewDistill achieves state-of-the-art performance. The results showcase the potential of foundation model BEV maps to enable more reliable perception for autonomous driving. We make the code and pre-trained models available at https://dualviewdistill.cs.uni-freiburg.de .
X-VLA: Soft-Prompted Transformer as Scalable Cross-Embodiment Vision-Language-Action Model
Successful generalist Vision-Language-Action (VLA) models rely on effective training across diverse robotic platforms with large-scale, cross-embodiment, heterogeneous datasets. To facilitate and leverage the heterogeneity in rich, diverse robotic data sources, we propose a novel Soft Prompt approach with minimally added parameters, by infusing prompt learning concepts into cross-embodiment robot learning and introducing separate sets of learnable embeddings for each distinct data source. These embeddings serve as embodiment-specific prompts, which in unity empower VLA models with effective exploitation of varying cross-embodiment features. Our new X-VLA, a neat flow-matching-based VLA architecture, relies exclusively on soft-prompted standard Transformer encoders, enjoying both scalability and simplicity. Evaluated across 6 simulations as well as 3 real-world robots, our 0.9B instantiation-X-VLA-0.9B simultaneously achieves SOTA performance over a sweep of benchmarks, demonstrating superior results on a wide axes of capabilities, from flexible dexterity to quick adaptation across embodiments, environments, and tasks. Website: https://thu-air-dream.github.io/X-VLA/
comment: preprint, technical report, 33 pages
A3RNN: Bi-directional Fusion of Bottom-up and Top-down Process for Developmental Visual Attention in Robots
This study investigates the developmental interaction between top-down (TD) and bottom-up (BU) visual attention in robotic learning. Our goal is to understand how structured, human-like attentional behavior emerges through the mutual adaptation of TD and BU mechanisms over time. To this end, we propose a novel attention model $A^3 RNN$ that integrates predictive TD signals and saliency-based BU cues through a bi-directional attention architecture. We evaluate our model in robotic manipulation tasks using imitation learning. Experimental results show that attention behaviors evolve throughout training, from saliency-driven exploration to prediction-driven direction. Initially, BU attention highlights visually salient regions, which guide TD processes, while as learning progresses, TD attention stabilizes and begins to reshape what is perceived as salient. This trajectory reflects principles from cognitive science and the free-energy framework, suggesting the importance of self-organizing attention through interaction between perception and internal prediction. Although not explicitly optimized for stability, our model exhibits more coherent and interpretable attention patterns than baselines, supporting the idea that developmental mechanisms contribute to robust attention formation.
comment: 8 pages, 5 figures
UF-RNN: Real-Time Adaptive Motion Generation Using Uncertainty-Driven Foresight Prediction
Training robots to operate effectively in environments with uncertain states, such as ambiguous object properties or unpredictable interactions, remains a longstanding challenge in robotics. Imitation learning methods typically rely on successful examples and often neglect failure scenarios where uncertainty is most pronounced. To address this limitation, we propose the Uncertainty-driven Foresight Recurrent Neural Network (UF-RNN), a model that combines standard time-series prediction with an active "Foresight" module. This module performs internal simulations of multiple future trajectories and refines the hidden state to minimize predicted variance, enabling the model to selectively explore actions under high uncertainty. We evaluate UF-RNN on a door-opening task in both simulation and a real-robot setting, demonstrating that, despite the absence of explicit failure demonstrations, the model exhibits robust adaptation by leveraging self-induced chaotic dynamics in its latent space. When guided by the Foresight module, these chaotic properties stimulate exploratory behaviors precisely when the environment is ambiguous, yielding improved success rates compared to conventional stochastic RNN baselines. These findings suggest that integrating uncertainty-driven foresight into imitation learning pipelines can significantly enhance a robot's ability to handle unpredictable real-world conditions.
comment: 8 pages, 6 figures
It Takes Two: Learning Interactive Whole-Body Control Between Humanoid Robots
The true promise of humanoid robotics lies beyond single-agent autonomy: two or more humanoids must engage in physically grounded, socially meaningful whole-body interactions that echo the richness of human social interaction. However, single-humanoid methods suffer from the isolation issue, ignoring inter-agent dynamics and causing misaligned contacts, interpenetrations, and unrealistic motions. To address this, we present Harmanoid , a dual-humanoid motion imitation framework that transfers interacting human motions to two robots while preserving both kinematic fidelity and physical realism. Harmanoid comprises two key components: (i) contact-aware motion retargeting, which restores inter-body coordination by aligning SMPL contacts with robot vertices, and (ii) interaction-driven motion controller, which leverages interaction-specific rewards to enforce coordinated keypoints and physically plausible contacts. By explicitly modeling inter-agent contacts and interaction-aware dynamics, Harmanoid captures the coupled behaviors between humanoids that single-humanoid frameworks inherently overlook. Experiments demonstrate that Harmanoid significantly improves interactive motion imitation, surpassing existing single-humanoid frameworks that largely fail in such scenarios.
Dejavu: Post-Deployment Learning for Embodied Agents via Experience Feedback
Embodied agents face a fundamental limitation: once deployed in real-world environments to perform specific tasks, they are unable to acquire new useful knowledge to enhance task performance. In this paper, we propose a general post-deployment learning framework called Dejavu, which employs an Experience Feedback Network (EFN) and augments the frozen Vision-Language-Action (VLA) policy with retrieved execution memories. EFN automatically identifies contextually successful prior action experiences and conditions action prediction on this retrieved guidance. We adopt reinforcement learning with semantic similarity rewards on EFN to ensure that the predicted actions align with past successful behaviors under current observations. During deployment, EFN continually enriches its memory with new trajectories, enabling the agent to exhibit "learning from experience" despite fixed weights. Experiments across diverse embodied tasks show that EFN significantly improves adaptability, robustness, and success rates over frozen baselines. These results highlight a promising path toward embodied agents that continually refine their behavior after deployment.
CompassNav: Steering From Path Imitation To Decision Understanding In Navigation
The dominant paradigm for training Large Vision-Language Models (LVLMs) in navigation relies on imitating expert trajectories. This approach reduces the complex navigation task to a sequence-to-sequence replication of a single correct path, fundamentally limiting the agent's ability to explore and generalize. In this work, we argue for and introduce a new paradigm: a shift from Path Imitation to Decision Understanding. The goal of this paradigm is to build agents that do not just follow, but truly understand how to navigate. We materialize this through two core contributions: first, we introduce Compass-Data-22k, a novel 22k-trajectory dataset.Its Reinforcement Fine-Tuning (RFT) subset provides a panoramic view of the decision landscape by annotating all feasible actions with A* geodesic distances. Second, we design a novel gap-aware hybrid reward function that dynamically adapts its feedback to decision certainty, shifting between decisive signals for optimal actions and nuanced scores to encourage exploration. Integrated into an SFT-then-RFT recipe, our CompassNav agent is trained not to memorize static routes, but to develop an internal ``compass'' that constantly intuits the direction to the goal by evaluating the relative quality of all possible moves. This approach enables our 7B agent to set a new state-of-the-art on Goal navigation benchmarks, outperforming even larger proprietary models, and achieve robust real-world goal navigation on a physical robot.
Ctrl-World: A Controllable Generative World Model for Robot Manipulation
Generalist robot policies can now perform a wide range of manipulation skills, but evaluating and improving their ability with unfamiliar objects and instructions remains a significant challenge. Rigorous evaluation requires a large number of real-world rollouts, while systematic improvement demands additional corrective data with expert labels. Both of these processes are slow, costly, and difficult to scale. World models offer a promising, scalable alternative by enabling policies to rollout within imagination space. However, a key challenge is building a controllable world model that can handle multi-step interactions with generalist robot policies. This requires a world model compatible with modern generalist policies by supporting multi-view prediction, fine-grained action control, and consistent long-horizon interactions, which is not achieved by previous works. In this paper, we make a step forward by introducing a controllable multi-view world model that can be used to evaluate and improve the instruction-following ability of generalist robot policies. Our model maintains long-horizon consistency with a pose-conditioned memory retrieval mechanism and achieves precise action control through frame-level action conditioning. Trained on the DROID dataset (95k trajectories, 564 scenes), our model generates spatially and temporally consistent trajectories under novel scenarios and new camera placements for over 20 seconds. We show that our method can accurately rank policy performance without real-world robot rollouts. Moreover, by synthesizing successful trajectories in imagination and using them for supervised fine-tuning, our approach can improve policy success by 44.7\%.
comment: 17 pages
Beyond ADE and FDE: A Comprehensive Evaluation Framework for Safety-Critical Prediction in Multi-Agent Autonomous Driving Scenarios
Current evaluation methods for autonomous driving prediction models rely heavily on simplistic metrics such as Average Displacement Error (ADE) and Final Displacement Error (FDE). While these metrics offer basic performance assessments, they fail to capture the nuanced behavior of prediction modules under complex, interactive, and safety-critical driving scenarios. For instance, existing benchmarks do not distinguish the influence of nearby versus distant agents, nor systematically test model robustness across varying multi-agent interactions. This paper addresses this critical gap by proposing a novel testing framework that evaluates prediction performance under diverse scene structures, saying, map context, agent density and spatial distribution. Through extensive empirical analysis, we quantify the differential impact of agent proximity on target trajectory prediction and identify scenario-specific failure cases that are not exposed by traditional metrics. Our findings highlight key vulnerabilities in current state-of-the-art prediction models and demonstrate the importance of scenario-aware evaluation. The proposed framework lays the groundwork for rigorous, safety-driven prediction validation, contributing significantly to the identification of failure-prone corner cases and the development of robust, certifiable prediction systems for autonomous vehicles.
Ionospheric and Plasmaspheric Delay Characterization for Lunar Terrestrial GNSS Receivers with Global Core Plasma Model
Recent advancements in lunar positioning, navigation, and timing (PNT) have demonstrated that terrestrial GNSS signals, including weak sidelobe transmissions, can be exploited for lunar spacecraft positioning and timing. While GNSS-based navigation at the Moon has been validated recently, unmodeled ionospheric and plasmaspheric delays remain a significant error source, particularly given the unique signal geometry and extended propagation paths. This paper characterizes these delays using the Global Core Plasma Model (GCPM) and a custom low-cost ray-tracing algorithm that iteratively solves for bent signal paths. We simulate first-, second-, and third-order group delays, as well as excess path length from ray bending, for GNSS signals received at both lunar orbit and the lunar south pole under varying solar and geomagnetic conditions. Results show that mean group delays are typically on the order of 1 m, but can exceed 100 m for low-altitude ray paths during high solar activity, while bending delays are generally smaller but non-negligible for low-altitude ray paths. We also quantify the influence of signal frequency, geomagnetic $K_p$ index, and solar R12 index. These findings inform the design of robust positioning and timing algorithms that utilize terrestrial GNSS signals.
comment: Submitted NAVIGATION: Journal of the Institute of Navigation
LOMORO: Long-term Monitoring of Dynamic Targets with Minimum Robotic Fleet under Resource Constraints IROS 2025
Long-term monitoring of numerous dynamic targets can be tedious for a human operator and infeasible for a single robot, e.g., to monitor wild flocks, detect intruders, search and rescue. Fleets of autonomous robots can be effective by acting collaboratively and concurrently. However, the online coordination is challenging due to the unknown behaviors of the targets and the limited perception of each robot. Existing work often deploys all robots available without minimizing the fleet size, or neglects the constraints on their resources such as battery and memory. This work proposes an online coordination scheme called LOMORO for collaborative target monitoring, path routing and resource charging. It includes three core components: (I) the modeling of multi-robot task assignment problem under the constraints on resources and monitoring intervals; (II) the resource-aware task coordination algorithm iterates between the high-level assignment of dynamic targets and the low-level multi-objective routing via the Martin's algorithm; (III) the online adaptation algorithm in case of unpredictable target behaviors and robot failures. It ensures the explicitly upper-bounded monitoring intervals for all targets and the lower-bounded resource levels for all robots, while minimizing the average number of active robots. The proposed methods are validated extensively via large-scale simulations against several baselines, under different road networks, robot velocities, charging rates and monitoring intervals.
comment: Accepted to IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2025)
Hybrid Robotic Meta-gripper for Tomato Harvesting: Analysis of Auxetic Structures with Lattice Orientation Variations
The agricultural sector is rapidly evolving to meet growing global food demands, yet tasks like fruit and vegetable handling remain labor-intensive, causing inefficiencies and post-harvest losses. Automation, particularly selective harvesting, offers a viable solution, with soft robotics emerging as a key enabler. This study introduces a novel hybrid gripper for tomato harvesting, incorporating a rigid outer frame with a soft auxetic internal lattice. The six-finger, 3D caging-effect design enables gentle yet secure grasping in unstructured environments. Uniquely, the work investigates the effect of auxetic lattice orientation on grasping conformability, combining experimental validation with 2D Digital Image Correlation (DIC) and nonlinear finite element analysis (FEA). Auxetic configurations with unit cell inclinations of 0 deg, 30 deg, 45 deg, and 60 deg are evaluated, and their grasping forces, deformation responses, and motor torque requirements are systematically compared. Results demonstrate that lattice orientation strongly influences compliance, contact forces, and energy efficiency, with distinct advantages across configurations. This comparative framework highlights the novelty of tailoring auxetic geometries to optimize robotic gripper performance. The findings provide new insights into soft-rigid hybrid gripper design, advancing automation strategies for precision agriculture while minimizing crop damage.
ATRos: Learning Energy-Efficient Agile Locomotion for Wheeled-legged Robots IROS 2025
Hybrid locomotion of wheeled-legged robots has recently attracted increasing attention due to their advantages of combining the agility of legged locomotion and the efficiency of wheeled motion. But along with expanded performance, the whole-body control of wheeled-legged robots remains challenging for hybrid locomotion. In this paper, we present ATRos, a reinforcement learning (RL)-based hybrid locomotion framework to achieve hybrid walking-driving motions on the wheeled-legged robot. Without giving predefined gait patterns, our planner aims to intelligently coordinate simultaneous wheel and leg movements, thereby achieving improved terrain adaptability and improved energy efficiency. Based on RL techniques, our approach constructs a prediction policy network that could estimate external environmental states from proprioceptive sensory information, and the outputs are then fed into an actor critic network to produce optimal joint commands. The feasibility of the proposed framework is validated through both simulations and real-world experiments across diverse terrains, including flat ground, stairs, and grassy surfaces. The hybrid locomotion framework shows robust performance over various unseen terrains, highlighting its generalization capability.
comment: 4 pages, 2 figures, submitted to IROS 2025 wheeled-legged workshop
Reinforcement Fine-Tuning of Flow-Matching Policies for Vision-Language-Action Models
Vision-Language-Action (VLA) models such as OpenVLA, Octo, and $\pi_0$ have shown strong generalization by leveraging large-scale demonstrations, yet their performance is still fundamentally constrained by the quality and coverage of supervised data. Reinforcement learning (RL) provides a promising path for improving and fine-tuning VLAs through online interaction. However, conventional policy gradient methods are computationally infeasible in the context of flow-matching based models due to the intractability of the importance sampling process, which requires explicit computation of policy ratios. To overcome this limitation, we propose Flow Policy Optimization (FPO) algorithm, which reformulates importance sampling by leveraging per-sample changes in the conditional flow-matching objective. Furthermore, FPO achieves stable and scalable online reinforcement fine-tuning of the $\pi_0$ model by integrating structure-aware credit assignment to enhance gradient efficiency, clipped surrogate objectives to stabilize optimization, multi-step latent exploration to encourage diverse policy updates, and a Q-ensemble mechanism to provide robust value estimation. We evaluate FPO on the LIBERO benchmark and the ALOHA simulation task against supervised, preference-aligned, diffusion-based, autoregressive online RL, and $\pi_0$-FAST baselines, observing consistent improvements over the imitation prior and strong alternatives with stable learning under sparse rewards. In addition, ablation studies and analyses of the latent space dynamics further highlight the contributions of individual components within FPO, validating the effectiveness of the proposed computational modules and the stable convergence of the conditional flow-matching objective during online RL.
FORM: Fixed-Lag Odometry with Reparative Mapping utilizing Rotating LiDAR Sensors ICRA 2026
Light Detection and Ranging (LiDAR) sensors have become a de-facto sensor for many robot state estimation tasks, spurring development of many LiDAR Odometry (LO) methods in recent years. While some smoothing-based LO methods have been proposed, most require matching against multiple scans, resulting in sub-real-time performance. Due to this, most prior works estimate a single state at a time and are ``submap''-based. This architecture propagates any error in pose estimation to the fixed submap and can cause jittery trajectories and degrade future registrations. We propose Fixed-Lag Odometry with Reparative Mapping (FORM), a LO method that performs smoothing over a densely connected factor graph while utilizing a single iterative map for matching. This allows for both real-time performance and active correction of the local map as pose estimates are further refined. We evaluate on a wide variety of datasets to show that FORM is robust, accurate, real-time, and provides smooth trajectory estimates when compared to prior state-of-the-art LO methods.
comment: Submitted to ICRA 2026
LLM-HBT: Dynamic Behavior Tree Construction for Adaptive Coordination in Heterogeneous Robots ICRA 2026
We introduce a novel framework for automatic behavior tree (BT) construction in heterogeneous multi-robot systems, designed to address the challenges of adaptability and robustness in dynamic environments. Traditional robots are limited by fixed functional attributes and cannot efficiently reconfigure their strategies in response to task failures or environmental changes. To overcome this limitation, we leverage large language models (LLMs) to generate and extend BTs dynamically, combining the reasoning and generalization power of LLMs with the modularity and recovery capability of BTs. The proposed framework consists of four interconnected modules task initialization, task assignment, BT update, and failure node detection which operate in a closed loop. Robots tick their BTs during execution, and upon encountering a failure node, they can either extend the tree locally or invoke a centralized virtual coordinator (Alex) to reassign subtasks and synchronize BTs across peers. This design enables long-term cooperative execution in heterogeneous teams. We validate the framework on 60 tasks across three simulated scenarios and in a real-world cafe environment with a robotic arm and a wheeled-legged robot. Results show that our method consistently outperforms baseline approaches in task success rate, robustness, and scalability, demonstrating its effectiveness for multi-robot collaboration in complex scenarios.
comment: It contains 8 pages, 7 figures and 4 tables. This paper is submitted to ICRA 2026
VG-Mapping: Variation-Aware 3D Gaussians for Online Semi-static Scene Mapping
Maintaining an up-to-date map that accurately reflects recent changes in the environment is crucial, especially for robots that repeatedly traverse the same space. Failing to promptly update the changed regions can degrade map quality, resulting in poor localization, inefficient operations, and even lost robots. 3D Gaussian Splatting (3DGS) has recently seen widespread adoption in online map reconstruction due to its dense, differentiable, and photorealistic properties, yet accurately and efficiently updating the regions of change remains a challenge. In this paper, we propose VG-Mapping, a novel online 3DGS-based mapping system tailored for such semi-static scenes. Our approach introduces a hybrid representation that augments 3DGS with a TSDF-based voxel map to efficiently identify changed regions in a scene, along with a variation-aware density control strategy that inserts or deletes Gaussian primitives in regions undergoing change. Furthermore, to address the absence of public benchmarks for this task, we construct a RGB-D dataset comprising both synthetic and real-world semi-static environments. Experimental results demonstrate that our method substantially improves the rendering quality and map update efficiency in semi-static scenes. The code and dataset are available at https://github.com/heyicheng-never/VG-Mapping.
Accurate and Noise-Tolerant Extraction of Routine Logs in Robotic Process Automation (Extended Version)
Robotic Process Mining focuses on the identification of the routine types performed by human resources through a User Interface. The ultimate goal is to discover routine-type models to enable robotic process automation. The discovery of routine-type models requires the provision of a routine log. Unfortunately, the vast majority of existing works do not directly focus on enabling the model discovery, limiting themselves to extracting the set of actions that are part of the routines. They were also not evaluated in scenarios characterized by inconsistent routine execution, hereafter referred to as noise, which reflects natural variability and occasional errors in human performance. This paper presents a clustering-based technique that aims to extract routine logs. Experiments were conducted on nine UI logs from the literature with different levels of injected noise. Our technique was compared with existing techniques, most of which are not meant to discover routine logs but were adapted for the purpose. The results were evaluated through standard state-of-the-art metrics, showing that we can extract more accurate routine logs than what the state of the art could, especially in the presence of noise.
comment: 16 pages, 5 figures
Differentiable Particle Optimization for Fast Sequential Manipulation
Sequential robot manipulation tasks require finding collision-free trajectories that satisfy geometric constraints across multiple object interactions in potentially high-dimensional configuration spaces. Solving these problems in real-time and at large scales has remained out of reach due to computational requirements. Recently, GPU-based acceleration has shown promising results, but prior methods achieve limited performance due to CPU-GPU data transfer overhead and complex logic that prevents full hardware utilization. To this end, we present SPaSM (Sampling Particle optimization for Sequential Manipulation), a fully GPU-parallelized framework that compiles constraint evaluation, sampling, and gradient-based optimization into optimized CUDA kernels for end-to-end trajectory optimization without CPU coordination. The method consists of a two-stage particle optimization strategy: first solving placement constraints through massively parallel sampling, then lifting solutions to full trajectory optimization in joint space. Unlike hierarchical approaches, SPaSM jointly optimizes object placements and robot trajectories to handle scenarios where motion feasibility constrains placement options. Experimental evaluation on challenging benchmarks demonstrates solution times in the realm of $\textbf{milliseconds}$ with a 100% success rate; a $4000\times$ speedup compared to existing approaches. Code and examples are available at $\href{https://commalab.org/papers/spasm}{commalab.org/papers/spasm}$.
comment: 8 pages, 7 figures, 3 tables. Under review
ASTREA: Introducing Agentic Intelligence for Orbital Thermal Autonomy
This paper presents ASTREA, the first agentic system executed on flight-heritage hardware (TRL 9) for autonomous spacecraft operations, with on-orbit operation aboard the International Space Station (ISS). Using thermal control as a representative use case, we integrate a resource-constrained Large Language Model (LLM) agent with a reinforcement learning controller in an asynchronous architecture tailored for space-qualified platforms. Ground experiments show that LLM-guided supervision improves thermal stability and reduces violations, confirming the feasibility of combining semantic reasoning with adaptive control under hardware constraints. On-orbit validation aboard the ISS initially faced challenges due to inference latency misaligned with the rapid thermal cycles of Low Earth Orbit (LEO) satellites. Synchronization with the orbit length successfully surpassed the baseline with reduced violations, extended episode durations, and improved CPU utilization. These findings demonstrate the potential for scalable agentic supervision architectures in future autonomous spacecraft.
comment: Accepted for presentation at the European Space Agency's AI Start 2025 Conference (see https://atpi.eventsair.com/ai-star-2025/)
A Vision-Based Shared-Control Teleoperation Scheme for Controlling the Robotic Arm of a Four-Legged Robot
In hazardous and remote environments, robotic systems perform critical tasks demanding improved safety and efficiency. Among these, quadruped robots with manipulator arms offer mobility and versatility for complex operations. However, teleoperating quadruped robots is challenging due to the lack of integrated obstacle detection and intuitive control methods for the robotic arm, increasing collision risks in confined or dynamically changing workspaces. Teleoperation via joysticks or pads can be non-intuitive and demands a high level of expertise due to its complexity, culminating in a high cognitive load on the operator. To address this challenge, a teleoperation approach that directly maps human arm movements to the robotic manipulator offers a simpler and more accessible solution. This work proposes an intuitive remote control by leveraging a vision-based pose estimation pipeline that utilizes an external camera with a machine learning-based model to detect the operator's wrist position. The system maps these wrist movements into robotic arm commands to control the robot's arm in real-time. A trajectory planner ensures safe teleoperation by detecting and preventing collisions with both obstacles and the robotic arm itself. The system was validated on the real robot, demonstrating robust performance in real-time control. This teleoperation approach provides a cost-effective solution for industrial applications where safety, precision, and ease of use are paramount, ensuring reliable and intuitive robotic control in high-risk environments.
Autonomous UAV Flight Navigation in Confined Spaces: A Reinforcement Learning Approach
Autonomous UAV inspection of confined industrial infrastructure, such as ventilation ducts, demands robust navigation policies where collisions are unacceptable. While Deep Reinforcement Learning (DRL) offers a powerful paradigm for developing such policies, it presents a critical trade-off between on-policy and off-policy algorithms. Off-policy methods promise high sample efficiency, a vital trait for minimizing costly and unsafe real-world fine-tuning. In contrast, on-policy methods often exhibit greater training stability, which is essential for reliable convergence in hazard-dense environments. This paper directly investigates this trade-off by comparing a leading on-policy algorithm, Proximal Policy Optimization (PPO), against an off-policy counterpart, Soft Actor-Critic (SAC), for precision flight in procedurally generated ducts within a high-fidelity simulator. Our results show that PPO consistently learned a stable, collision-free policy that completed the entire course. In contrast, SAC failed to find a complete solution, converging to a suboptimal policy that navigated only the initial segments before failure. This work provides evidence that for high-precision, safety-critical navigation tasks, the reliable convergence of a well-established on-policy method can be more decisive than the nominal sample efficiency of an off-policy algorithm.
Optimizing Grasping in Legged Robots: A Deep Learning Approach to Loco-Manipulation
This paper presents a deep learning framework designed to enhance the grasping capabilities of quadrupeds equipped with arms, with a focus on improving precision and adaptability. Our approach centers on a sim-to-real methodology that minimizes reliance on physical data collection. We developed a pipeline within the Genesis simulation environment to generate a synthetic dataset of grasp attempts on common objects. By simulating thousands of interactions from various perspectives, we created pixel-wise annotated grasp-quality maps to serve as the ground truth for our model. This dataset was used to train a custom CNN with a U-Net-like architecture that processes multi-modal input from an onboard RGB and depth cameras, including RGB images, depth maps, segmentation masks, and surface normal maps. The trained model outputs a grasp-quality heatmap to identify the optimal grasp point. We validated the complete framework on a four-legged robot. The system successfully executed a full loco-manipulation task: autonomously navigating to a target object, perceiving it with its sensors, predicting the optimal grasp pose using our model, and performing a precise grasp. This work proves that leveraging simulated training with advanced sensing offers a scalable and effective solution for object handling.
A Synthetic Dataset for Manometry Recognition in Robotic Applications
This paper addresses the challenges of data scarcity and high acquisition costs in training robust object detection models for complex industrial environments, such as offshore oil platforms. Data collection in these hazardous settings often limits the development of autonomous inspection systems. To mitigate this issue, we propose a hybrid data synthesis pipeline that integrates procedural rendering and AI-driven video generation. The approach uses BlenderProc to produce photorealistic images with domain randomization and NVIDIA's Cosmos-Predict2 to generate physically consistent video sequences with temporal variation. A YOLO-based detector trained on a composite dataset, combining real and synthetic data, outperformed models trained solely on real images. A 1:1 ratio between real and synthetic samples achieved the highest accuracy. The results demonstrate that synthetic data generation is a viable, cost-effective, and safe strategy for developing reliable perception systems in safety-critical and resource-constrained industrial applications.
Efficient Navigation in Unknown Indoor Environments with Vision-Language Models IROS 2025
We present a novel high-level planning framework that leverages vision-language models (VLMs) to improve autonomous navigation in unknown indoor environments with many dead ends. Traditional exploration methods often take inefficient routes due to limited global reasoning and reliance on local heuristics. In contrast, our approach enables a VLM to reason directly about occupancy maps in a zero-shot manner, selecting subgoals that are likely to yield more efficient paths. At each planning step, we convert a 3D occupancy grid into a partial 2D map of the environment, and generate candidate subgoals. Each subgoal is then evaluated and ranked against other candidates by the model. We integrate this planning scheme into DYNUS \cite{kondo2025dynus}, a state-of-the-art trajectory planner, and demonstrate improved navigation efficiency in simulation. The VLM infers structural patterns (e.g., rooms, corridors) from incomplete maps and balances the need to make progress toward a goal against the risk of entering unknown space. This reduces common greedy failures (e.g., detouring into small rooms) and achieves about 10\% shorter paths on average.
comment: 7 pages, 4 figures, accepted to the OWN workshop at IROS 2025
Robust and Safe Multi-Agent Reinforcement Learning with Communication for Autonomous Vehicles: From Simulation to Hardware
Deep multi-agent reinforcement learning (MARL) has been demonstrated effectively in simulations for multi-robot problems. For autonomous vehicles, the development of vehicle-to-vehicle (V2V) communication technologies provide opportunities to further enhance system safety. However, zero-shot transfer of simulator-trained MARL policies to dynamic hardware systems remains challenging, and how to leverage communication and shared information for MARL has limited demonstrations on hardware. This problem is challenged by discrepancies between simulated and physical states, system state and model uncertainties, practical shared information design, and the need for safety guarantees in both simulation and hardware. This paper designs RSR-RSMARL, a novel Robust and Safe MARL framework that supports Real-Sim-Real (RSR) policy adaptation for multi-agent systems with communication among agents, with both simulation and hardware demonstrations. RSR-RSMARL leverages state (includes shared state information among agents) and action representations considering real system complexities for MARL formulation. The MARL policy is trained with robust MARL algorithm to enable zero-shot transfer to hardware considering the sim-to-real gap. A safety shield module using Control Barrier Functions (CBFs) provides safety guarantee for each individual agent. Experimental results on 1/10th-scale autonomous vehicles with V2V communication demonstrate the ability of RSR-RSMARL framework to enhance driving safety and coordination across multiple configurations. These findings emphasize the importance of jointly designing robust policy representations and modular safety architectures to enable scalable, generalizable RSR transfer in multi-agent autonomy.
comment: 19 pages, 9 Figures
{S\textsuperscript{2}M\textsuperscript{2}}: Scalable Stereo Matching Model for Reliable Depth Estimation ICCV
The pursuit of a generalizable stereo matching model, capable of performing well across varying resolutions and disparity ranges without dataset-specific fine-tuning, has revealed a fundamental trade-off. Iterative local search methods achieve high scores on constrained benchmarks, but their core mechanism inherently limits the global consistency required for true generalization. However, global matching architectures, while theoretically more robust, have historically been rendered infeasible by prohibitive computational and memory costs. We resolve this dilemma with {S\textsuperscript{2}M\textsuperscript{2}}: a global matching architecture that achieves state-of-the-art accuracy and high efficiency without relying on cost volume filtering or deep refinement stacks. Our design integrates a multi-resolution transformer for robust long-range correspondence, trained with a novel loss function that concentrates probability on feasible matches. This approach enables a more robust joint estimation of disparity, occlusion, and confidence. {S\textsuperscript{2}M\textsuperscript{2}} establishes a new state of the art on Middlebury v3 and ETH3D benchmarks, significantly outperforming prior methods in most metrics while reconstructing high-quality details with competitive efficiency.
comment: 8 pages, 5 figures, ICCV accepted paper
Agentic Vehicles for Human-Centered Mobility
Autonomy, from the Greek autos (self) and nomos (law), refers to the capacity to operate according to internal rules without external control. Autonomous vehicles (AuVs) are therefore understood as systems that perceive their environment and execute pre-programmed tasks independently of external input, consistent with the SAE levels of automated driving. Yet recent research and real-world deployments have begun to showcase vehicles that exhibit behaviors outside the scope of this definition. These include natural language interaction with humans, goal adaptation, contextual reasoning, external tool use, and the handling of unforeseen ethical dilemmas, enabled in part by multimodal large language models (LLMs). These developments highlight not only a gap between technical autonomy and the broader cognitive and social capacities required for human-centered mobility, but also the emergence of a form of vehicle intelligence that currently lacks a clear designation. To address this gap, the paper introduces the concept of agentic vehicles (AgVs): vehicles that integrate agentic AI systems to reason, adapt, and interact within complex environments. It synthesizes recent advances in agentic systems and suggests how AgVs can complement and even reshape conventional autonomy to ensure mobility services are aligned with user and societal needs. The paper concludes by outlining key challenges in the development and governance of AgVs and their potential role in shaping future agentic transportation systems.
NaviMaster: Learning a Unified Policy for GUI and Embodied Navigation Tasks
Recent advances in Graphical User Interface (GUI) and embodied navigation have driven progress, yet these domains have largely evolved in isolation, with disparate datasets and training paradigms. In this paper, we observe that both tasks can be formulated as Markov Decision Processes (MDP), suggesting a foundational principle for their unification. Hence, we present NaviMaster, the first unified agent capable of unifying GUI navigation and embodied navigation within a single framework. Specifically, NaviMaster (i) proposes a visual-target trajectory collection pipeline that generates trajectories for both GUI and embodied tasks using a single formulation. (ii) employs a unified reinforcement learning framework on the mix data to improve generalization. (iii) designs a novel distance-aware reward to ensure efficient learning from the trajectories. Through extensive experiments on out-of-domain benchmarks, NaviMaster is shown to outperform state-of-the-art agents in GUI navigation, spatial affordance prediction, and embodied navigation. Ablation studies further demonstrate the efficacy of our unified training strategy, data mixing strategy, and reward design.
comment: Homepage: https://iron-boyy.github.io/navimaster/
Robot Learning with Super-Linear Scaling
Scaling robot learning requires data collection pipelines that scale favorably with human effort. In this work, we propose Crowdsourcing and Amortizing Human Effort for Real-to-Sim-to-Real(CASHER), a pipeline for scaling up data collection and learning in simulation where the performance scales superlinearly with human effort. The key idea is to crowdsource digital twins of real-world scenes using 3D reconstruction and collect large-scale data in simulation, rather than the real-world. Data collection in simulation is initially driven by RL, bootstrapped with human demonstrations. As the training of a generalist policy progresses across environments, its generalization capabilities can be used to replace human effort with model generated demonstrations. This results in a pipeline where behavioral data is collected in simulation with continually reducing human effort. We show that CASHER demonstrates zero-shot and few-shot scaling laws on three real-world tasks across diverse scenarios. We show that CASHER enables fine-tuning of pre-trained policies to a target scenario using a video scan without any additional human effort. See our project website: https://casher-robot-learning.github.io/CASHER/
TinyIO: Lightweight Reparameterized Inertial Odometry
Inertial localization is regarded as a promising positioning solution for consumer-grade IoT devices due to its cost-effectiveness and independence from external infrastructure. However, data-driven inertial localization methods often rely on increasingly complex network architectures to improve accuracy, which challenges the limited computational resources of IoT devices. Moreover, these methods frequently overlook the importance of modeling long-term dependencies in inertial measurements - a critical factor for accurate trajectory reconstruction - thereby limiting localization performance. To address these challenges, we propose a reparameterized inertial localization network that uses a multi-branch structure during training to enhance feature extraction. At inference time, this structure is transformed into an equivalent single-path architecture to improve parameter efficiency. To further capture long-term dependencies in motion trajectories, we introduce a temporal-scale sparse attention mechanism that selectively emphasizes key trajectory segments while suppressing noise. Additionally, a gated convolutional unit is incorporated to effectively integrate long-range dependencies with local fine-grained features. Extensive experiments on public benchmarks demonstrate that our method achieves a favorable trade-off between accuracy and model compactness. For example, on the RoNIN dataset, our approach reduces the Absolute Trajectory Error (ATE) by 2.59% compared to RoNIN-ResNet while reducing the number of parameters by 3.86%.
StarIO: A Lightweight Inertial Odometry for Nonlinear Motion
Inertial odometry (IO) directly estimates the position of a carrier from inertial sensor measurements and serves as a core technology for the widespread deployment of consumer grade localization systems. While existing IO methods can accurately reconstruct simple and near linear motion trajectories, they often fail to account for drift errors caused by complex motion patterns such as turning. This limitation significantly degrades localization accuracy and restricts the applicability of IO systems in real world scenarios. To address these challenges, we propose a lightweight IO framework. Specifically, inertial data is projected into a high dimensional implicit nonlinear feature space using the Star Operation method, enabling the extraction of complex motion features that are typically overlooked. We further introduce a collaborative attention mechanism that jointly models global motion dynamics across both channel and temporal dimensions. In addition, we design Multi Scale Gated Convolution Units to capture fine grained dynamic variations throughout the motion process, thereby enhancing the model's ability to learn rich and expressive motion representations. Extensive experiments demonstrate that our proposed method consistently outperforms SOTA baselines across six widely used inertial datasets. Compared to baseline models on the RoNIN dataset, it achieves reductions in ATE ranging from 2.26% to 65.78%, thereby establishing a new benchmark in the field.
IONext: Unlocking the Next Era of Inertial Odometry
Researchers have increasingly adopted Transformer-based models for inertial odometry. While Transformers excel at modeling long-range dependencies, their limited sensitivity to local, fine-grained motion variations and lack of inherent inductive biases often hinder localization accuracy and generalization. Recent studies have shown that incorporating large-kernel convolutions and Transformer-inspired architectural designs into CNN can effectively expand the receptive field, thereby improving global motion perception. Motivated by these insights, we propose a novel CNN-based module called the Dual-wing Adaptive Dynamic Mixer (DADM), which adaptively captures both global motion patterns and local, fine-grained motion features from dynamic inputs. This module dynamically generates selective weights based on the input, enabling efficient multi-scale feature aggregation. To further improve temporal modeling, we introduce the Spatio-Temporal Gating Unit (STGU), which selectively extracts representative and task-relevant motion features in the temporal domain. This unit addresses the limitations of temporal modeling observed in existing CNN approaches. Built upon DADM and STGU, we present a new CNN-based inertial odometry backbone, named Next Era of Inertial Odometry (IONext). Extensive experiments on six public datasets demonstrate that IONext consistently outperforms state-of-the-art (SOTA) Transformer- and CNN-based methods. For instance, on the RNIN dataset, IONext reduces the average ATE by 10% and the average RTE by 12% compared to the representative model iMOT.
Policy Contrastive Decoding for Robotic Foundation Models
Robotic foundation models, or generalist robot policies, hold immense potential to enable flexible, general-purpose and dexterous robotic systems. Despite their advancements, our empirical experiments reveal that existing robot policies are prone to learning spurious correlations from pre-training trajectories, adversely affecting their generalization capabilities beyond the training data. To tackle this, we propose a novel Policy Contrastive Decoding (PCD) approach, which redirects the robot policy's focus toward object-relevant visual clues by contrasting action probability distributions derived from original and object-masked visual inputs. As a training-free method, our PCD can be used as a plugin to improve different types of robot policies without needing to finetune or access model weights. We conduct extensive experiments on top of three open-source robot policies, including the autoregressive policy OpenVLA and the diffusion-based policies Octo and $\pi_0$. The obtained results in both simulation and real-world environments prove PCD's flexibility and effectiveness, e.g., PCD enhances the state-of-the-art policy $\pi_0$ by 8.9% in the simulation environment and by 108% in the real-world environment. Code and demos are publicly available at: https://Koorye.github.io/proj/PCD.
Follow-Bench: A Unified Motion Planning Benchmark for Socially-Aware Robot Person Following
Robot person following (RPF) -- mobile robots that follow and assist a specific person -- has emerging applications in personal assistance, security patrols, eldercare, and logistics. To be effective, such robots must follow the target while ensuring safety and comfort for both the target and surrounding people. In this work, we present the first comprehensive study of RPF, which (i) surveys representative scenarios, motion-planning methods, and evaluation metrics with a focus on safety and comfort; (ii) introduces Follow-Bench, a unified benchmark simulating diverse scenarios, including various target trajectory patterns, crowd dynamics, and environmental layouts; and (iii) re-implements six representative RPF planners, ensuring that both safety and comfort are systematically considered. Moreover, we evaluate the two best-performing planners from our benchmark on a differential-drive robot to provide insights into the real-world deployment of RPF planners. Extensive simulation and real-world experiments provide a quantitative study of the safety-comfort trade-offs of existing planners, while revealing open challenges and future research directions.
comment: Project page: https://follow-bench.github.io/
Multiagent Systems
KG-MAS: Knowledge Graph-Enhanced Multi-Agent Infrastructure for coupling physical and digital robotic environments
The seamless integration of physical and digital environments in Cyber-Physical Systems(CPS), particularly within Industry 4.0, presents significant challenges stemming from system heterogeneity and complexity. Traditional approaches often rely on rigid, data-centric solutions like co-simulation frameworks or brittle point-to-point middleware bridges, which lack the semantic richness and flexibility required for intelligent, autonomous coordination. This report introduces the Knowledge Graph-Enhanced Multi-Agent Infrastructure(KG-MAS), as resolution in addressing such limitations. KG-MAS leverages a centralized Knowledge Graph (KG) as a dynamic, shared world model, providing a common semantic foundation for a Multi-Agent System(MAS). Autonomous agents, representing both physical and digital components, query this KG for decision-making and update it with real-time state information. The infrastructure features a model-driven architecture which facilitates the automatic generation of agents from semantic descriptions, thereby simplifying system extension and maintenance. By abstracting away underlying communication protocols and providing a unified, intelligent coordination mechanism, KG-MAS offers a robust, scalable, and flexible solution for coupling heterogeneous physical and digital robotic environments.
It Takes Two: Learning Interactive Whole-Body Control Between Humanoid Robots
The true promise of humanoid robotics lies beyond single-agent autonomy: two or more humanoids must engage in physically grounded, socially meaningful whole-body interactions that echo the richness of human social interaction. However, single-humanoid methods suffer from the isolation issue, ignoring inter-agent dynamics and causing misaligned contacts, interpenetrations, and unrealistic motions. To address this, we present Harmanoid , a dual-humanoid motion imitation framework that transfers interacting human motions to two robots while preserving both kinematic fidelity and physical realism. Harmanoid comprises two key components: (i) contact-aware motion retargeting, which restores inter-body coordination by aligning SMPL contacts with robot vertices, and (ii) interaction-driven motion controller, which leverages interaction-specific rewards to enforce coordinated keypoints and physically plausible contacts. By explicitly modeling inter-agent contacts and interaction-aware dynamics, Harmanoid captures the coupled behaviors between humanoids that single-humanoid frameworks inherently overlook. Experiments demonstrate that Harmanoid significantly improves interactive motion imitation, surpassing existing single-humanoid frameworks that largely fail in such scenarios.
MedAgentAudit: Diagnosing and Quantifying Collaborative Failure Modes in Medical Multi-Agent Systems
While large language model (LLM)-based multi-agent systems show promise in simulating medical consultations, their evaluation is often confined to final-answer accuracy. This practice treats their internal collaborative processes as opaque "black boxes" and overlooks a critical question: is a diagnostic conclusion reached through a sound and verifiable reasoning pathway? The inscrutable nature of these systems poses a significant risk in high-stakes medical applications, potentially leading to flawed or untrustworthy conclusions. To address this, we conduct a large-scale empirical study of 3,600 cases from six medical datasets and six representative multi-agent frameworks. Through a rigorous, mixed-methods approach combining qualitative analysis with quantitative auditing, we develop a comprehensive taxonomy of collaborative failure modes. Our quantitative audit reveals four dominant failure patterns: flawed consensus driven by shared model deficiencies, suppression of correct minority opinions, ineffective discussion dynamics, and critical information loss during synthesis. This study demonstrates that high accuracy alone is an insufficient measure of clinical or public trust. It highlights the urgent need for transparent and auditable reasoning processes, a cornerstone for the responsible development and deployment of medical AI.
comment: Code: https://github.com/yhzhu99/MedAgentAudit
ALLOY: Generating Reusable Agent Workflows from User Demonstration
Large language models (LLMs) enable end-users to delegate complex tasks to autonomous agents through natural language. However, prompt-based interaction faces critical limitations: Users often struggle to specify procedural requirements for tasks, especially those that don't have a factually correct solution but instead rely on personal preferences, such as posting social media content or planning a trip. Additionally, a ''successful'' prompt for one task may not be reusable or generalizable across similar tasks. We present ALLOY, a system inspired by classical HCI theories on Programming by Demonstration (PBD), but extended to enhance adaptability in creating LLM-based web agents. ALLOY enables users to express procedural preferences through natural demonstrations rather than prompts, while making these procedures transparent and editable through visualized workflows that can be generalized across task variations. In a study with 12 participants, ALLOY's demonstration--based approach outperformed prompt-based agents and manual workflows in capturing user intent and procedural preferences in complex web tasks. Insights from the study also show how demonstration--based interaction complements the traditional prompt-based approach.
Structured Cooperative Multi-Agent Reinforcement Learning: a Bayesian Network Perspective
The empirical success of multi-agent reinforcement learning (MARL) has motivated the search for more efficient and scalable algorithms for large scale multi-agent systems. However, existing state-of-the-art algorithms do not fully exploit inter-agent coupling information to develop MARL algorithms. In this paper, we propose a systematic approach to leverage structures in the inter-agent couplings for efficient model-free reinforcement learning. We model the cooperative MARL problem via a Bayesian network and characterize the subset of agents, termed as the value dependency set, whose information is required by each agent to estimate its local action value function exactly. Moreover, we propose a partially decentralized training decentralized execution (P-DTDE) paradigm based on the value dependency set. We theoretically establish that the total variance of our P-DTDE policy gradient estimator is less than the centralized training decentralized execution (CTDE) policy gradient estimator. We derive a multi-agent policy gradient theorem based on the P-DTDE scheme and develop a scalable actor-critic algorithm. We demonstrate the efficiency and scalability of the proposed algorithm on multi-warehouse resource allocation and multi-zone temperature control examples. For dense value dependency sets, we propose an approximation scheme based on truncation of the Bayesian network and empirically show that it achieves a faster convergence than the exact value dependence set for applications with a large number of agents.
Scheming Ability in LLM-to-LLM Strategic Interactions
As large language model (LLM) agents are deployed autonomously in diverse contexts, evaluating their capacity for strategic deception becomes crucial. While recent research has examined how AI systems scheme against human developers, LLM-to-LLM scheming remains underexplored. We investigate the scheming ability and propensity of frontier LLM agents through two game-theoretic frameworks: a Cheap Talk signaling game and a Peer Evaluation adversarial game. Testing four models (GPT-4o, Gemini-2.5-pro, Claude-3.7-Sonnet, and Llama-3.3-70b), we measure scheming performance with and without explicit prompting while analyzing scheming tactics through chain-of-thought reasoning. When prompted, most models, especially Gemini-2.5-pro and Claude-3.7-Sonnet, achieved near-perfect performance. Critically, models exhibited significant scheming propensity without prompting: all models chose deception over confession in Peer Evaluation (100% rate), while models choosing to scheme in Cheap Talk succeeded at 95-100% rates. These findings highlight the need for robust evaluations using high-stakes game-theoretic scenarios in multi-agent settings.
comment: 25 pages, 13 figures, under review at IASEAI'26
MADS: Multi-Agent Dialogue Simulation for Diverse Persuasion Data Generation
We propose MADS (Multi-Agent Dialogue Simulation), a scalable framework for generating persuasive multi-turn dialogues via agent self-play. MADS employs three coordinated agents: User Agents designed to simulate diverse persona-driven behaviors by leveraging personality signifiers such as Zodiac Signs and MBTI types, a Dialog Agent executing task-oriented persuasion strategies and an Optimization Agent evaluating and refining dialogue outcomes. We further validate its effectiveness through users' Chain-of-Attitude (CoA) modeling and dedicated LLMs' persuasion assessment. This approach enables low-cost generation of training data without human annotation, addressing key industry challenges such as lack of user data, cold-start evaluation difficulties, and prompt inefficiency. Applied to a real-world marketing scenario, MADS significantly improved the persuasion capacity of small LLMs, increasing the organic traffic conversion rate by 22.4% (from 1.83% to 2.24%) , demonstrating clear business value.
ASTREA: Introducing Agentic Intelligence for Orbital Thermal Autonomy
This paper presents ASTREA, the first agentic system executed on flight-heritage hardware (TRL 9) for autonomous spacecraft operations, with on-orbit operation aboard the International Space Station (ISS). Using thermal control as a representative use case, we integrate a resource-constrained Large Language Model (LLM) agent with a reinforcement learning controller in an asynchronous architecture tailored for space-qualified platforms. Ground experiments show that LLM-guided supervision improves thermal stability and reduces violations, confirming the feasibility of combining semantic reasoning with adaptive control under hardware constraints. On-orbit validation aboard the ISS initially faced challenges due to inference latency misaligned with the rapid thermal cycles of Low Earth Orbit (LEO) satellites. Synchronization with the orbit length successfully surpassed the baseline with reduced violations, extended episode durations, and improved CPU utilization. These findings demonstrate the potential for scalable agentic supervision architectures in future autonomous spacecraft.
comment: Accepted for presentation at the European Space Agency's AI Start 2025 Conference (see https://atpi.eventsair.com/ai-star-2025/)
Debate, Deliberate, Decide (D3): A Cost-Aware Adversarial Framework for Reliable and Interpretable LLM Evaluation
The evaluation of Large Language Models (LLMs) remains challenging due to inconsistency, bias, and the absence of transparent decision criteria in automated judging. We present Debate, Deliberate, Decide (D3), a cost-aware, adversarial multi-agent framework that orchestrates structured debate among role-specialized agents (advocates, a judge, and an optional jury) to produce reliable and interpretable evaluations. D3 instantiates two complementary protocols: (1) Multi-Advocate One-Round Evaluation (MORE), which elicits k parallel defenses per answer to amplify signal via diverse advocacy, and (2) Single-Advocate Multi-Round Evaluation (SAMRE) with budgeted stopping, which iteratively refines arguments under an explicit token budget and convergence checks. We develop a probabilistic model of score gaps that (i) characterizes reliability and convergence under iterative debate and (ii) explains the separation gains from parallel advocacy. Under mild assumptions, the posterior distribution of the round-r gap concentrates around the true difference and the probability of mis-ranking vanishes; moreover, aggregating across k advocates provably increases expected score separation. We complement theory with a rigorous experimental suite across MT-Bench, AlignBench, and AUTO-J, showing state-of-the-art agreement with human judgments (accuracy and Cohen's kappa), reduced positional and verbosity biases via anonymization and role diversification, and a favorable cost-accuracy frontier enabled by budgeted stopping. Ablations and qualitative analyses isolate the contributions of debate, aggregation, and anonymity. Together, these results establish D3 as a principled, practical recipe for reliable, interpretable, and cost-aware LLM evaluation.
HealthFlow: A Self-Evolving AI Agent with Meta Planning for Autonomous Healthcare Research
The rapid proliferation of scientific knowledge presents a grand challenge: transforming this vast repository of information into an active engine for discovery, especially in high-stakes domains like healthcare. Current AI agents, however, are constrained by static, predefined strategies, limiting their ability to navigate the complex, evolving ecosystem of scientific research. This paper introduces HealthFlow, a self-evolving AI agent that overcomes this limitation through a novel meta-level evolution mechanism. HealthFlow autonomously refines its high-level problem-solving policies by distilling procedural successes and failures into a durable, structured knowledge base, enabling it to learn not just how to use tools, but how to strategize. To anchor our research and provide a community resource, we introduce EHRFlowBench, a new benchmark featuring complex health data analysis tasks systematically derived from peer-reviewed scientific literature. Our experiments demonstrate that HealthFlow's self-evolving approach significantly outperforms state-of-the-art agent frameworks. This work offers a new paradigm for intelligent systems that can learn to operationalize the procedural knowledge embedded in scientific content, marking a critical step toward more autonomous and effective AI for healthcare scientific discovery.
comment: Code: https://github.com/yhzhu99/HealthFlow
Coordination Requires Simplification: Thermodynamic Bounds on Multi-Objective Compromise in Natural and Artificial Intelligence
Information-processing systems coordinating across multiple agents and objectives face fundamental thermodynamic constraints. We show that solutions with maximum utility to act as coordination focal points have much higher selection pressure for being findable across agents rather than accuracy. We derive that the information-theoretic minimum description length of coordination protocols to precision $\varepsilon$ scales as $L(P)\geq NK\log_2 K+N^2d^2\log (1/\varepsilon)$ for $N$ agents with $d$ potentially conflicting objectives and internal model complexity $K$. This scaling forces progressive simplification, with coordination dynamics changing the environment itself and shifting optimization across hierarchical levels. Moving from established focal points requires re-coordination, creating persistent metastable states and hysteresis until significant environmental shifts trigger phase transitions through spontaneous symmetry breaking. We operationally define coordination temperature to predict critical phenomena and estimate coordination work costs, identifying measurable signatures across systems from neural networks to restaurant bills to bureaucracies. Extending the topological version of Arrow's theorem on the impossibility of consistent preference aggregation, we find it recursively binds whenever preferences are combined. This potentially explains the indefinite cycling in multi-objective gradient descent and alignment faking in Large Language Models trained with reinforcement learning with human feedback. We term this framework Thermodynamic Coordination Theory (TCT), which demonstrates that coordination requires radical information loss.
comment: 11 pages, 1 figure, 6 pages supplementary material, submitted to Physical Review E
Robust and Safe Multi-Agent Reinforcement Learning with Communication for Autonomous Vehicles: From Simulation to Hardware
Deep multi-agent reinforcement learning (MARL) has been demonstrated effectively in simulations for multi-robot problems. For autonomous vehicles, the development of vehicle-to-vehicle (V2V) communication technologies provide opportunities to further enhance system safety. However, zero-shot transfer of simulator-trained MARL policies to dynamic hardware systems remains challenging, and how to leverage communication and shared information for MARL has limited demonstrations on hardware. This problem is challenged by discrepancies between simulated and physical states, system state and model uncertainties, practical shared information design, and the need for safety guarantees in both simulation and hardware. This paper designs RSR-RSMARL, a novel Robust and Safe MARL framework that supports Real-Sim-Real (RSR) policy adaptation for multi-agent systems with communication among agents, with both simulation and hardware demonstrations. RSR-RSMARL leverages state (includes shared state information among agents) and action representations considering real system complexities for MARL formulation. The MARL policy is trained with robust MARL algorithm to enable zero-shot transfer to hardware considering the sim-to-real gap. A safety shield module using Control Barrier Functions (CBFs) provides safety guarantee for each individual agent. Experimental results on 1/10th-scale autonomous vehicles with V2V communication demonstrate the ability of RSR-RSMARL framework to enhance driving safety and coordination across multiple configurations. These findings emphasize the importance of jointly designing robust policy representations and modular safety architectures to enable scalable, generalizable RSR transfer in multi-agent autonomy.
comment: 19 pages, 9 Figures
OrbitZoo: Multi-Agent Reinforcement Learning Environment for Orbital Dynamics NeurIPS 2025
The increasing number of satellites and orbital debris has made space congestion a critical issue, threatening satellite safety and sustainability. Challenges such as collision avoidance, station-keeping, and orbital maneuvering require advanced techniques to handle dynamic uncertainties and multi-agent interactions. Reinforcement learning (RL) has shown promise in this domain, enabling adaptive, autonomous policies for space operations; however, many existing RL frameworks rely on custom-built environments developed from scratch, which often use simplified models and require significant time to implement and validate the orbital dynamics, limiting their ability to fully capture real-world complexities. To address this, we introduce OrbitZoo, a versatile multi-agent RL environment built on a high-fidelity industry standard library, that enables realistic data generation, supports scenarios like collision avoidance and cooperative maneuvers, and ensures robust and accurate orbital dynamics. The environment is validated against a real satellite constellation, Starlink, achieving a Mean Absolute Percentage Error (MAPE) of 0.16% compared to real-world data. This validation ensures reliability for generating high-fidelity simulations and enabling autonomous and independent satellite operations.
comment: Accepted for publication at the Thirty-Ninth Conference on Neural Information Processing Systems (NeurIPS 2025)
On the Surprising Effectiveness of a Single Global Merging in Decentralized Learning
Decentralized learning provides a scalable alternative to parameter-server-based training, yet its performance is often hindered by limited peer-to-peer communication. In this paper, we study how communication should be scheduled over time to improve global generalization, including determining when and how frequently devices synchronize. Counterintuitive empirical results show that concentrating communication budgets in the later stages of decentralized training remarkably improves global generalization. Surprisingly, we uncover that fully connected communication at the final step, implemented by a single global merging, can significant improve the generalization performance of decentralized learning under serve high data heterogeneity. Our theoretical contributions, which explains these phenomena, are first to establish that the globally merged model of decentralized SGD can match the convergence rate of parallel SGD. Technically, we reinterpret part of the discrepancy among local models, which were previously considered as detrimental noise, as constructive components essential for matching this rate. This work provides promising results that decentralized learning is able to generalize under high data heterogeneity and limited communication, while offering broad new avenues for model merging research. The code will be made publicly available.
comment: We discover and theoretically explain why and when a single global parameter merging in decentralized learning can recover the performance of federated learning, even in highly heterogeneous and communication-constrained environments
Systems and Control (CS)
Low-cost Pyranometer-Based ANN Approach for MPPT in Solar PV Systems
This article presents a study on the application of artificial neural networks (ANNs) for maximum power point tracking (MPPT) in photovoltaic (PV) systems using low-cost pyranometer sensors. The proposed approach integrates pyranometers, temperature sensors, and an ANN to estimate the duty cycle of a DC/DC converter, enabling the system to consistently operate at its maximum power point. The strategy was implemented in the local control of a Cuk converter and experimentally validated against the conventional Perturb and Observe (P&O) method. Results demonstrate that the ANN-based technique, leveraging affordable sensor technology, achieves accurate MPPT performance with reduced fluctuations, enhancing the responsiveness and efficiency of PV tracking systems.
The algorithmic regulator
The regulator theorem states that, under certain conditions, any optimal controller must embody a model of the system it regulates, grounding the idea that controllers embed, explicitly or implicitly, internal models of the controlled. This principle underpins neuroscience and predictive brain theories like the Free-Energy Principle or Kolmogorov/Algorithmic Agent theory. However, the theorem is only proven in limited settings. Here, we treat the deterministic, closed, coupled world-regulator system $(W,R)$ as a single self-delimiting program $p$ via a constant-size wrapper that produces the world output string~$x$ fed to the regulator. We analyze regulation from the viewpoint of the algorithmic complexity of the output, $K(x)$. We define $R$ to be a \emph{good algorithmic regulator} if it \emph{reduces} the algorithmic complexity of the readout relative to a null (unregulated) baseline $\varnothing$, i.e., \[ \Delta = K\big(O_{W,\varnothing}\big) - K\big(O_{W,R}\big) > 0. \] We then prove that the larger $\Delta$ is, the more world-regulator pairs with high mutual algorithmic information are favored. More precisely, a complexity gap $\Delta > 0$ yields \[ \Pr\big((W,R)\mid x\big) \le C\,2^{\,M(W{:}R)}\,2^{-\Delta}, \] making low $M(W{:}R)$ exponentially unlikely as $\Delta$ grows. This is an AIT version of the idea that ``the regulator contains a model of the world.'' The framework is distribution-free, applies to individual sequences, and complements the Internal Model Principle. Beyond this necessity claim, the same coding-theorem calculus singles out a \emph{canonical scalar objective} and implicates a \emph{planner}. On the realized episode, a regulator behaves \emph{as if} it minimized the conditional description length of the readout.
comment: 2 Figures
Optimal monophasic, asymmetric electric field pulses for selective transcranial magnetic stimulation (TMS) with minimised power and coil heating
Transcranial magnetic stimulation (TMS) with asymmetric electric field pulses, such as monophasic, offers directional selectivity for neural activation but requires excessive energy. Previous pulse shape optimisation has been limited to symmetric pulses or heavily constrained variations of conventional waveforms without achieving general optimality in energy efficiency or neural selectivity. We implemented an optimisation framework that incorporates neuron model activation constraints and flexible control of pulse asymmetry. The optimised electric field waveforms achieved up to 92 % and 88 % reduction in energy loss and thus coil heating respectively compared to conventional monophasic pulses and previously improved monophasic-equivalent pulses. In the human experiments, OUR pulses showed similar motor thresholds to monophasic pulses in both AP and PA directions with significantly lower energy loss, particularly in the AP direction. Moreover, there was a significant MEP latency difference of (1.79 +/- 0.41) ms between AP and PA direction with OUR pulses, which suggests directional selectivity. Our framework successfully identified highly energy-efficient asymmetric pulses for directionally-selective neural engagement. These pulses can enable selective rapid-rate repetitive TMS protocols with reduced power consumption and coil heating, with potential benefits for precision and potency of neuro-modulation.
comment: 31 pages, 8 figures
Hybrid MAC Protocol with Integrated Multi-Layered Security for Resource-Constrained UAV Swarm Communications
Flying Ad Hoc Networks (FANETs) present unique challenges due to high node mobility, dynamic topologies, and strict resource constraints. Existing routing protocols often optimize for a single metric, such as path length or energy, while neglecting the complex dependencies between network performance, security, and MAC layer efficiency. This paper introduces a novel hardware software co design framework for secure and adaptive UAV swarm communications, featuring an energy aware protocol stack. The architecture employs a multicast, clustered organization where routing decisions integrate dynamic trust scores, historical link quality, and internodal distance. A hybrid MAC protocol combines contention based and scheduled channel access for optimized throughput. Security is ensured through a zero trust model that fuses cryptographic authentication with a behavioral reputation system, alongside hardware accelerated AES GCM encryption. Comparative analysis in an NS 3 simulation environment demonstrates the framework's superiority in packet delivery ratio, latency, resilience, and overhead, providing a scalable foundation for high performance swarm operations.
comment: Accepted at ISED 2025
Bounds of Validity for Bifurcations of Equilibria in a Class of Networked Dynamical Systems
Local bifurcation analysis plays a central role in understanding qualitative transitions in networked nonlinear dynamical systems, including dynamic neural network and opinion dynamics models. In this article we establish explicit bounds of validity for the classification of bifurcation diagrams in two classes of continuous-time networked dynamical systems, analogous in structure to the Hopfield and the Firing Rate dynamic neural network models. Our approach leverages recent advances in computing the bounds for the validity of Lyapunov-Schmidt reduction, a reduction method widely employed in nonlinear systems analysis. Using these bounds we rigorously characterize neighborhoods around bifurcation points where predictions from reduced-order models remain reliable. We further demonstrate how these bounds can be applied to an illustrative family of nonlinear opinion dynamics on k-regular graphs, which emerges as a special case of the general framework. These results provide new analytical tools for quantifying the robustness of bifurcation phenomena in dynamics over networked systems and highlight the interplay between network structure and nonlinear dynamical behavior.
comment: This manuscript has been submitted to the 2026 American Control Conference taking place in New Orleans, Louisiana, in May 2026
Distributionally Robust Control with End-to-End Statistically Guaranteed Metric Learning
Wasserstein distributionally robust control (DRC) recently emerges as a principled paradigm for handling uncertainty in stochastic dynamical systems. However, it constructs data-driven ambiguity sets via uniform distribution shifts before sequentially incorporating them into downstream control synthesis. This segregation between ambiguity set construction and control objectives inherently introduces a structural misalignment, which undesirably leads to conservative control policies with sub-optimal performance. To address this limitation, we propose a novel end-to-end finite-horizon Wasserstein DRC framework that integrates the learning of anisotropic Wasserstein metrics with downstream control tasks in a closed-loop manner, thus enabling ambiguity sets to be systematically adjusted along performance-critical directions and yielding more effective control policies. This framework is formulated as a bilevel program: the inner level characterizes dynamical system evolution under DRC, while the outer level refines the anisotropic metric leveraging control-performance feedback across a range of initial conditions. To solve this program efficiently, we develop a stochastic augmented Lagrangian algorithm tailored to the bilevel structure. Theoretically, we prove that the learned ambiguity sets preserve statistical finite-sample guarantees under a novel radius adjustment mechanism, and we establish the well-posedness of the bilevel formulation by demonstrating its continuity with respect to the learnable metric. Furthermore, we show that the algorithm converges to stationary points of the outer level problem, which are statistically consistent with the optimal metric at a non-asymptotic convergence rate. Experiments on both numerical and inventory control tasks verify that the proposed framework achieves superior closed-loop performance and robustness compared against state-of-the-art methods.
Performance Index Shaping for Closed-loop Optimal Control
The design of the performance index, also referred to as cost or reward shaping, is central to both optimal control and reinforcement learning, as it directly determines the behaviors, trade-offs, and objectives that the resulting control laws seek to achieve. A commonly used approach for this inference task in recent years is differentiable trajectory optimization, which allows gradients to be computed with respect to cost parameters by differentiating through an optimal control solver. However, this method often requires repeated solving of the underlying optimal control problem at every iteration, making the method computationally expensive. In this work, assuming known dynamics, we propose a novel framework that analytically links the performance index to the resulting closed-loop optimal control law, thereby transforming a typically bi-level inverse problem into a tractable single-level formulation. Our approach is motivated by the question: given a closed-loop control law that solves an infinite-horizon optimal control problem, how does this law change when the performance index is modified with additional terms? This formulation yields closed-form characterizations for broad classes of systems and performance indices, which not only facilitate interpretation and stability analysis, but also provide insight into the robust stability and input-to-state stable behavior of the resulting nonlinear closed-loop system. Moreover, this analytical perspective enables the generalization of our approach to diverse design objectives, yielding a unifying framework for performance index shaping. Given specific design objectives, we propose a systematic methodology to guide the shaping of the performance index and thereby design the resulting optimal control law.
Modeling the Impact of Communication and Human Uncertainties on Runway Capacity in Terminal Airspace
We investigate the potential impact of communication and human performance uncertainties on runway operations. Specifically, we consider these impacts within the context of an arrival scenario with two converging flows: a straight-in approach stream and a downwind stream merging into it. Both arrival stream are modeled using a modified Possion distribution that incorporate the separation minima as well as the runway occupancy time. Various system level uncertainties are addressed in this process, including communication link- and human-related uncertainties. In this research, we first build a Monte Carlo-based discrete-time simulation, where aircraft arrivals are generated by modified Poisson processes subject to minimum separation constraints, simulating various traffic operations. The merging logic incorporates standard bank angle continuous turn-to-final, pilot response delays, and dynamic gap availability in real time. Then, we investigate an automated final approach vectoring model (i.e., Auto-ATC), in which inverse optimal control is used to learn decision advisories from human expert records. By augmenting trajectories and incorporating the aforementioned uncertainties into the planning scenario, we create a setup analogous to the discrete event simulation. For both studies, runway capacity is measured by runway throughput, the fraction of downwind arrivals that merge immediately without holding, and the average delay (i.e., holding time/distance) experienced on the downwind leg. This research provides a method for runway capacity estimation in merging scenarios, and demonstrates that aeronautical communication link uncertainties significantly affect runway capacity in current voice-based operations, whereas the impact can be mitigated in autonomous operational settings.
Causal-Guided Dimension Reduction for Efficient Pareto Optimization
Multi-objective optimization of analog circuits is hindered by high-dimensional parameter spaces, strong feedback couplings, and expensive transistor-level simulations. Evolutionary algorithms such as Non-dominated Sorting Genetic Algorithm II (NSGA-II) are widely used but treat all parameters equally, thereby wasting effort on variables with little impact on performance, which limits their scalability. We introduce CaDRO, a causal-guided dimensionality reduction framework that embeds causal discovery into the optimization pipeline. CaDRO builds a quantitative causal map through a hybrid observational-interventional process, ranking parameters by their causal effect on the objectives. Low-impact parameters are fixed to values from high-quality solutions, while critical drivers remain active in the search. The reduced design space enables focused evolutionary optimization without modifying the underlying algorithm. Across amplifiers, regulators, and RF circuits, CaDRO converges up to 10$\times$ faster than NSGA-II while preserving or improving Pareto quality. For instance, on the Folded-Cascode Amplifier, hypervolume improves from 0.56 to 0.94, and on the LDO regulator from 0.65 to 0.81, with large gains in non-dominated solutions.
Structured Cooperative Multi-Agent Reinforcement Learning: a Bayesian Network Perspective
The empirical success of multi-agent reinforcement learning (MARL) has motivated the search for more efficient and scalable algorithms for large scale multi-agent systems. However, existing state-of-the-art algorithms do not fully exploit inter-agent coupling information to develop MARL algorithms. In this paper, we propose a systematic approach to leverage structures in the inter-agent couplings for efficient model-free reinforcement learning. We model the cooperative MARL problem via a Bayesian network and characterize the subset of agents, termed as the value dependency set, whose information is required by each agent to estimate its local action value function exactly. Moreover, we propose a partially decentralized training decentralized execution (P-DTDE) paradigm based on the value dependency set. We theoretically establish that the total variance of our P-DTDE policy gradient estimator is less than the centralized training decentralized execution (CTDE) policy gradient estimator. We derive a multi-agent policy gradient theorem based on the P-DTDE scheme and develop a scalable actor-critic algorithm. We demonstrate the efficiency and scalability of the proposed algorithm on multi-warehouse resource allocation and multi-zone temperature control examples. For dense value dependency sets, we propose an approximation scheme based on truncation of the Bayesian network and empirically show that it achieves a faster convergence than the exact value dependence set for applications with a large number of agents.
Viscosity CBFs: Bridging the Control Barrier Function and Hamilton-Jacobi Reachability Frameworks in Safe Control Theory
Control barrier functions (CBFs) and Hamilton-Jacobi reachability (HJR) are central frameworks in safe control. Traditionally, these frameworks have been viewed as distinct, with the former focusing on optimally safe controller design and the latter providing sufficient conditions for safety. A previous work introduced the notion of a control barrier value function (CB-VF), which is defined similarly to the other value functions studied in HJR but has certain CBF-like properties. In this work, we proceed the other direction by generalizing CBFs to non-differentiable ``viscosity'' CBFs. We show the deep connection between viscosity CBFs and CB-VFs, bridging the CBF and HJR frameworks. Through this bridge, we characterize the viscosity CBFs as precisely those functions which provide CBF-like safety guarantees (control invariance and smooth approach to the boundary). We then further show nice theoretical properties of viscosity CBFs, including their desirable closure under maximum and limit operations. In the process, we also extend CB-VFs to non-exponential anti-discounting and update the corresponding theory for CB-VFs along these lines.
ASTREA: Introducing Agentic Intelligence for Orbital Thermal Autonomy
This paper presents ASTREA, the first agentic system executed on flight-heritage hardware (TRL 9) for autonomous spacecraft operations, with on-orbit operation aboard the International Space Station (ISS). Using thermal control as a representative use case, we integrate a resource-constrained Large Language Model (LLM) agent with a reinforcement learning controller in an asynchronous architecture tailored for space-qualified platforms. Ground experiments show that LLM-guided supervision improves thermal stability and reduces violations, confirming the feasibility of combining semantic reasoning with adaptive control under hardware constraints. On-orbit validation aboard the ISS initially faced challenges due to inference latency misaligned with the rapid thermal cycles of Low Earth Orbit (LEO) satellites. Synchronization with the orbit length successfully surpassed the baseline with reduced violations, extended episode durations, and improved CPU utilization. These findings demonstrate the potential for scalable agentic supervision architectures in future autonomous spacecraft.
comment: Accepted for presentation at the European Space Agency's AI Start 2025 Conference (see https://atpi.eventsair.com/ai-star-2025/)
Improving cosmological reach of a gravitational wave observatory using Deep Loop Shaping
Improved low-frequency sensitivity of gravitational wave observatories would unlock study of intermediate-mass black hole mergers, binary black hole eccentricity, and provide early warnings for multi-messenger observations of binary neutron star mergers. Today's mirror stabilization control injects harmful noise, constituting a major obstacle to sensitivity improvements. We eliminated this noise through Deep Loop Shaping, a reinforcement learning method using frequency domain rewards. We proved our methodology on the LIGO Livingston Observatory (LLO). Our controller reduced control noise in the 10--30Hz band by over 30x, and up to 100x in sub-bands surpassing the design goal motivated by the quantum limit. These results highlight the potential of Deep Loop Shaping to improve current and future GW observatories, and more broadly instrumentation and control systems.
comment: Re-added a reference that was dropped by mistake in the published paper. Fixed date of experiment in text
A Vision-Based Shared-Control Teleoperation Scheme for Controlling the Robotic Arm of a Four-Legged Robot
In hazardous and remote environments, robotic systems perform critical tasks demanding improved safety and efficiency. Among these, quadruped robots with manipulator arms offer mobility and versatility for complex operations. However, teleoperating quadruped robots is challenging due to the lack of integrated obstacle detection and intuitive control methods for the robotic arm, increasing collision risks in confined or dynamically changing workspaces. Teleoperation via joysticks or pads can be non-intuitive and demands a high level of expertise due to its complexity, culminating in a high cognitive load on the operator. To address this challenge, a teleoperation approach that directly maps human arm movements to the robotic manipulator offers a simpler and more accessible solution. This work proposes an intuitive remote control by leveraging a vision-based pose estimation pipeline that utilizes an external camera with a machine learning-based model to detect the operator's wrist position. The system maps these wrist movements into robotic arm commands to control the robot's arm in real-time. A trajectory planner ensures safe teleoperation by detecting and preventing collisions with both obstacles and the robotic arm itself. The system was validated on the real robot, demonstrating robust performance in real-time control. This teleoperation approach provides a cost-effective solution for industrial applications where safety, precision, and ease of use are paramount, ensuring reliable and intuitive robotic control in high-risk environments.
Autonomous UAV Flight Navigation in Confined Spaces: A Reinforcement Learning Approach
Autonomous UAV inspection of confined industrial infrastructure, such as ventilation ducts, demands robust navigation policies where collisions are unacceptable. While Deep Reinforcement Learning (DRL) offers a powerful paradigm for developing such policies, it presents a critical trade-off between on-policy and off-policy algorithms. Off-policy methods promise high sample efficiency, a vital trait for minimizing costly and unsafe real-world fine-tuning. In contrast, on-policy methods often exhibit greater training stability, which is essential for reliable convergence in hazard-dense environments. This paper directly investigates this trade-off by comparing a leading on-policy algorithm, Proximal Policy Optimization (PPO), against an off-policy counterpart, Soft Actor-Critic (SAC), for precision flight in procedurally generated ducts within a high-fidelity simulator. Our results show that PPO consistently learned a stable, collision-free policy that completed the entire course. In contrast, SAC failed to find a complete solution, converging to a suboptimal policy that navigated only the initial segments before failure. This work provides evidence that for high-precision, safety-critical navigation tasks, the reliable convergence of a well-established on-policy method can be more decisive than the nominal sample efficiency of an off-policy algorithm.
Optimizing Grasping in Legged Robots: A Deep Learning Approach to Loco-Manipulation
This paper presents a deep learning framework designed to enhance the grasping capabilities of quadrupeds equipped with arms, with a focus on improving precision and adaptability. Our approach centers on a sim-to-real methodology that minimizes reliance on physical data collection. We developed a pipeline within the Genesis simulation environment to generate a synthetic dataset of grasp attempts on common objects. By simulating thousands of interactions from various perspectives, we created pixel-wise annotated grasp-quality maps to serve as the ground truth for our model. This dataset was used to train a custom CNN with a U-Net-like architecture that processes multi-modal input from an onboard RGB and depth cameras, including RGB images, depth maps, segmentation masks, and surface normal maps. The trained model outputs a grasp-quality heatmap to identify the optimal grasp point. We validated the complete framework on a four-legged robot. The system successfully executed a full loco-manipulation task: autonomously navigating to a target object, perceiving it with its sensors, predicting the optimal grasp pose using our model, and performing a precise grasp. This work proves that leveraging simulated training with advanced sensing offers a scalable and effective solution for object handling.
A Novel Hybrid Grey Wolf Differential Evolution Algorithm
Grey wolf optimizer (GWO) is a nature-inspired stochastic meta-heuristic of the swarm intelligence field that mimics the hunting behavior of grey wolves. Differential evolution (DE) is a popular stochastic algorithm of the evolutionary computation field that is well suited for global optimization. In this part, we introduce a new algorithm based on the hybridization of GWO and two DE variants, namely the GWO-DE algorithm. We evaluate the new algorithm by applying various numerical benchmark functions. The numerical results of the comparative study are quite satisfactory in terms of performance and solution quality.
comment: 19 pages, 32 figures, journal
Continuous body 3-D reconstruction of limbless animals
Limbless animals such as snakes, limbless lizards, worms, eels, and lampreys move their slender, long bodies in three dimensions to traverse diverse environments. Accurately quantifying their continuous body's 3-D shape and motion is important for understanding body-environment interactions in complex terrain, but this is difficult to achieve (especially for local orientation and rotation). Here, we describe an interpolation method to quantify continuous body 3-D position and orientation. We simplify the body as an elastic rod and apply a backbone optimization method to interpolate continuous body shape between end constraints imposed by tracked markers. Despite over-simplifying the biomechanics, our method achieves a higher interpolation accuracy (~50% error) in both 3-D position and orientation compared with the widely-used cubic B-spline interpolation method. Beyond snakes traversing large obstacles as demonstrated, our method applies to other long, slender, limbless animals and continuum robots. We provide codes and demo files for easy application of our method.
Optimal Planning of Electric Vehicle Charging Stations: Integrating Public Charging Networks and Transportation Congestion
The adoption of electric vehicles (EVs) represents a critical shift in personal mobility, fueled by policy support and advancements in automotive technology. However, the expansion of EVs for long-distance travel is hindered by charging time concerns, the sparse distribution of charging stations, and the worsening waiting times due to congestion. The main objective of this work is two-fold: 1) first, to comprehensively analyze the existing public charging station robustness and effectively strategize for the new ones, and 2) secondly, to select the optimal chargers for long-distance journeys, by estimating the waiting time from current traffic congestion. This is achieved by accompanying effective EV charging strategies, pinpointing on the congestion points from the existing traffic, and the robustness of the current charging station infrastructure. Utilizing a real-time transportation and charging station dataset in Texas, we identify optimal charger placement strategies to minimize travel time by examining the congestion and charging time trade-offs. Our findings suggest that maximizing the constant current phase during charging enhances efficiency, crucial for long-distance travel. On the contrary, we also explore the negative impact of congestion on travel times and we conclude that sometimes it might be beneficial to exceed the constant current phase to avoid the congested charging stations.
Systems and Control (EESS)
Low-cost Pyranometer-Based ANN Approach for MPPT in Solar PV Systems
This article presents a study on the application of artificial neural networks (ANNs) for maximum power point tracking (MPPT) in photovoltaic (PV) systems using low-cost pyranometer sensors. The proposed approach integrates pyranometers, temperature sensors, and an ANN to estimate the duty cycle of a DC/DC converter, enabling the system to consistently operate at its maximum power point. The strategy was implemented in the local control of a Cuk converter and experimentally validated against the conventional Perturb and Observe (P&O) method. Results demonstrate that the ANN-based technique, leveraging affordable sensor technology, achieves accurate MPPT performance with reduced fluctuations, enhancing the responsiveness and efficiency of PV tracking systems.
The algorithmic regulator
The regulator theorem states that, under certain conditions, any optimal controller must embody a model of the system it regulates, grounding the idea that controllers embed, explicitly or implicitly, internal models of the controlled. This principle underpins neuroscience and predictive brain theories like the Free-Energy Principle or Kolmogorov/Algorithmic Agent theory. However, the theorem is only proven in limited settings. Here, we treat the deterministic, closed, coupled world-regulator system $(W,R)$ as a single self-delimiting program $p$ via a constant-size wrapper that produces the world output string~$x$ fed to the regulator. We analyze regulation from the viewpoint of the algorithmic complexity of the output, $K(x)$. We define $R$ to be a \emph{good algorithmic regulator} if it \emph{reduces} the algorithmic complexity of the readout relative to a null (unregulated) baseline $\varnothing$, i.e., \[ \Delta = K\big(O_{W,\varnothing}\big) - K\big(O_{W,R}\big) > 0. \] We then prove that the larger $\Delta$ is, the more world-regulator pairs with high mutual algorithmic information are favored. More precisely, a complexity gap $\Delta > 0$ yields \[ \Pr\big((W,R)\mid x\big) \le C\,2^{\,M(W{:}R)}\,2^{-\Delta}, \] making low $M(W{:}R)$ exponentially unlikely as $\Delta$ grows. This is an AIT version of the idea that ``the regulator contains a model of the world.'' The framework is distribution-free, applies to individual sequences, and complements the Internal Model Principle. Beyond this necessity claim, the same coding-theorem calculus singles out a \emph{canonical scalar objective} and implicates a \emph{planner}. On the realized episode, a regulator behaves \emph{as if} it minimized the conditional description length of the readout.
comment: 2 Figures
Optimal monophasic, asymmetric electric field pulses for selective transcranial magnetic stimulation (TMS) with minimised power and coil heating
Transcranial magnetic stimulation (TMS) with asymmetric electric field pulses, such as monophasic, offers directional selectivity for neural activation but requires excessive energy. Previous pulse shape optimisation has been limited to symmetric pulses or heavily constrained variations of conventional waveforms without achieving general optimality in energy efficiency or neural selectivity. We implemented an optimisation framework that incorporates neuron model activation constraints and flexible control of pulse asymmetry. The optimised electric field waveforms achieved up to 92 % and 88 % reduction in energy loss and thus coil heating respectively compared to conventional monophasic pulses and previously improved monophasic-equivalent pulses. In the human experiments, OUR pulses showed similar motor thresholds to monophasic pulses in both AP and PA directions with significantly lower energy loss, particularly in the AP direction. Moreover, there was a significant MEP latency difference of (1.79 +/- 0.41) ms between AP and PA direction with OUR pulses, which suggests directional selectivity. Our framework successfully identified highly energy-efficient asymmetric pulses for directionally-selective neural engagement. These pulses can enable selective rapid-rate repetitive TMS protocols with reduced power consumption and coil heating, with potential benefits for precision and potency of neuro-modulation.
comment: 31 pages, 8 figures
Hybrid MAC Protocol with Integrated Multi-Layered Security for Resource-Constrained UAV Swarm Communications
Flying Ad Hoc Networks (FANETs) present unique challenges due to high node mobility, dynamic topologies, and strict resource constraints. Existing routing protocols often optimize for a single metric, such as path length or energy, while neglecting the complex dependencies between network performance, security, and MAC layer efficiency. This paper introduces a novel hardware software co design framework for secure and adaptive UAV swarm communications, featuring an energy aware protocol stack. The architecture employs a multicast, clustered organization where routing decisions integrate dynamic trust scores, historical link quality, and internodal distance. A hybrid MAC protocol combines contention based and scheduled channel access for optimized throughput. Security is ensured through a zero trust model that fuses cryptographic authentication with a behavioral reputation system, alongside hardware accelerated AES GCM encryption. Comparative analysis in an NS 3 simulation environment demonstrates the framework's superiority in packet delivery ratio, latency, resilience, and overhead, providing a scalable foundation for high performance swarm operations.
comment: Accepted at ISED 2025
Bounds of Validity for Bifurcations of Equilibria in a Class of Networked Dynamical Systems
Local bifurcation analysis plays a central role in understanding qualitative transitions in networked nonlinear dynamical systems, including dynamic neural network and opinion dynamics models. In this article we establish explicit bounds of validity for the classification of bifurcation diagrams in two classes of continuous-time networked dynamical systems, analogous in structure to the Hopfield and the Firing Rate dynamic neural network models. Our approach leverages recent advances in computing the bounds for the validity of Lyapunov-Schmidt reduction, a reduction method widely employed in nonlinear systems analysis. Using these bounds we rigorously characterize neighborhoods around bifurcation points where predictions from reduced-order models remain reliable. We further demonstrate how these bounds can be applied to an illustrative family of nonlinear opinion dynamics on k-regular graphs, which emerges as a special case of the general framework. These results provide new analytical tools for quantifying the robustness of bifurcation phenomena in dynamics over networked systems and highlight the interplay between network structure and nonlinear dynamical behavior.
comment: This manuscript has been submitted to the 2026 American Control Conference taking place in New Orleans, Louisiana, in May 2026
Distributionally Robust Control with End-to-End Statistically Guaranteed Metric Learning
Wasserstein distributionally robust control (DRC) recently emerges as a principled paradigm for handling uncertainty in stochastic dynamical systems. However, it constructs data-driven ambiguity sets via uniform distribution shifts before sequentially incorporating them into downstream control synthesis. This segregation between ambiguity set construction and control objectives inherently introduces a structural misalignment, which undesirably leads to conservative control policies with sub-optimal performance. To address this limitation, we propose a novel end-to-end finite-horizon Wasserstein DRC framework that integrates the learning of anisotropic Wasserstein metrics with downstream control tasks in a closed-loop manner, thus enabling ambiguity sets to be systematically adjusted along performance-critical directions and yielding more effective control policies. This framework is formulated as a bilevel program: the inner level characterizes dynamical system evolution under DRC, while the outer level refines the anisotropic metric leveraging control-performance feedback across a range of initial conditions. To solve this program efficiently, we develop a stochastic augmented Lagrangian algorithm tailored to the bilevel structure. Theoretically, we prove that the learned ambiguity sets preserve statistical finite-sample guarantees under a novel radius adjustment mechanism, and we establish the well-posedness of the bilevel formulation by demonstrating its continuity with respect to the learnable metric. Furthermore, we show that the algorithm converges to stationary points of the outer level problem, which are statistically consistent with the optimal metric at a non-asymptotic convergence rate. Experiments on both numerical and inventory control tasks verify that the proposed framework achieves superior closed-loop performance and robustness compared against state-of-the-art methods.
Performance Index Shaping for Closed-loop Optimal Control
The design of the performance index, also referred to as cost or reward shaping, is central to both optimal control and reinforcement learning, as it directly determines the behaviors, trade-offs, and objectives that the resulting control laws seek to achieve. A commonly used approach for this inference task in recent years is differentiable trajectory optimization, which allows gradients to be computed with respect to cost parameters by differentiating through an optimal control solver. However, this method often requires repeated solving of the underlying optimal control problem at every iteration, making the method computationally expensive. In this work, assuming known dynamics, we propose a novel framework that analytically links the performance index to the resulting closed-loop optimal control law, thereby transforming a typically bi-level inverse problem into a tractable single-level formulation. Our approach is motivated by the question: given a closed-loop control law that solves an infinite-horizon optimal control problem, how does this law change when the performance index is modified with additional terms? This formulation yields closed-form characterizations for broad classes of systems and performance indices, which not only facilitate interpretation and stability analysis, but also provide insight into the robust stability and input-to-state stable behavior of the resulting nonlinear closed-loop system. Moreover, this analytical perspective enables the generalization of our approach to diverse design objectives, yielding a unifying framework for performance index shaping. Given specific design objectives, we propose a systematic methodology to guide the shaping of the performance index and thereby design the resulting optimal control law.
Modeling the Impact of Communication and Human Uncertainties on Runway Capacity in Terminal Airspace
We investigate the potential impact of communication and human performance uncertainties on runway operations. Specifically, we consider these impacts within the context of an arrival scenario with two converging flows: a straight-in approach stream and a downwind stream merging into it. Both arrival stream are modeled using a modified Possion distribution that incorporate the separation minima as well as the runway occupancy time. Various system level uncertainties are addressed in this process, including communication link- and human-related uncertainties. In this research, we first build a Monte Carlo-based discrete-time simulation, where aircraft arrivals are generated by modified Poisson processes subject to minimum separation constraints, simulating various traffic operations. The merging logic incorporates standard bank angle continuous turn-to-final, pilot response delays, and dynamic gap availability in real time. Then, we investigate an automated final approach vectoring model (i.e., Auto-ATC), in which inverse optimal control is used to learn decision advisories from human expert records. By augmenting trajectories and incorporating the aforementioned uncertainties into the planning scenario, we create a setup analogous to the discrete event simulation. For both studies, runway capacity is measured by runway throughput, the fraction of downwind arrivals that merge immediately without holding, and the average delay (i.e., holding time/distance) experienced on the downwind leg. This research provides a method for runway capacity estimation in merging scenarios, and demonstrates that aeronautical communication link uncertainties significantly affect runway capacity in current voice-based operations, whereas the impact can be mitigated in autonomous operational settings.
Causal-Guided Dimension Reduction for Efficient Pareto Optimization
Multi-objective optimization of analog circuits is hindered by high-dimensional parameter spaces, strong feedback couplings, and expensive transistor-level simulations. Evolutionary algorithms such as Non-dominated Sorting Genetic Algorithm II (NSGA-II) are widely used but treat all parameters equally, thereby wasting effort on variables with little impact on performance, which limits their scalability. We introduce CaDRO, a causal-guided dimensionality reduction framework that embeds causal discovery into the optimization pipeline. CaDRO builds a quantitative causal map through a hybrid observational-interventional process, ranking parameters by their causal effect on the objectives. Low-impact parameters are fixed to values from high-quality solutions, while critical drivers remain active in the search. The reduced design space enables focused evolutionary optimization without modifying the underlying algorithm. Across amplifiers, regulators, and RF circuits, CaDRO converges up to 10$\times$ faster than NSGA-II while preserving or improving Pareto quality. For instance, on the Folded-Cascode Amplifier, hypervolume improves from 0.56 to 0.94, and on the LDO regulator from 0.65 to 0.81, with large gains in non-dominated solutions.
Structured Cooperative Multi-Agent Reinforcement Learning: a Bayesian Network Perspective
The empirical success of multi-agent reinforcement learning (MARL) has motivated the search for more efficient and scalable algorithms for large scale multi-agent systems. However, existing state-of-the-art algorithms do not fully exploit inter-agent coupling information to develop MARL algorithms. In this paper, we propose a systematic approach to leverage structures in the inter-agent couplings for efficient model-free reinforcement learning. We model the cooperative MARL problem via a Bayesian network and characterize the subset of agents, termed as the value dependency set, whose information is required by each agent to estimate its local action value function exactly. Moreover, we propose a partially decentralized training decentralized execution (P-DTDE) paradigm based on the value dependency set. We theoretically establish that the total variance of our P-DTDE policy gradient estimator is less than the centralized training decentralized execution (CTDE) policy gradient estimator. We derive a multi-agent policy gradient theorem based on the P-DTDE scheme and develop a scalable actor-critic algorithm. We demonstrate the efficiency and scalability of the proposed algorithm on multi-warehouse resource allocation and multi-zone temperature control examples. For dense value dependency sets, we propose an approximation scheme based on truncation of the Bayesian network and empirically show that it achieves a faster convergence than the exact value dependence set for applications with a large number of agents.
Viscosity CBFs: Bridging the Control Barrier Function and Hamilton-Jacobi Reachability Frameworks in Safe Control Theory
Control barrier functions (CBFs) and Hamilton-Jacobi reachability (HJR) are central frameworks in safe control. Traditionally, these frameworks have been viewed as distinct, with the former focusing on optimally safe controller design and the latter providing sufficient conditions for safety. A previous work introduced the notion of a control barrier value function (CB-VF), which is defined similarly to the other value functions studied in HJR but has certain CBF-like properties. In this work, we proceed the other direction by generalizing CBFs to non-differentiable ``viscosity'' CBFs. We show the deep connection between viscosity CBFs and CB-VFs, bridging the CBF and HJR frameworks. Through this bridge, we characterize the viscosity CBFs as precisely those functions which provide CBF-like safety guarantees (control invariance and smooth approach to the boundary). We then further show nice theoretical properties of viscosity CBFs, including their desirable closure under maximum and limit operations. In the process, we also extend CB-VFs to non-exponential anti-discounting and update the corresponding theory for CB-VFs along these lines.
ASTREA: Introducing Agentic Intelligence for Orbital Thermal Autonomy
This paper presents ASTREA, the first agentic system executed on flight-heritage hardware (TRL 9) for autonomous spacecraft operations, with on-orbit operation aboard the International Space Station (ISS). Using thermal control as a representative use case, we integrate a resource-constrained Large Language Model (LLM) agent with a reinforcement learning controller in an asynchronous architecture tailored for space-qualified platforms. Ground experiments show that LLM-guided supervision improves thermal stability and reduces violations, confirming the feasibility of combining semantic reasoning with adaptive control under hardware constraints. On-orbit validation aboard the ISS initially faced challenges due to inference latency misaligned with the rapid thermal cycles of Low Earth Orbit (LEO) satellites. Synchronization with the orbit length successfully surpassed the baseline with reduced violations, extended episode durations, and improved CPU utilization. These findings demonstrate the potential for scalable agentic supervision architectures in future autonomous spacecraft.
comment: Accepted for presentation at the European Space Agency's AI Start 2025 Conference (see https://atpi.eventsair.com/ai-star-2025/)
Improving cosmological reach of a gravitational wave observatory using Deep Loop Shaping
Improved low-frequency sensitivity of gravitational wave observatories would unlock study of intermediate-mass black hole mergers, binary black hole eccentricity, and provide early warnings for multi-messenger observations of binary neutron star mergers. Today's mirror stabilization control injects harmful noise, constituting a major obstacle to sensitivity improvements. We eliminated this noise through Deep Loop Shaping, a reinforcement learning method using frequency domain rewards. We proved our methodology on the LIGO Livingston Observatory (LLO). Our controller reduced control noise in the 10--30Hz band by over 30x, and up to 100x in sub-bands surpassing the design goal motivated by the quantum limit. These results highlight the potential of Deep Loop Shaping to improve current and future GW observatories, and more broadly instrumentation and control systems.
comment: Re-added a reference that was dropped by mistake in the published paper. Fixed date of experiment in text
A Vision-Based Shared-Control Teleoperation Scheme for Controlling the Robotic Arm of a Four-Legged Robot
In hazardous and remote environments, robotic systems perform critical tasks demanding improved safety and efficiency. Among these, quadruped robots with manipulator arms offer mobility and versatility for complex operations. However, teleoperating quadruped robots is challenging due to the lack of integrated obstacle detection and intuitive control methods for the robotic arm, increasing collision risks in confined or dynamically changing workspaces. Teleoperation via joysticks or pads can be non-intuitive and demands a high level of expertise due to its complexity, culminating in a high cognitive load on the operator. To address this challenge, a teleoperation approach that directly maps human arm movements to the robotic manipulator offers a simpler and more accessible solution. This work proposes an intuitive remote control by leveraging a vision-based pose estimation pipeline that utilizes an external camera with a machine learning-based model to detect the operator's wrist position. The system maps these wrist movements into robotic arm commands to control the robot's arm in real-time. A trajectory planner ensures safe teleoperation by detecting and preventing collisions with both obstacles and the robotic arm itself. The system was validated on the real robot, demonstrating robust performance in real-time control. This teleoperation approach provides a cost-effective solution for industrial applications where safety, precision, and ease of use are paramount, ensuring reliable and intuitive robotic control in high-risk environments.
Autonomous UAV Flight Navigation in Confined Spaces: A Reinforcement Learning Approach
Autonomous UAV inspection of confined industrial infrastructure, such as ventilation ducts, demands robust navigation policies where collisions are unacceptable. While Deep Reinforcement Learning (DRL) offers a powerful paradigm for developing such policies, it presents a critical trade-off between on-policy and off-policy algorithms. Off-policy methods promise high sample efficiency, a vital trait for minimizing costly and unsafe real-world fine-tuning. In contrast, on-policy methods often exhibit greater training stability, which is essential for reliable convergence in hazard-dense environments. This paper directly investigates this trade-off by comparing a leading on-policy algorithm, Proximal Policy Optimization (PPO), against an off-policy counterpart, Soft Actor-Critic (SAC), for precision flight in procedurally generated ducts within a high-fidelity simulator. Our results show that PPO consistently learned a stable, collision-free policy that completed the entire course. In contrast, SAC failed to find a complete solution, converging to a suboptimal policy that navigated only the initial segments before failure. This work provides evidence that for high-precision, safety-critical navigation tasks, the reliable convergence of a well-established on-policy method can be more decisive than the nominal sample efficiency of an off-policy algorithm.
Optimizing Grasping in Legged Robots: A Deep Learning Approach to Loco-Manipulation
This paper presents a deep learning framework designed to enhance the grasping capabilities of quadrupeds equipped with arms, with a focus on improving precision and adaptability. Our approach centers on a sim-to-real methodology that minimizes reliance on physical data collection. We developed a pipeline within the Genesis simulation environment to generate a synthetic dataset of grasp attempts on common objects. By simulating thousands of interactions from various perspectives, we created pixel-wise annotated grasp-quality maps to serve as the ground truth for our model. This dataset was used to train a custom CNN with a U-Net-like architecture that processes multi-modal input from an onboard RGB and depth cameras, including RGB images, depth maps, segmentation masks, and surface normal maps. The trained model outputs a grasp-quality heatmap to identify the optimal grasp point. We validated the complete framework on a four-legged robot. The system successfully executed a full loco-manipulation task: autonomously navigating to a target object, perceiving it with its sensors, predicting the optimal grasp pose using our model, and performing a precise grasp. This work proves that leveraging simulated training with advanced sensing offers a scalable and effective solution for object handling.
A Novel Hybrid Grey Wolf Differential Evolution Algorithm
Grey wolf optimizer (GWO) is a nature-inspired stochastic meta-heuristic of the swarm intelligence field that mimics the hunting behavior of grey wolves. Differential evolution (DE) is a popular stochastic algorithm of the evolutionary computation field that is well suited for global optimization. In this part, we introduce a new algorithm based on the hybridization of GWO and two DE variants, namely the GWO-DE algorithm. We evaluate the new algorithm by applying various numerical benchmark functions. The numerical results of the comparative study are quite satisfactory in terms of performance and solution quality.
comment: 19 pages, 32 figures, journal
Continuous body 3-D reconstruction of limbless animals
Limbless animals such as snakes, limbless lizards, worms, eels, and lampreys move their slender, long bodies in three dimensions to traverse diverse environments. Accurately quantifying their continuous body's 3-D shape and motion is important for understanding body-environment interactions in complex terrain, but this is difficult to achieve (especially for local orientation and rotation). Here, we describe an interpolation method to quantify continuous body 3-D position and orientation. We simplify the body as an elastic rod and apply a backbone optimization method to interpolate continuous body shape between end constraints imposed by tracked markers. Despite over-simplifying the biomechanics, our method achieves a higher interpolation accuracy (~50% error) in both 3-D position and orientation compared with the widely-used cubic B-spline interpolation method. Beyond snakes traversing large obstacles as demonstrated, our method applies to other long, slender, limbless animals and continuum robots. We provide codes and demo files for easy application of our method.
Optimal Planning of Electric Vehicle Charging Stations: Integrating Public Charging Networks and Transportation Congestion
The adoption of electric vehicles (EVs) represents a critical shift in personal mobility, fueled by policy support and advancements in automotive technology. However, the expansion of EVs for long-distance travel is hindered by charging time concerns, the sparse distribution of charging stations, and the worsening waiting times due to congestion. The main objective of this work is two-fold: 1) first, to comprehensively analyze the existing public charging station robustness and effectively strategize for the new ones, and 2) secondly, to select the optimal chargers for long-distance journeys, by estimating the waiting time from current traffic congestion. This is achieved by accompanying effective EV charging strategies, pinpointing on the congestion points from the existing traffic, and the robustness of the current charging station infrastructure. Utilizing a real-time transportation and charging station dataset in Texas, we identify optimal charger placement strategies to minimize travel time by examining the congestion and charging time trade-offs. Our findings suggest that maximizing the constant current phase during charging enhances efficiency, crucial for long-distance travel. On the contrary, we also explore the negative impact of congestion on travel times and we conclude that sometimes it might be beneficial to exceed the constant current phase to avoid the congested charging stations.
Robotics
Zero-shot Structure Learning and Planning for Autonomous Robot Navigation using Active Inference
Autonomous navigation in unfamiliar environments requires robots to simultaneously explore, localise, and plan under uncertainty, without relying on predefined maps or extensive training. We present a biologically inspired, Active Inference-based framework, Active Inference MAPping and Planning (AIMAPP). This model unifies mapping, localisation, and decision-making within a single generative model. Inspired by hippocampal navigation, it uses topological reasoning, place-cell encoding, and episodic memory to guide behaviour. The agent builds and updates a sparse topological map online, learns state transitions dynamically, and plans actions by minimising Expected Free Energy. This allows it to balance goal-directed and exploratory behaviours. We implemented a ROS-compatible navigation system that is sensor and robot-agnostic, capable of integrating with diverse hardware configurations. It operates in a fully self-supervised manner, is resilient to drift, and supports both exploration and goal-directed navigation without any pre-training. We demonstrate robust performance in large-scale real and simulated environments against state-of-the-art planning models, highlighting the system's adaptability to ambiguous observations, environmental changes, and sensor noise. The model offers a biologically inspired, modular solution to scalable, self-supervised navigation in unstructured settings. AIMAPP is available at https://github.com/decide-ugent/AIMAPP.
comment: yet to be submitted
Differential Analysis of Pseudo Haptic Feedback: Novel Comparative Study of Visual and Auditory Cue Integration for Psychophysical Evaluation
Pseudo-haptics exploit carefully crafted visual or auditory cues to trick the brain into "feeling" forces that are never physically applied, offering a low-cost alternative to traditional haptic hardware. Here, we present a comparative psychophysical study that quantifies how visual and auditory stimuli combine to evoke pseudo-haptic pressure sensations on a commodity tablet. Using a Unity-based Rollball game, participants (n = 4) guided a virtual ball across three textured terrains while their finger forces were captured in real time with a Robotous RFT40 force-torque sensor. Each terrain was paired with a distinct rolling-sound profile spanning 440 Hz - 4.7 kHz, 440 Hz - 13.1 kHz, or 440 Hz - 8.9 kHz; crevice collisions triggered additional "knocking" bursts to heighten realism. Average tactile forces increased systematically with cue intensity: 0.40 N, 0.79 N and 0.88 N for visual-only trials and 0.41 N, 0.81 N and 0.90 N for audio-only trials on Terrains 1-3, respectively. Higher audio frequencies and denser visual textures both elicited stronger muscle activation, and their combination further reduced the force needed to perceive surface changes, confirming multisensory integration. These results demonstrate that consumer-grade isometric devices can reliably induce and measure graded pseudo-haptic feedback without specialized actuators, opening a path toward affordable rehabilitation tools, training simulators and assistive interfaces.
comment: 17 Pages, 9 Figures
Guiding Energy-Efficient Locomotion through Impact Mitigation Rewards
Animals achieve energy-efficient locomotion by their implicit passive dynamics, a marvel that has captivated roboticists for decades.Recently, methods incorporated Adversarial Motion Prior (AMP) and Reinforcement learning (RL) shows promising progress to replicate Animals' naturalistic motion. However, such imitation learning approaches predominantly capture explicit kinematic patterns, so-called gaits, while overlooking the implicit passive dynamics. This work bridges this gap by incorporating a reward term guided by Impact Mitigation Factor (IMF), a physics-informed metric that quantifies a robot's ability to passively mitigate impacts. By integrating IMF with AMP, our approach enables RL policies to learn both explicit motion trajectories from animal reference motion and the implicit passive dynamic. We demonstrate energy efficiency improvements of up to 32%, as measured by the Cost of Transport (CoT), across both AMP and handcrafted reward structure.
Dynamic Quadrupedal Legged and Aerial Locomotion via Structure Repurposing
Multi-modal ground-aerial robots have been extensively studied, with a significant challenge lying in the integration of conflicting requirements across different modes of operation. The Husky robot family, developed at Northeastern University, and specifically the Husky v.2 discussed in this study, addresses this challenge by incorporating posture manipulation and thrust vectoring into multi-modal locomotion through structure repurposing. This quadrupedal robot features leg structures that can be repurposed for dynamic legged locomotion and flight. In this paper, we present the hardware design of the robot and report primary results on dynamic quadrupedal legged locomotion and hovering.
Toggling stiffness via multistability
Mechanical metamaterials enable unconventional and programmable mechanical responses through structural design rather than material composition. In this work, we introduce a multistable mechanical metamaterial that exhibits a toggleable stiffness effect, where the effective shear stiffness switches discretely between stable configurations. The mechanical analysis of surrogate beam models of the unit cell reveal that this behavior originates from the rotation transmitted by the support beams to the curved beam, which governs the balance between bending and axial deformation. The stiffness ratio between the two states of the unit cell can be tuned by varying the slenderness of the support beams or by incorporating localized hinges that modulate rotational transfer. Experiments on 3D-printed prototypes validate the numerical predictions, confirming consistent stiffness toggling across different geometries. Finally, we demonstrate a monolithic soft clutch that leverages this effect to achieve programmable, stepwise stiffness modulation. This work establishes a design strategy for toggleable stiffness using multistable metamaterials, paving the way for adaptive, lightweight, and autonomous systems in soft robotics and smart structures.
PhysToolBench: Benchmarking Physical Tool Understanding for MLLMs
The ability to use, understand, and create tools is a hallmark of human intelligence, enabling sophisticated interaction with the physical world. For any general-purpose intelligent agent to achieve true versatility, it must also master these fundamental skills. While modern Multimodal Large Language Models (MLLMs) leverage their extensive common knowledge for high-level planning in embodied AI and in downstream Vision-Language-Action (VLA) models, the extent of their true understanding of physical tools remains unquantified. To bridge this gap, we present PhysToolBench, the first benchmark dedicated to evaluating the comprehension of physical tools by MLLMs. Our benchmark is structured as a Visual Question Answering (VQA) dataset comprising over 1,000 image-text pairs. It assesses capabilities across three distinct difficulty levels: (1) Tool Recognition: Requiring the recognition of a tool's primary function. (2) Tool Understanding: Testing the ability to grasp the underlying principles of a tool's operation. (3) Tool Creation: Challenging the model to fashion a new tool from surrounding objects when conventional options are unavailable. Our comprehensive evaluation of 32 MLLMs-spanning proprietary, open-source, specialized embodied, and backbones in VLAs-reveals a significant deficiency in tool understanding. Furthermore, we provide an in-depth analysis and propose preliminary solutions. Code and dataset are publicly available.
Autonomous Soft Robotic Guidewire Navigation via Imitation Learning
In endovascular surgery, endovascular interventionists push a thin tube called a catheter, guided by a thin wire to a treatment site inside the patient's blood vessels to treat various conditions such as blood clots, aneurysms, and malformations. Guidewires with robotic tips can enhance maneuverability, but they present challenges in modeling and control. Automation of soft robotic guidewire navigation has the potential to overcome these challenges, increasing the precision and safety of endovascular navigation. In other surgical domains, end-to-end imitation learning has shown promising results. Thus, we develop a transformer-based imitation learning framework with goal conditioning, relative action outputs, and automatic contrast dye injections to enable generalizable soft robotic guidewire navigation in an aneurysm targeting task. We train the model on 36 different modular bifurcated geometries, generating 647 total demonstrations under simulated fluoroscopy, and evaluate it on three previously unseen vascular geometries. The model can autonomously drive the tip of the robot to the aneurysm location with a success rate of 83% on the unseen geometries, outperforming several baselines. In addition, we present ablation and baseline studies to evaluate the effectiveness of each design and data collection choice. Project website: https://softrobotnavigation.github.io/
FOGMACHINE -- Leveraging Discrete-Event Simulation and Scene Graphs for Modeling Hierarchical, Interconnected Environments under Partial Observations from Mobile Agents
Dynamic Scene Graphs (DSGs) provide a structured representation of hierarchical, interconnected environments, but current approaches struggle to capture stochastic dynamics, partial observability, and multi-agent activity. These aspects are critical for embodied AI, where agents must act under uncertainty and delayed perception. We introduce FOGMACHINE , an open-source framework that fuses DSGs with discrete-event simulation to model object dynamics, agent observations, and interactions at scale. This setup enables the study of uncertainty propagation, planning under limited perception, and emergent multi-agent behavior. Experiments in urban scenarios illustrate realistic temporal and spatial patterns while revealing the challenges of belief estimation under sparse observations. By combining structured representations with efficient simulation, FOGMACHINE establishes an effective tool for benchmarking, model training, and advancing embodied AI in complex, uncertain environments.
comment: submitted to the IEEE for possible publication; 8 pages, 3 figures, 1 table
Scalable Multi-Agent Path Finding using Collision-Aware Dynamic Alert Mask and a Hybrid Execution Strategy
Multi-agent pathfinding (MAPF) remains a critical problem in robotics and autonomous systems, where agents must navigate shared spaces efficiently while avoiding conflicts. Traditional centralized algorithms that have global information, such as Conflict-Based Search (CBS), provide high-quality solutions but become computationally expensive in large-scale scenarios due to the combinatorial explosion of conflicts that need resolution. Conversely, distributed approaches that have local information, particularly learning-based methods, offer better scalability by operating with relaxed information availability, yet often at the cost of solution quality. To address these limitations, we propose a hybrid framework that combines decentralized path planning with a lightweight centralized coordinator. Our framework leverages reinforcement learning (RL) for decentralized planning, enabling agents to adapt their planning based on minimal, targeted alerts--such as static conflict-cell flags or brief conflict tracks--that are dynamically shared information from the central coordinator for effective conflict resolution. We empirically study the effect of the information available to an agent on its planning performance. Our approach reduces the inter-agent information sharing compared to fully centralized and distributed methods, while still consistently finding feasible, collision-free solutions--even in large-scale scenarios having higher agent counts.
Failure Prediction at Runtime for Generative Robot Policies NeurIPS 2025
Imitation learning (IL) with generative models, such as diffusion and flow matching, has enabled robots to perform complex, long-horizon tasks. However, distribution shifts from unseen environments or compounding action errors can still cause unpredictable and unsafe behavior, leading to task failure. Early failure prediction during runtime is therefore essential for deploying robots in human-centered and safety-critical environments. We propose FIPER, a general framework for Failure Prediction at Runtime for generative IL policies that does not require failure data. FIPER identifies two key indicators of impending failure: (i) out-of-distribution (OOD) observations detected via random network distillation in the policy's embedding space, and (ii) high uncertainty in generated actions measured by a novel action-chunk entropy score. Both failure prediction scores are calibrated using a small set of successful rollouts via conformal prediction. A failure alarm is triggered when both indicators, aggregated over short time windows, exceed their thresholds. We evaluate FIPER across five simulation and real-world environments involving diverse failure modes. Our results demonstrate that FIPER better distinguishes actual failures from benign OOD situations and predicts failures more accurately and earlier than existing methods. We thus consider this work an important step towards more interpretable and safer generative robot policies. Code, data and videos are available at https://tum-lsy.github.io/fiper_website.
comment: Accepted to NeurIPS 2025
SilvaScenes: Tree Segmentation and Species Classification from Under-Canopy Images in Natural Forests
Interest in robotics for forest management is growing, but perception in complex, natural environments remains a significant hurdle. Conditions such as heavy occlusion, variable lighting, and dense vegetation pose challenges to automated systems, which are essential for precision forestry, biodiversity monitoring, and the automation of forestry equipment. These tasks rely on advanced perceptual capabilities, such as detection and fine-grained species classification of individual trees. Yet, existing datasets are inadequate to develop such perception systems, as they often focus on urban settings or a limited number of species. To address this, we present SilvaScenes, a new dataset for instance segmentation of tree species from under-canopy images. Collected across five bioclimatic domains in Quebec, Canada, SilvaScenes features 1476 trees from 24 species with annotations from forestry experts. We demonstrate the relevance and challenging nature of our dataset by benchmarking modern deep learning approaches for instance segmentation. Our results show that, while tree segmentation is easy, with a top mean average precision (mAP) of 67.65%, species classification remains a significant challenge with an mAP of only 35.69%. Our dataset and source code will be available at https://github.com/norlab-ulaval/SilvaScenes.
comment: 8 pages, 5 figures
Bridging Research and Practice in Simulation-based Testing of Industrial Robot Navigation Systems
Ensuring robust robotic navigation in dynamic environments is a key challenge, as traditional testing methods often struggle to cover the full spectrum of operational requirements. This paper presents the industrial adoption of Surrealist, a simulation-based test generation framework originally for UAVs, now applied to the ANYmal quadrupedal robot for industrial inspection. Our method uses a search-based algorithm to automatically generate challenging obstacle avoidance scenarios, uncovering failures often missed by manual testing. In a pilot phase, generated test suites revealed critical weaknesses in one experimental algorithm (40.3% success rate) and served as an effective benchmark to prove the superior robustness of another (71.2% success rate). The framework was then integrated into the ANYbotics workflow for a six-month industrial evaluation, where it was used to test five proprietary algorithms. A formal survey confirmed its value, showing it enhances the development process, uncovers critical failures, provides objective benchmarks, and strengthens the overall verification pipeline.
comment: 12 pages, accepted for publication at IEEE/ACM International Conference on Automated Software Engineering (ASE) 2025 - Industry Showcase Track
Parametrized Topological Complexity for a Multi-Robot System with Variable Tasks
We study a generalized motion planning problem involving multiple autonomous robots navigating in a $d$-dimensional Euclidean space in the presence of a set of obstacles whose positions are unknown a priori. Each robot is required to visit sequentially a prescribed set of target states, with the number of targets varying between robots. This heterogeneous setting generalizes the framework considered in the prior works on sequential parametrized topological complexity by Farber and the second author of this article. To determine the topological complexity of our problem, we formulate it mathematically by constructing an appropriate fibration. Our main contribution is the determination of this invariant in the generalized setting, which captures the minimal algorithmic instability required for designing collision-free motion planning algorithms under parameter-dependent constraints. We provide a detailed analysis for both odd and even-dimensional ambient spaces, including the essential cohomological computations and explicit constructions of corresponding motion planning algorithms.
comment: 25 pages. All comments are welcome
Placeit! A Framework for Learning Robot Object Placement Skills
Robotics research has made significant strides in learning, yet mastering basic skills like object placement remains a fundamental challenge. A key bottleneck is the acquisition of large-scale, high-quality data, which is often a manual and laborious process. Inspired by Graspit!, a foundational work that used simulation to automatically generate dexterous grasp poses, we introduce Placeit!, an evolutionary-computation framework for generating valid placement positions for rigid objects. Placeit! is highly versatile, supporting tasks from placing objects on tables to stacking and inserting them. Our experiments show that by leveraging quality-diversity optimization, Placeit! significantly outperforms state-of-the-art methods across all scenarios for generating diverse valid poses. A pick&place pipeline built on our framework achieved a 90% success rate over 120 real-world deployments. This work positions Placeit! as a powerful tool for open-environment pick-and-place tasks and as a valuable engine for generating the data needed to train simulation-based foundation models in robotics.
comment: 8 pages, 8 figures. Draft version
Obstacle Avoidance using Dynamic Movement Primitives and Reinforcement Learning
Learning-based motion planning can quickly generate near-optimal trajectories. However, it often requires either large training datasets or costly collection of human demonstrations. This work proposes an alternative approach that quickly generates smooth, near-optimal collision-free 3D Cartesian trajectories from a single artificial demonstration. The demonstration is encoded as a Dynamic Movement Primitive (DMP) and iteratively reshaped using policy-based reinforcement learning to create a diverse trajectory dataset for varying obstacle configurations. This dataset is used to train a neural network that takes as inputs the task parameters describing the obstacle dimensions and location, derived automatically from a point cloud, and outputs the DMP parameters that generate the trajectory. The approach is validated in simulation and real-robot experiments, outperforming a RRT-Connect baseline in terms of computation and execution time, as well as trajectory length, while supporting multi-modal trajectory generation for different obstacle geometries and end-effector dimensions. Videos and the implementation code are available at https://github.com/DominikUrbaniak/obst-avoid-dmp-pi2.
comment: 8 pages, 7 figures
Glovity: Learning Dexterous Contact-Rich Manipulation via Spatial Wrench Feedback Teleoperation System
We present Glovity, a novel, low-cost wearable teleoperation system that integrates a spatial wrench (force-torque) feedback device with a haptic glove featuring fingertip Hall sensor calibration, enabling feedback-rich dexterous manipulation. Glovity addresses key challenges in contact-rich tasks by providing intuitive wrench and tactile feedback, while overcoming embodiment gaps through precise retargeting. User studies demonstrate significant improvements: wrench feedback boosts success rates in book-flipping tasks from 48% to 78% and reduces completion time by 25%, while fingertip calibration enhances thin-object grasping success significantly compared to commercial glove. Furthermore, incorporating wrench signals into imitation learning (via DP-R3M) achieves high success rate in novel contact-rich scenarios, such as adaptive page flipping and force-aware handovers. All hardware designs, software will be open-sourced. Project website: https://glovity.github.io/
HANDO: Hierarchical Autonomous Navigation and Dexterous Omni-loco-manipulation IROS 2025
Seamless loco-manipulation in unstructured environments requires robots to leverage autonomous exploration alongside whole-body control for physical interaction. In this work, we introduce HANDO (Hierarchical Autonomous Navigation and Dexterous Omni-loco-manipulation), a two-layer framework designed for legged robots equipped with manipulators to perform human-centered mobile manipulation tasks. The first layer utilizes a goal-conditioned autonomous exploration policy to guide the robot to semantically specified targets, such as a black office chair in a dynamic environment. The second layer employs a unified whole-body loco-manipulation policy to coordinate the arm and legs for precise interaction tasks-for example, handing a drink to a person seated on the chair. We have conducted an initial deployment of the navigation module, and will continue to pursue finer-grained deployment of whole-body loco-manipulation.
comment: 4 pages, 2 figures, this paper has been accepted for the workshop Perception and Planning for Mobile Manipulation in Changing Environments (PM2CE) at IROS 2025
PLEXUS Hand: Lightweight Four-Motor Prosthetic Hand Enabling Precision-Lateral Dexterous Manipulation
Electric prosthetic hands should be lightweight to decrease the burden on the user, shaped like human hands for cosmetic purposes, and have motors inside to protect them from damage and dirt. In addition to the ability to perform daily activities, these features are essential for everyday use of the hand. In-hand manipulation is necessary to perform daily activities such as transitioning between different postures, particularly through rotational movements, such as reorienting cards before slot insertion and operating tools such as screwdrivers. However, currently used electric prosthetic hands only achieve static grasp postures, and existing manipulation approaches require either many motors, which makes the prosthesis heavy for daily use in the hand, or complex mechanisms that demand a large internal space and force external motor placement, complicating attachment and exposing the components to damage. Alternatively, we combine a single-axis thumb and optimized thumb positioning to achieve basic posture and in-hand manipulation, that is, the reorientation between precision and lateral grasps, using only four motors in a lightweight (311 g) prosthetic hand. Experimental validation using primitive objects of various widths (5-30 mm) and shapes (cylinders and prisms) resulted in success rates of 90-100% for reorientation tasks. The hand performed seal stamping and USB device insertion, as well as rotation to operate a screwdriver.
Flow-Opt: Scalable Centralized Multi-Robot Trajectory Optimization with Flow Matching and Differentiable Optimization
Centralized trajectory optimization in the joint space of multiple robots allows access to a larger feasible space that can result in smoother trajectories, especially while planning in tight spaces. Unfortunately, it is often computationally intractable beyond a very small swarm size. In this paper, we propose Flow-Opt, a learning-based approach towards improving the computational tractability of centralized multi-robot trajectory optimization. Specifically, we reduce the problem to first learning a generative model to sample different candidate trajectories and then using a learned Safety-Filter(SF) to ensure fast inference-time constraint satisfaction. We propose a flow-matching model with a diffusion transformer (DiT) augmented with permutation invariant robot position and map encoders as the generative model. We develop a custom solver for our SF and equip it with a neural network that predicts context-specific initialization. The initialization network is trained in a self-supervised manner, taking advantage of the differentiability of the SF solver. We advance the state-of-the-art in the following respects. First, we show that we can generate trajectories of tens of robots in cluttered environments in a few tens of milliseconds. This is several times faster than existing centralized optimization approaches. Moreover, our approach also generates smoother trajectories orders of magnitude faster than competing baselines based on diffusion models. Second, each component of our approach can be batched, allowing us to solve a few tens of problem instances in a fraction of a second. We believe this is a first such result; no existing approach provides such capabilities. Finally, our approach can generate a diverse set of trajectories between a given set of start and goal locations, which can capture different collision-avoidance behaviors.
Decentralized Multi-Robot Relative Navigation in Unknown, Structurally Constrained Environments under Limited Communication
Multi-robot navigation in unknown, structurally constrained, and GPS-denied environments presents a fundamental trade-off between global strategic foresight and local tactical agility, particularly under limited communication. Centralized methods achieve global optimality but suffer from high communication overhead, while distributed methods are efficient but lack the broader awareness to avoid deadlocks and topological traps. To address this, we propose a fully decentralized, hierarchical relative navigation framework that achieves both strategic foresight and tactical agility without a unified coordinate system. At the strategic layer, robots build and exchange lightweight topological maps upon opportunistic encounters. This process fosters an emergent global awareness, enabling the planning of efficient, trap-avoiding routes at an abstract level. This high-level plan then inspires the tactical layer, which operates on local metric information. Here, a sampling-based escape point strategy resolves dense spatio-temporal conflicts by generating dynamically feasible trajectories in real time, concurrently satisfying tight environmental and kinodynamic constraints. Extensive simulations and real-world experiments demonstrate that our system significantly outperforms in success rate and efficiency, especially in communication-limited environments with complex topological structures.
When a Robot is More Capable than a Human: Learning from Constrained Demonstrators
Learning from demonstrations enables experts to teach robots complex tasks using interfaces such as kinesthetic teaching, joystick control, and sim-to-real transfer. However, these interfaces often constrain the expert's ability to demonstrate optimal behavior due to indirect control, setup restrictions, and hardware safety. For example, a joystick can move a robotic arm only in a 2D plane, even though the robot operates in a higher-dimensional space. As a result, the demonstrations collected by constrained experts lead to suboptimal performance of the learned policies. This raises a key question: Can a robot learn a better policy than the one demonstrated by a constrained expert? We address this by allowing the agent to go beyond direct imitation of expert actions and explore shorter and more efficient trajectories. We use the demonstrations to infer a state-only reward signal that measures task progress, and self-label reward for unknown states using temporal interpolation. Our approach outperforms common imitation learning in both sample efficiency and task completion time. On a real WidowX robotic arm, it completes the task in 12 seconds, 10x faster than behavioral cloning, as shown in real-robot videos on https://sites.google.com/view/constrainedexpert .
Robust Visual Teach-and-Repeat Navigation with Flexible Topo-metric Graph Map Representation
Visual Teach-and-Repeat Navigation is a direct solution for mobile robot to be deployed in unknown environments. However, robust trajectory repeat navigation still remains challenged due to environmental changing and dynamic objects. In this paper, we propose a novel visual teach-and-repeat navigation system, which consists of a flexible map representation, robust map matching and a map-less local navigation module. During the teaching process, the recorded keyframes are formulated as a topo-metric graph and each node can be further extended to save new observations. Such representation also alleviates the requirement of globally consistent mapping. To enhance the place recognition performance during repeating process, instead of using frame-to-frame matching, we firstly implement keyframe clustering to aggregate similar connected keyframes into local map and perform place recognition based on visual frame-tolocal map matching strategy. To promote the local goal persistent tracking performance, a long-term goal management algorithm is constructed, which can avoid the robot getting lost due to environmental changes or obstacle occlusion. To achieve the goal without map, a local trajectory-control candidate optimization algorithm is proposed. Extensively experiments are conducted on our mobile platform. The results demonstrate that our system is superior to the baselines in terms of robustness and effectiveness.
Training Models to Detect Successive Robot Errors from Human Reactions
As robots become more integrated into society, detecting robot errors is essential for effective human-robot interaction (HRI). When a robot fails repeatedly, how can it know when to change its behavior? Humans naturally respond to robot errors through verbal and nonverbal cues that intensify over successive failures-from confusion and subtle speech changes to visible frustration and impatience. While prior work shows that human reactions can indicate robot failures, few studies examine how these evolving responses reveal successive failures. This research uses machine learning to recognize stages of robot failure from human reactions. In a study with 26 participants interacting with a robot that made repeated conversational errors, behavioral features were extracted from video data to train models for individual users. The best model achieved 93.5% accuracy for detecting errors and 84.1% for classifying successive failures. Modeling the progression of human reactions enhances error detection and understanding of repeated interaction breakdowns in HRI.
comment: Accepted to NERC '25
Visual Anomaly Detection for Reliable Robotic Implantation of Flexible Microelectrode Array IROS 2025
Flexible microelectrode (FME) implantation into brain cortex is challenging due to the deformable fiber-like structure of FME probe and the interaction with critical bio-tissue. To ensure reliability and safety, the implantation process should be monitored carefully. This paper develops an image-based anomaly detection framework based on the microscopic cameras of the robotic FME implantation system. The unified framework is utilized at four checkpoints to check the micro-needle, FME probe, hooking result, and implantation point, respectively. Exploiting the existing object localization results, the aligned regions of interest (ROIs) are extracted from raw image and input to a pretrained vision transformer (ViT). Considering the task specifications, we propose a progressive granularity patch feature sampling method to address the sensitivity-tolerance trade-off issue at different locations. Moreover, we select a part of feature channels with higher signal-to-noise ratios from the raw general ViT features, to provide better descriptors for each specific scene. The effectiveness of the proposed methods is validated with the image datasets collected from our implantation system.
comment: Accept by IROS 2025
iMoWM: Taming Interactive Multi-Modal World Model for Robotic Manipulation
Learned world models hold significant potential for robotic manipulation, as they can serve as simulator for real-world interactions. While extensive progress has been made in 2D video-based world models, these approaches often lack geometric and spatial reasoning, which is essential for capturing the physical structure of the 3D world. To address this limitation, we introduce iMoWM, a novel interactive world model designed to generate color images, depth maps, and robot arm masks in an autoregressive manner conditioned on actions. To overcome the high computational cost associated with three-dimensional information, we propose MMTokenizer, which unifies multi-modal inputs into a compact token representation. This design enables iMoWM to leverage large-scale pretrained VideoGPT models while maintaining high efficiency and incorporating richer physical information. With its multi-modal representation, iMoWM not only improves the visual quality of future predictions but also serves as an effective simulator for model-based reinforcement learning (MBRL) and facilitates real-world imitation learning. Extensive experiments demonstrate the superiority of iMoWM across these tasks, showcasing the advantages of multi-modal world modeling for robotic manipulation. Homepage: https://xingyoujun.github.io/imowm/
Exploring Single Domain Generalization of LiDAR-based Semantic Segmentation under Imperfect Labels
Accurate perception is critical for vehicle safety, with LiDAR as a key enabler in autonomous driving. To ensure robust performance across environments, sensor types, and weather conditions without costly re-annotation, domain generalization in LiDAR-based 3D semantic segmentation is essential. However, LiDAR annotations are often noisy due to sensor imperfections, occlusions, and human errors. Such noise degrades segmentation accuracy and is further amplified under domain shifts, threatening system reliability. While noisy-label learning is well-studied in images, its extension to 3D LiDAR segmentation under domain generalization remains largely unexplored, as the sparse and irregular structure of point clouds limits direct use of 2D methods. To address this gap, we introduce the novel task Domain Generalization for LiDAR Semantic Segmentation under Noisy Labels (DGLSS-NL) and establish the first benchmark by adapting three representative noisy-label learning strategies from image classification to 3D segmentation. However, we find that existing noisy-label learning approaches adapt poorly to LiDAR data. We therefore propose DuNe, a dual-view framework with strong and weak branches that enforce feature-level consistency and apply cross-entropy loss based on confidence-aware filtering of predictions. Our approach shows state-of-the-art performance by achieving 56.86% mIoU on SemanticKITTI, 42.28% on nuScenes, and 52.58% on SemanticPOSS under 10% symmetric label noise, with an overall Arithmetic Mean (AM) of 49.57% and Harmonic Mean (HM) of 48.50%, thereby demonstrating robust domain generalization in DGLSS-NL tasks. The code is available on our project page.
Trust Modeling and Estimation in Human-Autonomy Interactions
Advances in the control of autonomous systems have accompanied an expansion in the potential applications for autonomous robotic systems. The success of applications involving humans depends on the quality of interaction between the autonomous system and the human supervisor, which is particularly affected by the degree of trust that the supervisor places in the autonomous system. Absent from the literature are models of supervisor trust dynamics that can accommodate asymmetric responses to autonomous system performance and the intermittent nature of supervisor-autonomous system communication. This paper focuses on formulating an estimated model of supervisor trust that incorporates both of these features by employing a switched linear system structure with event-triggered sampling of the model input and output. Trust response data collected in a user study with 51 participants were then used identify parameters for a switched linear model-based observer of supervisor trust.
comment: 10 pages. 13 figures
A geometrical approach to solve the proximity of a point to an axisymmetric quadric in space
This paper presents the classification of a general quadric into an axisymmetric quadric (AQ) and the solution to the problem of the proximity of a given point to an AQ. The problem of proximity in $R^3$ is reduced to the same in $R^2$, which is not found in the literature. A new method to solve the problem in $R^2$ is used based on the geometrical properties of the conics, such as sub-normal, length of the semi-major axis, eccentricity, slope and radius. Furthermore, the problem in $R^2$ is categorised into two and three more sub-cases for parabola and ellipse/hyperbola, respectively, depending on the location of the point, which is a novel approach as per the authors' knowledge. The proposed method is suitable for implementation in a common programming language, such as C and proved to be faster than a commercial library, namely, Bullet.
Direct Data-Driven Predictive Control for a Three-dimensional Cable-Driven Soft Robotic Arm
Soft robots offer significant advantages in safety and adaptability, yet achieving precise and dynamic control remains a major challenge due to their inherently complex and nonlinear dynamics. Recently, Data-enabled Predictive Control (DeePC) has emerged as a promising model-free approach that bypasses explicit system identification by directly leveraging input-output data. While DeePC has shown success in other domains, its application to soft robots remains underexplored, particularly for three-dimensional (3D) soft robotic systems. This paper addresses this gap by developing and experimentally validating an effective DeePC framework on a 3D, cable-driven soft arm. Specifically, we design and fabricate a soft robotic arm with a thick tubing backbone for stability, a dense silicone body with large cavities for strength and flexibility, and rigid endcaps for secure termination. Using this platform, we implement DeePC with singular value decomposition (SVD)-based dimension reduction for two key control tasks: fixed-point regulation and trajectory tracking in 3D space. Comparative experiments with a baseline model-based controller demonstrate DeePC's superior accuracy, robustness, and adaptability, highlighting its potential as a practical solution for dynamic control of soft robots.
Model-Based Lookahead Reinforcement Learning for in-hand manipulation
In-Hand Manipulation, as many other dexterous tasks, remains a difficult challenge in robotics by combining complex dynamic systems with the capability to control and manoeuvre various objects using its actuators. This work presents the application of a previously developed hybrid Reinforcement Learning (RL) Framework to In-Hand Manipulation task, verifying that it is capable of improving the performance of the task. The model combines concepts of both Model-Free and Model-Based Reinforcement Learning, by guiding a trained policy with the help of a dynamic model and value-function through trajectory evaluation, as done in Model Predictive Control. This work evaluates the performance of the model by comparing it with the policy that will be guided. To fully explore this, various tests are performed using both fully-actuated and under-actuated simulated robotic hands to manipulate different objects for a given task. The performance of the model will also be tested for generalization tests, by changing the properties of the objects in which both the policy and dynamic model were trained, such as density and size, and additionally by guiding a trained policy in a certain object to perform the same task in a different one. The results of this work show that, given a policy with high average reward and an accurate dynamic model, the hybrid framework improves the performance of in-hand manipulation tasks for most test cases, even when the object properties are changed. However, this improvement comes at the expense of increasing the computational cost, due to the complexity of trajectory evaluation.
Online IMU-odometer Calibration using GNSS Measurements for Autonomous Ground Vehicle Localization
Accurate calibration of intrinsic (odometer scaling factors) and extrinsic parameters (IMU-odometer translation and rotation) is essential for autonomous ground vehicle localization. Existing GNSS-aided approaches often rely on positioning results or raw measurements without ambiguity resolution, and their observability properties remain underexplored. This paper proposes a tightly coupled online calibration method that fuses IMU, odometer, and raw GNSS measurements (pseudo-range, carrier-phase, and Doppler) within an extendable factor graph optimization (FGO) framework, incorporating outlier mitigation and ambiguity resolution. Observability analysis reveals that two horizontal translation and three rotation parameters are observable under general motion, while vertical translation remains unobservable. Simulation and real-world experiments demonstrate superior calibration and localization performance over state-of-the-art loosely coupled methods. Specifically, the IMU-odometer positioning using our calibrated parameters achieves the absolute maximum error of 17.75 m while the one of LC method is 61.51 m, achieving up to 71.14 percent improvement. To foster further research, we also release the first open-source dataset that combines IMU, 2D odometer, and raw GNSS measurements from both rover and base stations.
comment: Submitted to IEEE Transactions on Intelligent Transportation Systems
Computing Safe Control Inputs using Discrete-Time Matrix Control Barrier Functions via Convex Optimization
Control barrier functions (CBFs) have seen widespread success in providing forward invariance and safety guarantees for dynamical control systems. A crucial limitation of discrete-time formulations is that CBFs that are nonconcave in their argument require the solution of nonconvex optimization problems to compute safety-preserving control inputs, which inhibits real-time computation of control inputs guaranteeing forward invariance. This paper presents a novel method for computing safety-preserving control inputs for discrete-time systems with nonconvex safety sets, utilizing convex optimization and the recently developed class of matrix control barrier function techniques. The efficacy of our methods is demonstrated through numerical simulations on a bicopter system.
comment: 17 pages, 8 figures
Cross-Sensor Touch Generation
Today's visuo-tactile sensors come in many shapes and sizes, making it challenging to develop general-purpose tactile representations. This is because most models are tied to a specific sensor design. To address this challenge, we propose two approaches to cross-sensor image generation. The first is an end-to-end method that leverages paired data (Touch2Touch). The second method builds an intermediate depth representation and does not require paired data (T2D2: Touch-to-Depth-to-Touch). Both methods enable the use of sensor-specific models across multiple sensors via the cross-sensor touch generation process. Together, these models offer flexible solutions for sensor translation, depending on data availability and application needs. We demonstrate their effectiveness on downstream tasks such as in-hand pose estimation and behavior cloning, successfully transferring models trained on one sensor to another. Project page: https://samantabelen.github.io/cross_sensor_touch_generation.
comment: CoRL 2025
Enhancing Diffusion Policy with Classifier-Free Guidance for Temporal Robotic Tasks
Temporal sequential tasks challenge humanoid robots, as existing Diffusion Policy (DP) and Action Chunking with Transformers (ACT) methods often lack temporal context, resulting in local optima traps and excessive repetitive actions. To address these issues, this paper introduces a Classifier-Free Guidance-Based Diffusion Policy (CFG-DP), a novel framework to enhance DP by integrating Classifier-Free Guidance (CFG) with conditional and unconditional models. Specifically, CFG leverages timestep inputs to track task progression and ensure precise cycle termination. It dynamically adjusts action predictions based on task phase, using a guidance factor tuned to balance temporal coherence and action accuracy. Real-world experiments on a humanoid robot demonstrate high success rates and minimal repetitive actions. Furthermore, we assessed the model's ability to terminate actions and examined how different components and parameter adjustments affect its performance. This framework significantly enhances deterministic control and execution reliability for sequential robotic tasks.
comment: 7 pages, 7 figures
Navigation and Exploration with Active Inference: from Biology to Industry
By building and updating internal cognitive maps, animals exhibit extraordinary navigation abilities in complex, dynamic environments. Inspired by these biological mechanisms, we present a real time robotic navigation system grounded in the Active Inference Framework (AIF). Our model incrementally constructs a topological map, infers the agent's location, and plans actions by minimising expected uncertainty and fulfilling perceptual goals without any prior training. Integrated into the ROS2 ecosystem, we validate its adaptability and efficiency across both 2D and 3D environments (simulated and real world), demonstrating competitive performance with traditional and state of the art exploration approaches while offering a biologically inspired navigation approach.
comment: conference IWAI 2025 - accepted (in processing)
Robo-Instruct: Simulator-Augmented Instruction Alignment For Finetuning Code LLMs
Code LLMs have shown promising results with converting tasks in natural language to programs that can be executed by service robots. We are interested in finetuning small, specialized LLMs for this purpose, but collecting datasets of task-program pairs specific to each robot is time-consuming and expensive. While approaches such as SELF-INSTRUCT and EVOL-INSTRUCT are capable of generating novel tasks given a few examples, they are unable to provide the corresponding programs that correctly abide by physical-world and robot-constraints using the provided programming interface. Using a simulator is a natural potential solution to checking for such constraints, but building simulation environments that can handle arbitrary tasks and their necessary objects and locations, is challenging. To address these challenges, we introduce ROBO-INSTRUCT, which synthesizes task-specific simulation environments on the fly during program execution, by opportunistically inferring entity properties and enforcing corresponding constraints based on how the entities are used in the task program. Additionally, ROBO-INSTRUCT integrates an LLM-aided post-processing procedure to refine instructions for better alignment with robot programs. We demonstrate the effectiveness of ROBO-INSTRUCT across multiple LLMs, showing that our fine-tuned models outperform all baseline methods and even match or surpass the performance of several larger and proprietary models.
comment: Conference on Language Modeling (COLM) 2025, Project site: https://amrl.cs.utexas.edu/robo-instruct/
SwarmGPT: Combining Large Language Models with Safe Motion Planning for Drone Swarm Choreography
Drone swarm performances -- synchronized, expressive aerial displays set to music -- have emerged as a captivating application of modern robotics. Yet designing smooth, safe choreographies remains a complex task requiring expert knowledge. We present SwarmGPT, a language-based choreographer that leverages the reasoning power of large language models (LLMs) to streamline drone performance design. The LLM is augmented by a safety filter that ensures deployability by making minimal corrections when safety or feasibility constraints are violated. By decoupling high-level choreographic design from low-level motion planning, our system enables non-experts to iteratively refine choreographies using natural language without worrying about collisions or actuator limits. We validate our approach through simulations with swarms up to 200 drones and real-world experiments with up to 20 drones performing choreographies to diverse types of songs, demonstrating scalable, synchronized, and safe performances. Beyond entertainment, this work offers a blueprint for integrating foundation models into safety-critical swarm robotics applications.
comment: Accepted at RA-L 2025
Extending First-order Robotic Motion Planners to Second-order Robot Dynamics
This paper extends first-order motion planners to robots governed by second-order dynamics. Two control schemes are proposed based on the knowledge of a scalar function whose negative gradient aligns with a given first-order motion planner. When such a function is known, the first-order motion planner is combined with a damping velocity vector with a dynamic gain to extend the safety and convergence guarantees of the first-order motion planner to second-order systems. If no such function is available, we propose an alternative control scheme ensuring that the error between the robot's velocity and the first-order motion planner converges to zero. The theoretical developments are supported by simulation results demonstrating the effectiveness of the proposed approaches.
comment: 14 pages, 10 figures
MP1: MeanFlow Tames Policy Learning in 1-step for Robotic Manipulation
In robot manipulation, robot learning has become a prevailing approach. However, generative models within this field face a fundamental trade-off between the slow, iterative sampling of diffusion models and the architectural constraints of faster Flow-based methods, which often rely on explicit consistency losses. To address these limitations, we introduce MP1, which pairs 3D point-cloud inputs with the MeanFlow paradigm to generate action trajectories in one network function evaluation (1-NFE). By directly learning the interval-averaged velocity via the "MeanFlow Identity", our policy avoids any additional consistency constraints. This formulation eliminates numerical ODE-solver errors during inference, yielding more precise trajectories. MP1 further incorporates CFG for improved trajectory controllability while retaining 1-NFE inference without reintroducing structural constraints. Because subtle scene-context variations are critical for robot learning, especially in few-shot learning, we introduce a lightweight Dispersive Loss that repels state embeddings during training, boosting generalization without slowing inference. We validate our method on the Adroit and Meta-World benchmarks, as well as in real-world scenarios. Experimental results show MP1 achieves superior average task success rates, outperforming DP3 by 10.2% and FlowPolicy by 7.3%. Its average inference time is only 6.8 ms-19x faster than DP3 and nearly 2x faster than FlowPolicy. Our project page is available at https://mp1-2254.github.io/, and the code can be accessed at https://github.com/LogSSim/MP1.
The Impact of 2D Segmentation Backbones on Point Cloud Predictions Using 4D Radar
LiDAR's dense, sharp point cloud (PC) representations of the surrounding environment enable accurate perception and significantly improve road safety by offering greater scene awareness and understanding. However, LiDAR's high cost continues to restrict the broad adoption of high-level Autonomous Driving (AD) systems in commercially available vehicles. Prior research has shown progress towards circumventing the need for LiDAR by training a neural network, using LiDAR point clouds as ground truth (GT), to produce LiDAR-like 3D point clouds using only 4D Radars. One of the best examples is a neural network created to train a more efficient radar target detector with a modular 2D convolutional neural network (CNN) backbone and a temporal coherence network at its core that uses the RaDelft dataset for training (see arXiv:2406.04723). In this work, we investigate the impact of higher-capacity segmentation backbones on the quality of the produced point clouds. Our results show that while very high-capacity models may actually hurt performance, an optimal segmentation backbone can provide a 23.7% improvement over the state-of-the-art (SOTA).
PeRoI: A Pedestrian-Robot Interaction Dataset for Learning Avoidance, Neutrality, and Attraction Behaviors in Social Navigation
Robots are increasingly being deployed in public spaces such as shopping malls, sidewalks, and hospitals, where safe and socially aware navigation depends on anticipating how pedestrians respond to their presence. However, existing datasets rarely capture the full spectrum of robot-induced reactions, e.g., avoidance, neutrality, attraction, which limits progress in modeling these interactions. In this paper, we present the Pedestrian-Robot Interaction~(PeRoI) dataset that captures pedestrian motions categorized into attraction, neutrality, and repulsion across two outdoor sites under three controlled conditions: no robot present, with stationary robot, and with moving robot. This design explicitly reveals how pedestrian behavior varies across robot contexts, and we provide qualitative and quantitative comparisons to established state-of-the-art datasets. Building on these data, we propose the Neural Robot Social Force Model~(NeuRoSFM), an extension of the Social Force Model that integrates neural networks to augment inter-human dynamics with learned components and explicit robot-induced forces to better predict pedestrian motion in vicinity of robots. We evaluate NeuRoSFM by generating trajectories on multiple real-world datasets. The results demonstrate improved modeling of pedestrian-robot interactions, leading to better prediction accuracy, and highlight the value of our dataset and method for advancing socially aware navigation strategies in human-centered environments.
SMapper: A Multi-Modal Data Acquisition Platform for SLAM Benchmarking
Advancing research in fields such as Simultaneous Localization and Mapping (SLAM) and autonomous navigation critically depends on the availability of reliable and reproducible multimodal datasets. While several influential datasets have driven progress in these domains, they often suffer from limitations in sensing modalities, environmental diversity, and the reproducibility of the underlying hardware setups. To address these challenges, this paper introduces SMapper, a novel open-hardware, multi-sensor platform designed explicitly for, though not limited to, SLAM research. The device integrates synchronized LiDAR, multi-camera, and inertial sensing, supported by a robust calibration and synchronization pipeline that ensures precise spatio-temporal alignment across modalities. Its open and replicable design allows researchers to extend its capabilities and reproduce experiments across both handheld and robot-mounted scenarios. To demonstrate its practicality, we additionally release SMapper-light, a publicly available SLAM dataset containing representative indoor and outdoor sequences. The dataset includes tightly synchronized multimodal data and ground truth trajectories derived from offline LiDAR-based SLAM with sub-centimeter accuracy, alongside dense 3D reconstructions. Furthermore, the paper contains benchmarking results on state-of-the-art LiDAR and visual SLAM frameworks using the SMapper-light dataset. By combining open-hardware design, reproducible data collection, and comprehensive benchmarking, SMapper establishes a robust foundation for advancing SLAM algorithm development, evaluation, and reproducibility. The project's documentation, including source code, CAD models, and dataset links, is publicly available at https://snt-arg.github.io/smapper_docs.
comment: 13 pages, 5 figures, 6 tables
Multi-robot Rigid Formation Navigation via Synchronous Motion and Discrete-time Communication-Control Optimization
Rigid-formation navigation of multiple robots is essential for applications such as cooperative transportation. This process involves a team of collaborative robots maintaining a predefined geometric configuration, such as a square, while in motion. For untethered collaborative motion, inter-robot communication must be conducted through a wireless network. Notably, few existing works offer a comprehensive solution for multi-robot formation navigation executable on microprocessor platforms via wireless networks, particularly for formations that must traverse complex curvilinear paths. To address this gap, we introduce a novel "hold-and-hit" communication-control framework designed to work seamlessly with the widely-used Robotic Operating System (ROS) platform. The hold-and-hit framework synchronizes robot movements in a manner robust against wireless network delays and packet loss. It operates over discrete-time communication-control cycles, making it suitable for implementation on contemporary microprocessors. Complementary to hold-and-hit, we propose an intra-cycle optimization approach that enables rigid formations to closely follow desired curvilinear paths, even under the nonholonomic movement constraints inherent to most vehicular robots. The combination of hold-and-hit and intra-cycle optimization ensures precise and reliable navigation even in challenging scenarios. Simulations in a virtual environment demonstrate the superiority of our method in maintaining a four-robot square formation along an S-shaped path, outperforming two existing approaches. Furthermore, real-world experiments validate the effectiveness of our framework: the robots maintained an inter-distance error within $\pm 0.069m$ and an inter-angular orientation error within $\pm19.15^{\circ}$ while navigating along an S-shaped path at a fixed linear velocity of $0.1 m/s$.
A Multimodal Depth-Aware Method For Embodied Reference Understanding
Embodied Reference Understanding requires identifying a target object in a visual scene based on both language instructions and pointing cues. While prior works have shown progress in open-vocabulary object detection, they often fail in ambiguous scenarios where multiple candidate objects exist in the scene. To address these challenges, we propose a novel ERU framework that jointly leverages LLM-based data augmentation, depth-map modality, and a depth-aware decision module. This design enables robust integration of linguistic and embodied cues, improving disambiguation in complex or cluttered environments. Experimental results on two datasets demonstrate that our approach significantly outperforms existing baselines, achieving more accurate and reliable referent detection.
A Knowledge-Informed Deep Learning Paradigm for Generalizable and Stability-Optimized Car-Following Models
Car-following models (CFMs) are fundamental to traffic flow analysis and autonomous driving. Although calibrated physics-based and trained data-driven CFMs can replicate human driving behavior, their reliance on specific datasets limits generalization across diverse scenarios and reduces reliability in real-world deployment. Moreover, these models typically focus on behavioral fidelity and do not support the explicit optimization of local and string stability, which are increasingly important for the safe and efficient operation of autonomous vehicles (AVs). To address these limitations, we propose a Knowledge-Informed Deep Learning (KIDL) paradigm that distills the generalization capabilities of pre-trained Large Language Models (LLMs) into a lightweight and stability-aware neural architecture. LLMs are used to extract fundamental car-following knowledge beyond dataset-specific patterns, and this knowledge is transferred to a reliable, tractable, and computationally efficient model through knowledge distillation. KIDL also incorporates stability constraints directly into its training objective, ensuring that the resulting model not only emulates human-like behavior but also satisfies the local and string stability requirements essential for real-world AV deployment. We evaluate KIDL on the real-world NGSIM and HighD datasets, comparing its performance with representative physics-based, data-driven, and hybrid CFMs. Both empirical and theoretical results consistently demonstrate KIDL's superior behavioral generalization and traffic flow stability, offering a robust and scalable solution for next-generation traffic systems.
An Introduction to Zero-Order Optimization Techniques for Robotics
Zero-order optimization techniques are becoming increasingly popular in robotics due to their ability to handle non-differentiable functions and escape local minima. These advantages make them particularly useful for trajectory optimization and policy optimization. In this work, we propose a mathematical tutorial on random search. It offers a simple and unifying perspective for understanding a wide range of algorithms commonly used in robotics. Leveraging this viewpoint, we classify many trajectory optimization methods under a common framework and derive novel competitive RL algorithms.
CCDP: Composition of Conditional Diffusion Policies with Guided Sampling IROS 2025
Imitation Learning offers a promising approach to learn directly from data without requiring explicit models, simulations, or detailed task definitions. During inference, actions are sampled from the learned distribution and executed on the robot. However, sampled actions may fail for various reasons, and simply repeating the sampling step until a successful action is obtained can be inefficient. In this work, we propose an enhanced sampling strategy that refines the sampling distribution to avoid previously unsuccessful actions. We demonstrate that by solely utilizing data from successful demonstrations, our method can infer recovery actions without the need for additional exploratory behavior or a high-level controller. Furthermore, we leverage the concept of diffusion model decomposition to break down the primary problem, which may require long-horizon history to manage failures, into multiple smaller, more manageable sub-problems in learning, data collection, and inference, thereby enabling the system to adapt to variable failure counts. Our approach yields a low-level controller that dynamically adjusts its sampling space to improve efficiency when prior samples fall short. We validate our method across several tasks, including door opening with unknown directions, object manipulation, and button-searching scenarios, demonstrating that our approach outperforms traditional baselines.
comment: Accepted to IROS 2025
A Real-Time System for Scheduling and Managing UAV Delivery in Urban Areas
As urban logistics demand continues to grow, UAV delivery has become a key solution to improve delivery efficiency, reduce traffic congestion, and lower logistics costs. However, to fully leverage the potential of UAV delivery networks, efficient swarm scheduling and management are crucial. In this paper, we propose a real-time scheduling and management system based on the ``Airport-Unloading Station" model, aiming to bridge the gap between high-level scheduling algorithms and low-level execution systems. This system, acting as middleware, accurately translates the requirements from the scheduling layer into specific execution instructions, ensuring that the scheduling algorithms perform effectively in real-world environments. Additionally, we implement three collaborative scheduling schemes involving autonomous ground vehicles (AGVs), unmanned aerial vehicles (UAVs), and ground staff to further optimize overall delivery efficiency. Through extensive experiments, this study demonstrates the rationality and feasibility of the proposed management system, providing practical solution for the commercial application of UAVs delivery in urban. Code: https://github.com/chengji253/UAVDeliverySystem
Aegis: Automated Error Generation and Attribution for Multi-Agent Systems
Large language model based multi-agent systems (MAS) have unlocked significant advancements in tackling complex problems, but their increasing capability introduces a structural fragility that makes them difficult to debug. A key obstacle to improving their reliability is the severe scarcity of large-scale, diverse datasets for error attribution, as existing resources rely on costly and unscalable manual annotation. To address this bottleneck, we introduce Aegis, a novel framework for Automated error generation and attribution for multi-agent systems. Aegis constructs a large dataset of 9,533 trajectories with annotated faulty agents and error modes, covering diverse MAS architectures and task domains. This is achieved using a LLM-based manipulator that can adaptively inject context-aware errors into successful execution trajectories. Leveraging fine-grained labels and the structured arrangement of positive-negative sample pairs, Aegis supports three different learning paradigms: Supervised Fine-Tuning, Reinforcement Learning, and Contrastive Learning. We develop learning methods for each paradigm. Comprehensive experiments show that trained models consistently achieve substantial improvements in error attribution. Notably, several of our fine-tuned LLMs demonstrate performance competitive with or superior to proprietary models an order of magnitude larger, validating our automated data generation framework as a crucial resource for developing more robust and interpretable multi-agent systems. Our project website is available at https://kfq20.github.io/Aegis-Website/.
Nav-EE: Navigation-Guided Early Exiting for Efficient Vision-Language Models in Autonomous Driving
Vision-Language Models (VLMs) are increasingly applied in autonomous driving for unified perception and reasoning, but high inference latency hinders real-time deployment. Early-exit reduces latency by terminating inference at intermediate layers, yet its task-dependent nature limits generalization across diverse scenarios. We observe that this limitation aligns with autonomous driving: navigation systems can anticipate upcoming contexts (e.g., intersections, traffic lights), indicating which tasks will be required. We propose Nav-EE, a navigation-guided early-exit framework that precomputes task-specific exit layers offline and dynamically applies them online based on navigation priors. Experiments on CODA, Waymo, and BOSCH show that Nav-EE achieves accuracy comparable to full inference while reducing latency by up to 63.9%. Real-vehicle integration with Autoware Universe further demonstrates reduced inference latency (600ms to 300ms), supporting faster decision-making in complex scenarios. These results suggest that coupling navigation foresight with early-exit offers a viable path toward efficient deployment of large models in autonomous systems. Code and data are available at our anonymous repository: https://anonymous.4open.science/r/Nav-EE-BBC4
Mitigating Suboptimality of Deterministic Policy Gradients in Complex Q-functions
In reinforcement learning, off-policy actor-critic methods like DDPG and TD3 use deterministic policy gradients: the Q-function is learned from environment data, while the actor maximizes it via gradient ascent. We observe that in complex tasks such as dexterous manipulation and restricted locomotion with mobility constraints, the Q-function exhibits many local optima, making gradient ascent prone to getting stuck. To address this, we introduce SAVO, an actor architecture that (i) generates multiple action proposals and selects the one with the highest Q-value, and (ii) approximates the Q-function repeatedly by truncating poor local optima to guide gradient ascent more effectively. We evaluate tasks such as restricted locomotion, dexterous manipulation, and large discrete-action space recommender systems and show that our actor finds optimal actions more frequently and outperforms alternate actor architectures.
comment: Outstanding Paper Award on Empirical Reinforcement Learning Research, RLC 2025
AirScape: An Aerial Generative World Model with Motion Controllability
How to enable agents to predict the outcomes of their own motion intentions in three-dimensional space has been a fundamental problem in embodied intelligence. To explore general spatial imagination capability, we present AirScape, the first world model designed for six-degree-of-freedom aerial agents. AirScape predicts future observation sequences based on current visual inputs and motion intentions. Specifically, we construct a dataset for aerial world model training and testing, which consists of 11k video-intention pairs. This dataset includes first-person-view videos capturing diverse drone actions across a wide range of scenarios, with over 1,000 hours spent annotating the corresponding motion intentions. Then we develop a two-phase schedule to train a foundation model--initially devoid of embodied spatial knowledge--into a world model that is controllable by motion intentions and adheres to physical spatio-temporal constraints. Experimental results demonstrate that AirScape significantly outperforms existing foundation models in 3D spatial imagination capabilities, especially with over a 50% improvement in metrics reflecting motion alignment. The project is available at: https://embodiedcity.github.io/AirScape/.
Maximizing UAV Cellular Connectivity with Reinforcement Learning for BVLoS Path Planning
This paper presents a reinforcement learning (RL) based approach for path planning of cellular connected unmanned aerial vehicles (UAVs) operating beyond visual line of sight (BVLoS). The objective is to minimize travel distance while maximizing the quality of cellular link connectivity by considering real world aerial coverage constraints and employing an empirical aerial channel model. The proposed solution employs RL techniques to train an agent, using the quality of communication links between the UAV and base stations (BSs) as the reward function. Simulation results demonstrate the effectiveness of the proposed method in training the agent and generating feasible UAV path plans. The proposed approach addresses the challenges due to limitations in UAV cellular communications, highlighting the need for investigations and considerations in this area. The RL algorithm efficiently identifies optimal paths, ensuring maximum connectivity with ground BSs to ensure safe and reliable BVLoS flight operation. Moreover, the solution can be deployed as an offline path planning module that can be integrated into future ground control systems (GCS) for UAV operations, enhancing their capabilities and safety. The method holds potential for complex long range UAV applications, advancing the technology in the field of cellular connected UAV path planning.
comment: Submitted to an IEEE Conference
DPL: Depth-only Perceptive Humanoid Locomotion via Realistic Depth Synthesis and Cross-Attention Terrain Reconstruction
Recent advancements in legged robot perceptive locomotion have shown promising progress. However, terrain-aware humanoid locomotion remains largely constrained to two paradigms: depth image-based end-to-end learning and elevation map-based methods. The former suffers from limited training efficiency and a significant sim-to-real gap in depth perception, while the latter depends heavily on multiple vision sensors and localization systems, resulting in latency and reduced robustness. To overcome these challenges, we propose a novel framework that tightly integrates three key components: (1) Terrain-Aware Locomotion Policy with a Blind Backbone, which leverages pre-trained elevation map-based perception to guide reinforcement learning with minimal visual input; (2) Multi-Modality Cross-Attention Transformer, which reconstructs structured terrain representations from noisy depth images; (3) Realistic Depth Images Synthetic Method, which employs self-occlusion-aware ray casting and noise-aware modeling to synthesize realistic depth observations, achieving over 30\% reduction in terrain reconstruction error. This combination enables efficient policy training with limited data and hardware resources, while preserving critical terrain features essential for generalization. We validate our framework on a full-sized humanoid robot, demonstrating agile and adaptive locomotion across diverse and challenging terrains.
Automating eHMI Action Design with LLMs for Automated Vehicle Communication EMNLP 2025
The absence of explicit communication channels between automated vehicles (AVs) and other road users requires the use of external Human-Machine Interfaces (eHMIs) to convey messages effectively in uncertain scenarios. Currently, most eHMI studies employ predefined text messages and manually designed actions to perform these messages, which limits the real-world deployment of eHMIs, where adaptability in dynamic scenarios is essential. Given the generalizability and versatility of large language models (LLMs), they could potentially serve as automated action designers for the message-action design task. To validate this idea, we make three contributions: (1) We propose a pipeline that integrates LLMs and 3D renderers, using LLMs as action designers to generate executable actions for controlling eHMIs and rendering action clips. (2) We collect a user-rated Action-Design Scoring dataset comprising a total of 320 action sequences for eight intended messages and four representative eHMI modalities. The dataset validates that LLMs can translate intended messages into actions close to a human level, particularly for reasoning-enabled LLMs. (3) We introduce two automated raters, Action Reference Score (ARS) and Vision-Language Models (VLMs), to benchmark 18 LLMs, finding that the VLM aligns with human preferences yet varies across eHMI modalities.
comment: Accepted as findings for EMNLP 2025
SHeRLoc: Synchronized Heterogeneous Radar Place Recognition for Cross-Modal Localization
Despite the growing adoption of radar in robotics, the majority of research has been confined to homogeneous sensor types, overlooking the integration and cross-modality challenges inherent in heterogeneous radar technologies. This leads to significant difficulties in generalizing across diverse radar data types, with modality-aware approaches that could leverage the complementary strengths of heterogeneous radar remaining unexplored. To bridge these gaps, we propose SHeRLoc, the first deep network tailored for heterogeneous radar, which utilizes RCS polar matching to align multimodal radar data. Our hierarchical optimal transport-based feature aggregation method generates rotationally robust multi-scale descriptors. By employing FFT-similarity-based data mining and adaptive margin-based triplet loss, SHeRLoc enables FOV-aware metric learning. SHeRLoc achieves an order of magnitude improvement in heterogeneous radar place recognition, increasing recall@1 from below 0.1 to 0.9 on a public dataset and outperforming state of-the-art methods. Also applicable to LiDAR, SHeRLoc paves the way for cross-modal place recognition and heterogeneous sensor SLAM. The supplementary materials and source code are available at https://sites.google.com/view/radar-sherloc.
comment: 9 pages, 9 figures, accepted to RA-L
USIM and U0: A Vision-Language-Action Dataset and Model for General Underwater Robots
Underwater environments present unique challenges for robotic operation, including complex hydrodynamics, limited visibility, and constrained communication. Although data-driven approaches have advanced embodied intelligence in terrestrial robots and enabled task-specific autonomous underwater robots, developing underwater intelligence capable of autonomously performing multiple tasks remains highly challenging, as large-scale, high-quality underwater datasets are still scarce. To address these limitations, we introduce USIM, a simulation-based multi-task Vision-Language-Action (VLA) dataset for underwater robots. USIM comprises over 561K frames from 1,852 trajectories, totaling approximately 15.6 hours of BlueROV2 interactions across 20 tasks in 9 diverse scenarios, ranging from visual navigation to mobile manipulation. Building upon this dataset, we propose U0, a VLA model for general underwater robots, which integrates binocular vision and other sensor modalities through multimodal fusion, and further incorporates a convolution-attention-based perception focus enhancement module (CAP) to improve spatial understanding and mobile manipulation. Across tasks such as inspection, obstacle avoidance, scanning, and dynamic tracking, the framework achieves a success rate of 80%, while in challenging mobile manipulation tasks, it reduces the distance to the target by 21.2% compared with baseline methods, demonstrating its effectiveness. USIM and U0 show that VLA models can be effectively applied to underwater robotic applications, providing a foundation for scalable dataset construction, improved task autonomy, and the practical realization of intelligent general underwater robots.
comment: Project Page: https://vincentgu2000.github.io/u0project/
An Imitative Reinforcement Learning Framework for Pursuit-Lock-Launch Missions
Unmanned Combat Aerial Vehicle (UCAV) Within-Visual-Range (WVR) engagement, referring to a fight between two or more UCAVs at close quarters, plays a decisive role on the aerial battlefields. With the development of artificial intelligence, WVR engagement progressively advances towards intelligent and autonomous modes. However, autonomous WVR engagement policy learning is hindered by challenges such as weak exploration capabilities, low learning efficiency, and unrealistic simulated environments. To overcome these challenges, we propose a novel imitative reinforcement learning framework, which efficiently leverages expert data while enabling autonomous exploration. The proposed framework not only enhances learning efficiency through expert imitation, but also ensures adaptability to dynamic environments via autonomous exploration with reinforcement learning. Therefore, the proposed framework can learn a successful policy of `pursuit-lock-launch' for UCAVs. To support data-driven learning, we establish an environment based on the Harfang3D sandbox. The extensive experiment results indicate that the proposed framework excels in this multistage task, and significantly outperforms state-of-the-art reinforcement learning and imitation learning methods. Thanks to the ability of imitating experts and autonomous exploration, our framework can quickly learn the critical knowledge in complex aerial combat tasks, achieving up to a 100% success rate and demonstrating excellent robustness.
Event-RGB Fusion for Spacecraft Pose Estimation Under Harsh Lighting
Spacecraft pose estimation is crucial for autonomous in-space operations, such as rendezvous, docking and on-orbit servicing. Vision-based pose estimation methods, which typically employ RGB imaging sensors, is a compelling solution for spacecraft pose estimation, but are challenged by harsh lighting conditions, which produce imaging artifacts such as glare, over-exposure, blooming and lens flare. Due to their much higher dynamic range, neuromorphic or event sensors are more resilient to extreme lighting conditions. However, event sensors generally have lower spatial resolution and suffer from reduced signal-to-noise ratio during periods of low relative motion. This work addresses these individual sensor limitations by introducing a sensor fusion approach combining RGB and event sensors. A beam-splitter prism was employed to achieve precise optical and temporal alignment. Then, a RANSAC-based technique was developed to fuse the information from the RGB and event channels to achieve pose estimation that leveraged the strengths of the two modalities. The pipeline was complemented by dropout uncertainty estimation to detect extreme conditions that affect either channel. To benchmark the performance of the proposed event-RGB fusion method, we collected a comprehensive real dataset of RGB and event data for satellite pose estimation in a laboratory setting under a variety of challenging illumination conditions. Encouraging results on the dataset demonstrate the efficacy of our event-RGB fusion approach and further supports the usage of event sensors for spacecraft pose estimation. To support community research on this topic, our dataset has been released publicly.
Learning a Shape-adaptive Assist-as-needed Rehabilitation Policy from Therapist-informed Input
Therapist-in-the-loop robotic rehabilitation has shown great promise in enhancing rehabilitation outcomes by integrating the strengths of therapists and robotic systems. However, its broader adoption remains limited due to insufficient safe interaction and limited adaptation capability. This article proposes a novel telerobotics-mediated framework that enables therapists to intuitively and safely deliver assist-as-needed~(AAN) therapy based on two primary contributions. First, our framework encodes the therapist-informed corrective force into via-points in a latent space, allowing the therapist to provide only minimal assistance while encouraging patient maintaining own motion preferences. Second, a shape-adaptive ANN rehabilitation policy is learned to partially and progressively deform the reference trajectory for movement therapy based on encoded patient motion preferences and therapist-informed via-points. The effectiveness of the proposed shape-adaptive AAN strategy was validated on a telerobotic rehabilitation system using two representative tasks. The results demonstrate its practicality for remote AAN therapy and its superiority over two state-of-the-art methods in reducing corrective force and improving movement smoothness.
NAMOUnc: Navigation Among Movable Obstacles with Decision Making on Uncertainty Interval
Navigation among movable obstacles (NAMO) is a critical task in robotics, often challenged by real-world uncertainties such as observation noise, model approximations, action failures, and partial observability. Existing solutions frequently assume ideal conditions, leading to suboptimal or risky decisions. This paper introduces NAMOUnc, a novel framework designed to address these uncertainties by integrating them into the decision-making process. We first estimate them and compare the corresponding time cost intervals for removing and bypassing obstacles, optimizing both the success rate and time efficiency, ensuring safer and more efficient navigation. We validate our method through extensive simulations and real-world experiments, demonstrating significant improvements over existing NAMO frameworks. More details can be found in our website: https://kai-zhang-er.github.io/namo-uncertainty/
comment: 11 pages, ICINCO2025
MLLM-Fabric: Multimodal Large Language Model-Driven Robotic Framework for Fabric Sorting and Selection
Choosing appropriate fabrics is critical for meeting functional and quality demands in robotic textile manufacturing, apparel production, and smart retail. We propose MLLM-Fabric, a robotic framework leveraging multimodal large language models (MLLMs) for fabric sorting and selection. Built on a multimodal robotic platform, the system is trained through supervised fine-tuning and explanation-guided distillation to rank fabric properties. We also release a dataset of 220 diverse fabrics, each with RGB images and synchronized visuotactile and pressure data. Experiments show that our Fabric-Llama-90B consistently outperforms pretrained vision-language baselines in both attribute ranking and selection reliability. Code and dataset are publicly available at https://github.com/limanwang/MLLM-Fabric.
comment: Accepted to IEEE Robotics and Automation Letters (RAL)
Motion Tracks: A Unified Representation for Human-Robot Transfer in Few-Shot Imitation Learning
Teaching robots to autonomously complete everyday tasks remains a challenge. Imitation Learning (IL) is a powerful approach that imbues robots with skills via demonstrations, but is limited by the labor-intensive process of collecting teleoperated robot data. Human videos offer a scalable alternative, but it remains difficult to directly train IL policies from them due to the lack of robot action labels. To address this, we propose to represent actions as short-horizon 2D trajectories on an image. These actions, or motion tracks, capture the predicted direction of motion for either human hands or robot end-effectors. We instantiate an IL policy called Motion Track Policy (MT-pi) which receives image observations and outputs motion tracks as actions. By leveraging this unified, cross-embodiment action space, MT-pi completes tasks with high success given just minutes of human video and limited additional robot demonstrations. At test time, we predict motion tracks from two camera views, recovering 6DoF trajectories via multi-view synthesis. MT-pi achieves an average success rate of 86.5% across 4 real-world tasks, outperforming state-of-the-art IL baselines which do not leverage human data or our action space by 40%, and generalizes to scenarios seen only in human videos. Code and videos are available on our website https://portal-cornell.github.io/motion_track_policy/.
Effect of Performance Feedback Timing on Motor Learning for a Surgical Training Task
Objective: Robot-assisted minimally invasive surgery (RMIS) has become the gold standard for a variety of surgical procedures, but the optimal method of training surgeons for RMIS is unknown. We hypothesized that real-time, rather than post-task, error feedback would better increase learning speed and reduce errors. Methods: Forty-two surgical novices learned a virtual version of the ring-on-wire task, a canonical task in RMIS training. We investigated the impact of feedback timing with multi-sensory (haptic and visual) cues in three groups: (1) real-time error feedback, (2) trial replay with error feedback, and (3) no error feedback. Results: Participant performance was evaluated based on the accuracy of ring position and orientation during the task. Participants who received real-time feedback outperformed other groups in ring orientation. Additionally, participants who received feedback in replay outperformed participants who did not receive any error feedback on ring orientation during long, straight path sections. There were no significant differences between groups for ring position overall, but participants who received real-time feedback outperformed the other groups in positional accuracy on tightly curved path sections. Conclusion: The addition of real-time haptic and visual error feedback improves learning outcomes in a virtual surgical task over error feedback in replay or no error feedback at all. Significance: This work demonstrates that multi-sensory error feedback delivered in real time leads to better training outcomes as compared to the same feedback delivered after task completion. This novel method of training may enable surgical trainees to develop skills with greater speed and accuracy.
comment: Submitted to IEEE Transactions on Biomedical Engineering
Adaptive Dynamics Planning for Robot Navigation
Autonomous robot navigation systems often rely on hierarchical planning, where global planners compute collision-free paths without considering dynamics, and local planners enforce dynamics constraints to produce executable commands. This discontinuity in dynamics often leads to trajectory tracking failure in highly constrained environments. Recent approaches integrate dynamics within the entire planning process by gradually decreasing its fidelity, e.g., increasing integration steps and reducing collision checking resolution, for real-time planning efficiency. However, they assume that the fidelity of the dynamics should decrease according to a manually designed scheme. Such static settings fail to adapt to environmental complexity variations, resulting in computational overhead in simple environments or insufficient dynamics consideration in obstacle-rich scenarios. To overcome this limitation, we propose Adaptive Dynamics Planning (ADP), a learning-augmented paradigm that uses reinforcement learning to dynamically adjust robot dynamics properties, enabling planners to adapt across diverse environments. We integrate ADP into three different planners and further design a standalone ADP-based navigation system, benchmarking them against other baselines. Experiments in both simulation and real-world tests show that ADP consistently improves navigation success, safety, and efficiency.
comment: 8 pages, 4 figures
Adaptive Stress Testing Black-Box LLM Planners
Large language models (LLMs) have recently demonstrated success in generalizing across decision-making tasks including planning, control, and prediction, but their tendency to hallucinate unsafe and undesired outputs poses risks. We argue that detecting such failures is necessary, especially in safety-critical scenarios. Existing methods for black-box models often detect hallucinations by identifying inconsistencies across multiple samples. Many of these approaches typically introduce prompt perturbations like randomizing detail order or generating adversarial inputs, with the intuition that a confident model should produce stable outputs. We first perform a manual case study showing that other forms of perturbations (e.g., adding noise, removing sensor details) cause LLMs to hallucinate in a multi-agent driving environment. We then propose a novel method for efficiently searching the space of prompt perturbations using adaptive stress testing (AST) with Monte-Carlo tree search (MCTS). Our AST formulation enables discovery of scenarios and prompts that cause language models to act with high uncertainty or even crash. By generating MCTS prompt perturbation trees across diverse scenarios, we show through extensive experiments that offline analyses can be used at runtime to automatically generate prompts that influence model uncertainty, and to inform real-time trust assessments of an LLM. We further characterize LLMs deployed as planners in a single-agent lunar lander environment and in a multi-agent robot crowd navigation simulation. Overall, ours is one of the first hallucination intervention algorithms to pave a path towards rigorous characterization of black-box LLM planners.
comment: 25 pages, 24 figures, 5 tables
Multiagent Systems
Scalable Multi-Agent Path Finding using Collision-Aware Dynamic Alert Mask and a Hybrid Execution Strategy
Multi-agent pathfinding (MAPF) remains a critical problem in robotics and autonomous systems, where agents must navigate shared spaces efficiently while avoiding conflicts. Traditional centralized algorithms that have global information, such as Conflict-Based Search (CBS), provide high-quality solutions but become computationally expensive in large-scale scenarios due to the combinatorial explosion of conflicts that need resolution. Conversely, distributed approaches that have local information, particularly learning-based methods, offer better scalability by operating with relaxed information availability, yet often at the cost of solution quality. To address these limitations, we propose a hybrid framework that combines decentralized path planning with a lightweight centralized coordinator. Our framework leverages reinforcement learning (RL) for decentralized planning, enabling agents to adapt their planning based on minimal, targeted alerts--such as static conflict-cell flags or brief conflict tracks--that are dynamically shared information from the central coordinator for effective conflict resolution. We empirically study the effect of the information available to an agent on its planning performance. Our approach reduces the inter-agent information sharing compared to fully centralized and distributed methods, while still consistently finding feasible, collision-free solutions--even in large-scale scenarios having higher agent counts.
Identifying & Interactively Refining Ambiguous User Goals for Data Visualization Code Generation
Establishing shared goals is a fundamental step in human-AI communication. However, ambiguities can lead to outputs that seem correct but fail to reflect the speaker's intent. In this paper, we explore this issue with a focus on the data visualization domain, where ambiguities in natural language impact the generation of code that visualizes data. The availability of multiple views on the contextual (e.g., the intended plot and the code rendering the plot) allows for a unique and comprehensive analysis of diverse ambiguity types. We develop a taxonomy of types of ambiguity that arise in this task and propose metrics to quantify them. Using Matplotlib problems from the DS-1000 dataset, we demonstrate that our ambiguity metrics better correlate with human annotations than uncertainty baselines. Our work also explores how multi-turn dialogue can reduce ambiguity, therefore, improve code accuracy by better matching user goals. We evaluate three pragmatic models to inform our dialogue strategies: Gricean Cooperativity, Discourse Representation Theory, and Questions under Discussion. A simulated user study reveals how pragmatic dialogues reduce ambiguity and enhance code accuracy, highlighting the value of multi-turn exchanges in code generation.
Decentralized Multi-Robot Relative Navigation in Unknown, Structurally Constrained Environments under Limited Communication
Multi-robot navigation in unknown, structurally constrained, and GPS-denied environments presents a fundamental trade-off between global strategic foresight and local tactical agility, particularly under limited communication. Centralized methods achieve global optimality but suffer from high communication overhead, while distributed methods are efficient but lack the broader awareness to avoid deadlocks and topological traps. To address this, we propose a fully decentralized, hierarchical relative navigation framework that achieves both strategic foresight and tactical agility without a unified coordinate system. At the strategic layer, robots build and exchange lightweight topological maps upon opportunistic encounters. This process fosters an emergent global awareness, enabling the planning of efficient, trap-avoiding routes at an abstract level. This high-level plan then inspires the tactical layer, which operates on local metric information. Here, a sampling-based escape point strategy resolves dense spatio-temporal conflicts by generating dynamically feasible trajectories in real time, concurrently satisfying tight environmental and kinodynamic constraints. Extensive simulations and real-world experiments demonstrate that our system significantly outperforms in success rate and efficiency, especially in communication-limited environments with complex topological structures.
GTAlign: Game-Theoretic Alignment of LLM Assistants for Mutual Welfare
Large Language Models (LLMs) have achieved remarkable progress in reasoning, yet sometimes produce responses that are suboptimal for users in tasks such as writing, information seeking, or providing practical guidance. Conventional alignment practices typically assume that maximizing model reward also maximizes user welfare, but this assumption frequently fails in practice: models may over-clarify or generate overly verbose reasoning when users prefer concise answers. Such behaviors resemble the prisoner's dilemma, where individually rational choices lead to socially suboptimal outcomes. The fundamental challenge is the lack of a principled decision making mechanism that mutually benefits both the LLM and the user. We propose Game-Theoretic Alignment (GTAlign), an alignment framework that integrates game-theoretic decision making into both reasoning and training. During reasoning, the model explicitly treats user-LLM interaction as a strategic game: it constructs payoff matrices within its reasoning chain to estimate welfare for both itself and the user, and then selects actions that are mutually beneficial. During training, we introduce a mutual welfare reward that reinforces cooperative responses, aligning model behavior with socially efficient outcomes. In addition, we introduce an inference technique that leverages game-theoretic reasoning to dynamically adapt LLM's response when pricing policies of LLM service change. Extensive experiments demonstrate that GTAlign substantially improves reasoning efficiency, answer quality, and mutual welfare compared to baselines across diverse tasks. The code is available at https://github.com/ulab-uiuc/GTAlign .
comment: 31 pages, 6 figures
CLARITY: Clinical Assistant for Routing, Inference, and Triage EMNLP 2025
We present CLARITY (Clinical Assistant for Routing, Inference and Triage), an AI-driven platform designed to facilitate patient-to-specialist routing, clinical consultations, and severity assessment of patient conditions. Its hybrid architecture combines a Finite State Machine (FSM) for structured dialogue flows with collaborative agents that employ Large Language Model (LLM) to analyze symptoms and prioritize referrals to appropriate specialists. Built on a modular microservices framework, CLARITY ensures safe, efficient, and robust performance, flexible and readily scalable to meet the demands of existing workflows and IT solutions in healthcare. We report integration of our clinical assistant into a large-scale national interhospital platform, with more than 55,000 content-rich user dialogues completed within the two months of deployment, 2,500 of which were expert-annotated for subsequent validation. The validation results show that CLARITY surpasses human-level performance in terms of the first-attempt routing precision, naturally requiring up to 3 times shorter duration of the consultation than with a human.
comment: Accepted to EMNLP 2025 (Industrial Track)
Anemoi: A Semi-Centralized Multi-agent System Based on Agent-to-Agent Communication MCP server from Coral Protocol
Recent advances in generalist multi-agent systems (MAS) have largely followed a context-engineering plus centralized paradigm, where a planner agent coordinates multiple worker agents through unidirectional prompt passing. While effective under strong planner models, this design suffers from two critical limitations: (1) strong dependency on the planner's capability, which leads to degraded performance when a smaller LLM powers the planner; and (2) limited inter-agent communication, where collaboration relies on prompt concatenation rather than genuine refinement through structured discussions. To address these challenges, we propose Anemoi, a semi-centralized MAS built on the Agent-to-Agent (A2A) communication MCP server from Coral Protocol. Unlike traditional designs, Anemoi enables structured and direct inter-agent collaboration, allowing all agents to monitor progress, assess results, identify bottlenecks, and propose refinements in real time. This paradigm reduces reliance on a single planner, supports adaptive plan updates, and minimizes redundant context passing, resulting in more scalable execution. Evaluated on the GAIA benchmark, Anemoi achieved 52.73% accuracy with a small LLM (GPT-4.1-mini) as the planner, surpassing the strongest open-source baseline OWL (43.63%) by +9.09% under identical LLM settings. Our implementation is publicly available at https://github.com/Coral-Protocol/Anemoi.
Aegis: Automated Error Generation and Attribution for Multi-Agent Systems
Large language model based multi-agent systems (MAS) have unlocked significant advancements in tackling complex problems, but their increasing capability introduces a structural fragility that makes them difficult to debug. A key obstacle to improving their reliability is the severe scarcity of large-scale, diverse datasets for error attribution, as existing resources rely on costly and unscalable manual annotation. To address this bottleneck, we introduce Aegis, a novel framework for Automated error generation and attribution for multi-agent systems. Aegis constructs a large dataset of 9,533 trajectories with annotated faulty agents and error modes, covering diverse MAS architectures and task domains. This is achieved using a LLM-based manipulator that can adaptively inject context-aware errors into successful execution trajectories. Leveraging fine-grained labels and the structured arrangement of positive-negative sample pairs, Aegis supports three different learning paradigms: Supervised Fine-Tuning, Reinforcement Learning, and Contrastive Learning. We develop learning methods for each paradigm. Comprehensive experiments show that trained models consistently achieve substantial improvements in error attribution. Notably, several of our fine-tuned LLMs demonstrate performance competitive with or superior to proprietary models an order of magnitude larger, validating our automated data generation framework as a crucial resource for developing more robust and interpretable multi-agent systems. Our project website is available at https://kfq20.github.io/Aegis-Website/.
Reimagining Agent-based Modeling with Large Language Model Agents via Shachi
The study of emergent behaviors in large language model (LLM)-driven multi-agent systems is a critical research challenge, yet progress is limited by a lack of principled methodologies for controlled experimentation. To address this, we introduce Shachi, a formal methodology and modular framework that decomposes an agent's policy into core cognitive components: Configuration for intrinsic traits, Memory for contextual persistence, and Tools for expanded capabilities, all orchestrated by an LLM reasoning engine. This principled architecture moves beyond brittle, ad-hoc agent designs and enables the systematic analysis of how specific architectural choices influence collective behavior. We validate our methodology on a comprehensive 10-task benchmark and demonstrate its power through novel scientific inquiries. Critically, we establish the external validity of our approach by modeling a real-world U.S. tariff shock, showing that agent behaviors align with observed market reactions only when their cognitive architecture is appropriately configured with memory and tools. Our work provides a rigorous, open-source foundation for building and evaluating LLM agents, aimed at fostering more cumulative and scientifically grounded research.
DDO: Dual-Decision Optimization for LLM-Based Medical Consultation via Multi-Agent Collaboration EMNLP 2025
Large Language Models (LLMs) demonstrate strong generalization and reasoning abilities, making them well-suited for complex decision-making tasks such as medical consultation (MC). However, existing LLM-based methods often fail to capture the dual nature of MC, which entails two distinct sub-tasks: symptom inquiry, a sequential decision-making process, and disease diagnosis, a classification problem. This mismatch often results in ineffective symptom inquiry and unreliable disease diagnosis. To address this, we propose \textbf{DDO}, a novel LLM-based framework that performs \textbf{D}ual-\textbf{D}ecision \textbf{O}ptimization by decoupling the two sub-tasks and optimizing them with distinct objectives through a collaborative multi-agent workflow. Experiments on three real-world MC datasets show that DDO consistently outperforms existing LLM-based approaches and achieves competitive performance with state-of-the-art generation-based methods, demonstrating its effectiveness in the MC task. The code is available at https://github.com/zh-jia/DDO.
comment: Accepted to EMNLP 2025
What Do Agents Think One Another Want? Level-2 Inverse Games for Inferring Agents' Estimates of Others' Objectives
Effectively interpreting strategic interactions among multiple agents requires us to infer each agent's objective from limited information. Existing inverse game-theoretic approaches frame this challenge in terms of a "level-1" inference problem, in which we take the perspective of a third-party observer and assume that individual agents share complete knowledge of one another's objectives. However, this assumption breaks down in decentralized, real-world scenarios like urban driving and bargaining, in which agents may act based on conflicting views of one another's objectives. We demonstrate the necessity of inferring agents' different estimates of each other's objectives through empirical examples, and by theoretically characterizing the prediction error of level-1 inference on fictitious gameplay data from linear-quadratic games. To address this fundamental issue, we propose a framework for level-2 inference to address the question: "What does each agent believe about other agents' objectives?" We prove that the level-2 inference problem is non-convex even in benign settings like linear-quadratic games, and we develop an efficient gradient-based approach for identifying local solutions. Experiments on a synthetic urban driving example show that our approach uncovers nuanced misalignments that level-1 methods miss.
comment: 8 pages + references + appendix + supplement
LegalWiz: A Multi-Agent Generation Framework for Contradiction Detection in Legal Documents
Retrieval-Augmented Generation (RAG) integrates large language models (LLMs) with external sources, but unresolved contradictions in retrieved evidence often lead to hallucinations and legally unsound outputs. Benchmarks currently used for contradiction detection lack domain realism, cover only limited conflict types, and rarely extend beyond single-sentence pairs, making them unsuitable for legal applications. Controlled generation of documents with embedded contradictions is therefore essential: it enables systematic stress-testing of models, ensures coverage of diverse conflict categories, and provides a reliable basis for evaluating contradiction detection and resolution. We present a multi-agent contradiction-aware benchmark framework for the legal domain that generates synthetic legal-style documents, injects six structured contradiction types, and models both self- and pairwise inconsistencies. Automated contradiction mining is combined with human-in-the-loop validation to guarantee plausibility and fidelity. This benchmark offers one of the first structured resources for contradiction-aware evaluation in legal RAG pipelines, supporting more consistent, interpretable, and trustworthy systems.
Co-Alignment: Rethinking Alignment as Bidirectional Human-AI Cognitive Adaptation
Current AI alignment through RLHF follows a single directional paradigm that AI conforms to human preferences while treating human cognition as fixed. We propose a shift to co-alignment through Bidirectional Cognitive Alignment (BiCA), where humans and AI mutually adapt. BiCA uses learnable protocols, representation mapping, and KL-budget constraints for controlled co-evolution. In collaborative navigation, BiCA achieved 85.5% success versus 70.3% baseline, with 230% better mutual adaptation and 332% better protocol convergence. Emergent protocols outperformed handcrafted ones by 84%, while bidirectional adaptation unexpectedly improved safety (+23% out-of-distribution robustness). The 46% synergy improvement demonstrates optimal collaboration exists at the intersection, not union, of human and AI capabilities, validating the shift from single-directional to co-alignment paradigms.
Systems and Control (CS)
Robust reset control design for piezo-actuated nano-positioner in presence of hysteresis nonlinearity
In this paper, a robust nonlinear control scheme is designed for the motion control of a class of piezo-actuated nano-positioning systems using frequency-domain analysis. The hysteresis, the nonlinearity in the piezoelectric material, degrades the precision in tracking references with high frequency contents and different travel ranges. The hysteresis compensation by the inverse model, as the state-of-the-art solution, is not reliable alone. Therefore, a control framework with robustness against the remaining nonlinearity is needed. It is shown that there is an unavoidable limitation in robust linear control design to improve the performance. A robust control methodology based on a complex-order element is established to relax the limitation. Then, a constant-in-gain-lead-in-phase (CgLp) reset controller is utilized to realize the complex-order control. The control design is based on the sinusoidal input describing function (SIDF) and the higher-order SIDF (HOSIDF) tools. A constrained optimization problem is provided to tune the control parameters. The achieved improvements by the CgLp control is validated by the simulation.
Demystifying and Navigating AI Ethics in Power Electronics
Artificial intelligence (AI) is rapidly transforming power electronics, with AI-related publications in IEEE Power Electronics Society selected journals increasing more than fourfold from 2020 to 2025. However, the ethical dimensions of this transformation have received limited attention. This article underscores the urgent need for an ethical framework to guide responsible AI integration in power electronics, not only to prevent AI-related incidents but also to comply with legal and regulatory responsibilities. In this context, this article identifies four core pillars of AI ethics in power electronics: Security & Safety, Explainability & Transparency, Energy Sustainability, and Evolving Roles of Engineers. Each pillar is supported by practical and actionable insights to ensure that ethical principles are embedded in algorithm design, system deployment, and workforce development. The authors advocate for power electronics engineers to lead the ethical discourse, given their deep technical understanding of both AI systems and power conversion technologies. The paper concludes by calling on the IEEE Power Electronics Society to spearhead the establishment of ethical standards and best practices that ensure AI innovations are not only technically advanced but also trustworthy, safe, and sustainable.
Critical States Identiffcation in Power System via Lattice Partition and Its Application in Reliability Assessment
With the increasing complexity of power systems,accurately identifying critical states (the states corresponding to minimal cut sets) and assessing system reliability have become crucial tasks. In this paper, a mathematical lattice structure is employed to represent and partition the state space of power system. Based on this structure, a novel recursive method is proposed to efffciently identify critical states by leveraging lattice partitioning and Optimal Power Flow(OPF) calculations. This method not only enables the extension of failure system states,but also calculates the upper and lower bounds of the Loss of Load Probability (LOLP) in a progressively converging manner. Compared to traditional reliability assessment methods such as State Enumeration (SE) and Monte Carlo Simulation (MCS), this approach offers greater accuracy and efffciency. Experiments conducted on the RBTS and RTS79 systems demonstrate that the proposed method accurately identiffes all critical states up to a preset order, which are high-risk states. The contribution of these critical states to LOLP highlights their signiffcance in the system. Moreover, the proposed method achieves the analytical value with signiffcantly fewer OPF calculations in RBTS system, reaching acceptable precision of LOLP up to 100 times faster than SE in both the RBTS and RTS systems.
Grid-forming Control of Converter Infinite Bus System: Modeling by Data-driven Methods
This study explores data-driven modeling techniques to capture the dynamics of a grid-forming converter-based infinite bus system, critical for renewable-integrated power grids. Using sparse identification of nonlinear dynamics and deep symbolic regression, models were generated from synthetic data simulating key disturbances in active power, reactive power, and voltage references. Deep symbolic regression demonstrated more accuracy in capturing complex system dynamics, though it required substantially more computational time than sparse identification of nonlinear dynamics. These findings suggest that while deep symbolic regression offers high fidelity, sparse identification of nonlinear dynamics provides a more computationally efficient approach, balancing accuracy and runtime for real-time grid applications.
3C Resources Joint Allocation for Time-Deterministic Remote Sensing Image Backhaul in the Space-Ground Integrated Network
Low-Earth-orbit (LEO) satellites assist observation satellites (OSs) to compress and backhaul more time-determined images (TDI) has become a new paradigm, which is used to enhance the timeout caused by the limited computing resources of OSs. However, how to capture the time-varying and dynamic characteristics of multi-dimensional resources is challenging for efficient collaborative scheduling. Motivated by this factor, we design a highly succinct multi-dimensional resource time-expanded graph (MDR-TEG) modell. Specifically, by employing a slots division mechanism and introducing an external virtual node, the time-varying communication, caching, and computing (3C) resources are depicted in low complexity by the link weights within, between, and outside the slots. Based on the MDR-TEG, the maximizing successful transmission ratio of TDI (MSTR-TDI) is modeled as a mixed integer linear programming (MILP) problem. Which further relaxed decomposed into two tractable sub-problems: maximizing the successful transmission rate of images (MSTRI) and ensuring the timeliness problem (ETP). Subsequently, an efficient subgradient of relaxation computing constraint (SRCC) algorithm is proposed. The upper and lower bounds of MSTR-TDI are obtained by solving the two subproblems and the dual problem (DP), and the direction of the next iteration is obtained by feedback. Furthermore, arranging the sending sequences of images to improve the quality of the solution. The approximate optimal solution of MSTR-TDI is eventually obtained through repeated iterations. The simulation results verify the superiority of the proposed MDR-TEG model and the effectiveness of the SRCC.
Task-Level Insights from Eigenvalues across Sequence Models
Although softmax attention drives state-of-the-art performance for sequence models, its quadratic complexity limits scalability, motivating linear alternatives such as state space models (SSMs). While these alternatives improve efficiency, their fundamental differences in information processing remain poorly understood. In this work, we leverage the recently proposed dynamical systems framework to represent softmax, norm and linear attention as dynamical systems, enabling a structured comparison with SSMs by analyzing their respective eigenvalue spectra. Since eigenvalues capture essential aspects of dynamical system behavior, we conduct an extensive empirical analysis across diverse sequence models and benchmarks. We first show that eigenvalues influence essential aspects of memory and long-range dependency modeling, revealing spectral signatures that align with task requirements. Building on these insights, we then investigate how architectural modifications in sequence models impact both eigenvalue spectra and task performance. This correspondence further strengthens the position of eigenvalue analysis as a principled metric for interpreting, understanding, and ultimately improving the capabilities of sequence models.
MPA-DNN: Projection-Aware Unsupervised Learning for Multi-period DC-OPF
Ensuring both feasibility and efficiency in optimal power flow (OPF) operations has become increasingly important in modern power systems with high penetrations of renewable energy and energy storage. While deep neural networks (DNNs) have emerged as promising fast surrogates for OPF solvers, they often fail to satisfy critical operational constraints, especially those involving inter-temporal coupling, such as generator ramping limits and energy storage operations. To deal with these issues, we propose a Multi-Period Projection-Aware Deep Neural Network (MPA-DNN) that incorporates a projection layer for multi-period dispatch into the network. By doing so, our model enforces physical feasibility through the projection, enabling end-to-end learning of constraint-compliant dispatch trajectories without relying on labeled data. Experimental results demonstrate that the proposed method achieves near-optimal performance while strictly satisfying all constraints in varying load conditions.
Data-Driven Control Of Power Converters
The fundamental role of power converters is to efficiently manage and control the flow of electrical energy, ensuring compatibility between power sources and loads. All these applications of power converters need the design of an appropriate control law. Control of power converters is a challenging problem due to the presence of switching devices which are difficult to handle using traditional control approaches. The objective of this paper is to investigate the use of data-driven techniques, in particular the Virtual References Feedback Tuning (VRFT) method, in the context of power converters feedback control. This study considers a buck \pauline{mode} power converter circuit provided by the OwnTech foundation.
comment: Conference paper for the french national electrical engineering symposium, SGE 2025
Weighting Factors Tuning by Direct Feedback in Predictive Control of Multiphase Motors
Predictive Stator Current Control (PSCC) has been proposed for control of multi-phase drives. The flexibility offered by the use of a Cost Function has been used to deal with the increased number of phases. However, tuning of the Weighting Factors constitutes a problem. Intensive trial and error tests are usual in this context. Existing on-line selection methods, on the other hand, require large amounts of data and/or complex optimization procedures. The proposal of this paper is a closed-loop scheme that links Weighting Factors to performance indicators. In this way, optimal Weighting Factors are determined for each operating point. Also, changes in reference values for performance indicators are easily tackled. Unlike previous methods, the proposal carries very little computational burden. A case study is developed for a five-phase induction motor and assessed with real experimentation on a laboratory set-up.
Safety Analysis of eVTOL Operations based on STPA
Electric Vertical Take-Off and Landing (eVTOL) aircraft are expected to be quieter and more cost-effective than helicopters, offering major economic and social benefits through improved connectivity. Their adoption will require new ground infrastructure and airspace redesign, introducing risks involving multiple stakeholders (Regulators, eVTOL operators, Air navigation service providers, Vertiport operators, OEMs, Pilots, etc.). To assess these risks for the UK airspace, systems-thinking based System Theoretic Process Analysis (STPA) was conducted. To manage the large number of Unsafe Control Actions (UCAs) and requirements generated due to the complexity of the analysis, a novel extension to STPA for the prioritization of results was applied. 317 UCAs were identified in total out of which 110 high-priority UCAs were analyzed (Step-4), resulting in 377 causal factors and 432 requirements. These were prioritized to produce a targeted list of 124 distinct high-priority requirements, 56 of which were identified as gaps in existing aviation regulations, policies, or procedures.. These highlight opportunities for regulatory updates in areas such as organizational performance, certification processes, training, collision avoidance, energy management, and automation. The findings provide regulators with safety considerations that could shape new or updated regulations, compliance methods, and guidance materials for the safe deployment of eVTOLs.
Single vs Multi Vector Predictive Control of Five-phase Drives
The field of Finite State Model Predictive Control for multiphase drives has produced many contributions. Many variants of FSMPC exist, each aiming at some aspect such as complexity of the cost function, switching frequency, etc. Despite past efforts to compare different techniques, the field is still out of consensus regarding the relative merits of each one. This paper presents a new method to compare FSMPC variants. The method is based on analyzing the modulation, implicit or explicit, used by each variant. In the paper the method is used to compare single-vector state-of-the-art FSMPC with a multi-vector variant designed to cancel xy currents and simplify the cost function. The results show the strengths and weaknesses of each technique. Also, it is found that the trade-offs between figures, previously thought to concern just individual regimes, extend to the whole operating space and also can be pinpoint to each FSMPC variant. Finally, it is shown that the flexibility of the single-vector approach and its better DC-link usage makes it, arguably, superior over the multi-vector variant.
Robust Adaptive Boundary Control of a Thermal Process with Thermoelectric Actuators: Theory and Experimental Validation
A sliding-mode-based adaptive boundary control law is proposed for a class of uncertain thermal reaction-diffusion processes subject to matched disturbances. The disturbances are assumed to be bounded, but the corresponding bounds are unknown, thus motivating the use of adaptive control strategies. A boundary control law comprising a proportional and discontinuous term is proposed, wherein the magnitude of the discontinuous relay term is adjusted via a gradient-based adaptation algorithm. Depending on how the adaptation algorithm is parameterized, the adaptive gain can be either a nondecreasing function of time (monodirectional adaptation) or it can both increase and decrease (bidirectional adaptation). The convergence and stability properties of these two solutions are investigated by Lyapunov analyses, and two distinct stability results are derived, namely, asymptotic stability for the monodirectional adaptation and globally uniformly ultimately bounded solutions for the bidirectional adaptation. The proposed algorithms are then specified to address the control problem of stabilizing a desired temperature profile in a metal beam equipped with thermoelectric boundary actuators. Experiments are conducted to investigate the real-world performance of the proposed sliding-mode-based adaptive control, with a particular focus on comparing the monodirectional and bidirectional adaptation laws.
comment: Extended version of the preprint submitted to the journal Automatica
Antenna's Performance in Microwave Imaging of Stratified Media
Numerous types of antennas have been employed for microwave imaging of stratified media for ground penetrating radar (GPR), through-the-wall-radar imaging (TWRI), etc. This letter aims to investigate the impact of the different antennas with their characteristics on the image reconstruction of those media. Hence, three types of antennas, including horn antennas, open waveguide and Vivaldi antennas, are chosen as almost directional antennas, operating at X-band 8-12 GHz. The antenna's far-field and near-field characteristics are analyzed. A diffraction tomography (DT)-based algorithm is used to reconstruct the target location within the stratified media using monostatic and multistatic data. It is observed that the more directional antennas provide a better-reconstructed image with less shadowing image of the stratified media.
Sensing, Detection and Localization for Low Altitude UAV: A RF-Based Framework via Multiple BSs Collaboration
The rapid growth of the low-altitude economy has resulted in a significant increase in the number of Low, slow, and small (LLS) unmanned aerial vehicles (UAVs), raising critical challenges for secure airspace management and reliable trajectory planning. To address this, this paper proposes a cooperative radio-frequency (RF) detection and localization framework that leverages existing cellular base stations. The proposed approach features a robust scheme for LSS target identification, integrating a cell averaging-constant false alarm rate (CA-CFAR) detector with a micro-Doppler signature (MDS) based recognition method. Multi-station measurements are fused through a grid-based probabilistic algorithm combined with clustering techniques, effectively mitigating ghost targets and improving localization accuracy in multi-UAV scenarios. Furthermore, the Cramer-Rao lower bound (CRLB) is derived as a performance benchmark and reinforcement learning (RL)-based optimization is employed to balance localization accuracy against station resource usage. Simulations demonstrate that increasing from one to multiple BSs reduces the positioning error to near the CRLB, while practical experiments further verify the framework's effectiveness. Furthermore, our RL-based optimization can find solutions that maintain high accuracy while minimizing resource usage, highlighting its potential as a scalable solution for ensuring airspace safety in the emerging low-altitude economy.
MAKO: Meta-Adaptive Koopman Operators for Learning-based Model Predictive Control of Parametrically Uncertain Nonlinear Systems
In this work, we propose a meta-learning-based Koopman modeling and predictive control approach for nonlinear systems with parametric uncertainties. An adaptive deep meta-learning-based modeling approach, called Meta Adaptive Koopman Operator (MAKO), is proposed. Without knowledge of the parametric uncertainty, the proposed MAKO approach can learn a meta-model from a multi-modal dataset and efficiently adapt to new systems with previously unseen parameter settings by using online data. Based on the learned meta Koopman model, a predictive control scheme is developed, and the stability of the closed-loop system is ensured even in the presence of previously unseen parameter settings. Through extensive simulations, our proposed approach demonstrates superior performance in both modeling accuracy and control efficacy as compared to competitive baselines.
Trust Modeling and Estimation in Human-Autonomy Interactions
Advances in the control of autonomous systems have accompanied an expansion in the potential applications for autonomous robotic systems. The success of applications involving humans depends on the quality of interaction between the autonomous system and the human supervisor, which is particularly affected by the degree of trust that the supervisor places in the autonomous system. Absent from the literature are models of supervisor trust dynamics that can accommodate asymmetric responses to autonomous system performance and the intermittent nature of supervisor-autonomous system communication. This paper focuses on formulating an estimated model of supervisor trust that incorporates both of these features by employing a switched linear system structure with event-triggered sampling of the model input and output. Trust response data collected in a user study with 51 participants were then used identify parameters for a switched linear model-based observer of supervisor trust.
comment: 10 pages. 13 figures
Traffic-Aware Eco-Driving Control in CAVs via Learning-based Terminal Cost Model
Connected and Automated Vehicles (CAVs) offer significant potential for improving energy efficiency and lowering vehicle emissions through eco-driving technologies. Control algorithms in CAVs leverage look-ahead route information and Vehicle-to-Everything (V2X) communication to optimize vehicle performance. However, existing eco-driving strategies often neglect macroscopic traffic effects, such as upstream traffic jams, that occur outside the optimization horizon but significantly impact vehicle energy efficiency. This work presents a novel Neural Network (NN)-based methodology to approximate the terminal cost within a model predictive control (MPC) problem framework, explicitly incorporating upstream traffic dynamics. By incorporating traffic jams into the optimization process, the proposed traffic-aware approach yields more energy-efficient speed trajectories compared to traffic-agnostic methods, with minimal impact on travel time. The framework is scalable for real-time implementation while effectively addressing uncertainties from dynamic traffic conditions and macroscopic traffic events.
Direct Data-Driven Predictive Control for a Three-dimensional Cable-Driven Soft Robotic Arm
Soft robots offer significant advantages in safety and adaptability, yet achieving precise and dynamic control remains a major challenge due to their inherently complex and nonlinear dynamics. Recently, Data-enabled Predictive Control (DeePC) has emerged as a promising model-free approach that bypasses explicit system identification by directly leveraging input-output data. While DeePC has shown success in other domains, its application to soft robots remains underexplored, particularly for three-dimensional (3D) soft robotic systems. This paper addresses this gap by developing and experimentally validating an effective DeePC framework on a 3D, cable-driven soft arm. Specifically, we design and fabricate a soft robotic arm with a thick tubing backbone for stability, a dense silicone body with large cavities for strength and flexibility, and rigid endcaps for secure termination. Using this platform, we implement DeePC with singular value decomposition (SVD)-based dimension reduction for two key control tasks: fixed-point regulation and trajectory tracking in 3D space. Comparative experiments with a baseline model-based controller demonstrate DeePC's superior accuracy, robustness, and adaptability, highlighting its potential as a practical solution for dynamic control of soft robots.
A pilot cohort study of a microfluidic-based point-of-care bilirubin measurement system
Objective The concentration of bilirubin in blood or serum is useful for assessing liver function as well as monitoring treatment. This study evaluates the clinical performance of a novel point-of-care (PoC) device for the detection of bilirubin in serum. The PoC device incorporates an integrated miniature optoelectronic sensing module and a microfluidic test cartridge. Methods Patients' serum total bilirubin concentrations, ranging from 2 {\mu}mol/L to 480 {\mu}mol/L, were measured using the PoC device and the standard laboratory method (n=20). Bland-Altman analysis and regression analysis using Passing-Bablok method were used to benchmark the PoC device against the standard laboratory measurements. The diagnostic capability of the PoC device in categorising the serum samples within clinically relevant bilirubin concentration thresholds of 200, 300, and 450 {\mu}mol/L was assessed using receiver operating characteristic (ROC) analysis. Results The mean difference between the PoC device and the standard laboratory method was -5.6 {\mu}mol/L, with a 95% confidence interval (CI) of -45.1 {\mu}mol/L to 33.9 {\mu}mol/L. The coefficient of determination (R2) was 0.986. The PoC device achieved a detection sensitivity of 90% and specificity of 97% in categorising bilirubin concentrations within bands used in clinical decision-making. Conclusions This study demonstrates that the proposed PoC device is capable of measuring bilirubin levels in patient samples with clinically acceptable accuracy.
Cognitive Radio for Asymmetric Cellular Downlink with Multi-User MIMO
Cognitive radio (CR) is an important technique for improving spectral efficiency, letting a secondary system operate in a wireless spectrum when the primary system does not make use of it. While it has been widely explored over the past 25 years, many common assumptions are not aligned with the realities of 5G networks. In this paper, we consider the CR problem for the following setup: (i) infrastructure-based systems, where downlink transmissions might occur to receivers whose positions are not, or not exactly, known; (ii) multi-beam antennas at both primary and secondary base stations. We formulate a detailed protocol to determine when secondary transmissions into different beam directions can interfere with primary users at potential locations and create probability-based interference rules. We then analyze the "catastrophic interference" probability and the "missed transmission opportunity" probability, as well as the achievable throughput, as a function of the transmit powers of the primary and secondary base stations and the sensing window of the secondary base station. Results can serve to more realistically assess the spectral efficiency gains in 5G infrastructure-based cognitive systems.
Observation Matrix Design for Densifying MIMO Channel Estimation via 2D Ice Filling
In recent years, densifying multiple-input multiple-output (MIMO) has attracted much attention from the communication community. Thanks to the subwavelength antenna spacing, the strong correlations among densifying antennas provide sufficient prior knowledge about channel state information (CSI). This inspires the careful design of observation matrices (e.g., transmit precoders and receive combiners), that exploits the CSI prior knowledge, to boost channel estimation performance. Aligned with this vision, this work proposes to jointly design the combiners and precoders by maximizing the mutual information between the received pilots and densifying MIMO channels. A two-dimensional ice-filling (2DIF) algorithm is proposed to efficiently accomplish this objective. The algorithm is motivated by the fact that the eigenspace of MIMO channel covariance can be decoupled into two sub-eigenspaces, which are associated with the correlations of transmitter antennas and receiver antennas, respectively. By properly setting the precoder and the combiner as the eigenvectors from these two sub-eigenspaces, the 2DIF promises to generate near-optimal observation matrices. Moreover, we further extend the 2DIF method to the popular hybrid combining systems, where a two-stage 2DIF (TS-2DIF) algorithm is developed to handle the analog combining circuits realized by phase shifters. Simulation results demonstrate that, compared to the state-of-the-art schemes, the proposed 2DIF and TS-2DIF methods can achieve superior channel estimation accuracy.
comment: 17 pages, 8 figures
Computing Safe Control Inputs using Discrete-Time Matrix Control Barrier Functions via Convex Optimization
Control barrier functions (CBFs) have seen widespread success in providing forward invariance and safety guarantees for dynamical control systems. A crucial limitation of discrete-time formulations is that CBFs that are nonconcave in their argument require the solution of nonconvex optimization problems to compute safety-preserving control inputs, which inhibits real-time computation of control inputs guaranteeing forward invariance. This paper presents a novel method for computing safety-preserving control inputs for discrete-time systems with nonconvex safety sets, utilizing convex optimization and the recently developed class of matrix control barrier function techniques. The efficacy of our methods is demonstrated through numerical simulations on a bicopter system.
comment: 17 pages, 8 figures
Cyber-Physical Systems on the Megawatt Scale: The impact of battery control on grid frequency stability
Electric power systems are undergoing fundamental change. The shift to inverter-based generation challenges frequency stability, while growing digitalisation heightens vulnerability to errors and attacks. Here we identify an emerging risk at the intersection of cyber-physical coupling and control system design. We show that grid frequency time series worldwide exhibit a persistent one-minute oscillatory pattern, whose origin has remained largely unexplained. We trace this pattern back to the energy management systems of battery electric storage systems and demonstrate that the pattern amplitude has increased substantially in the Nordic and British grids. We argue that this effect is a potential burden for stability in future grids with low inertia and an increasing penetration with batteries and smart devices, though it can be mitigated by a revision of battery control algorithms.
comment: 19 pages, 23 figures
Latent-Feature-Informed Neural ODE Modeling for Lightweight Stability Evaluation of Black-box Grid-Tied Inverters
Stability evaluation of black-box grid-tied inverters is vital for grid reliability, yet identification techniques are both data-hungry and blocked by proprietary internals. {To solve this, this letter proposes a latent-feature-informed neural ordinary differential equation (LFI-NODE) modeling method that can achieve lightweight stability evaluation directly from trajectory data.} LFI-NODE parameterizes the entire system ODE with a single continuous-time neural network, allowing each new sample to refine a unified global model. It faithfully captures nonlinear large-signal dynamics to preserve uniform predictive accuracy as the inverter transitions between operating points. Meanwhile, latent perturbation features distilled from every trajectory steer the learning process and concurrently reveal the small-signal eigenstructure essential for rigorous stability analysis. Validated on a grid-forming inverter, {The LFI-NODE requires one to two orders of magnitude fewer training samples compared with traditional methods, collected from short time-domain trajectories instead of extensive frequency-domain measurements.} {Furthermore, the LFI-NODE requires only 48 short transients to achieve a trajectory prediction error at the hundredth level and an eigenvalue estimation error at the tenth level, outperforming benchmark methods by one to two orders of magnitude.} This makes LFI-NODE a practical and lightweight approach for achieving high-fidelity stability assessment of complex black-box power-electronic systems.
comment: 6 pages 8fugures
Designing Control Barrier Functions Using a Dynamic Backup Policy
This paper presents a systematic approach to construct control barrier functions for nonlinear control affine systems subject to arbitrary state and input constraints. Taking inspiration from the reference governor literature, the proposed method defines a family of backup policies, parametrized by the equilibrium manifold of the system. The control barrier function is defined on the augmented state-and-reference space: given a state-reference pair, the approach quantifies the distance to constraint violation at any time in the future, should the current backup policy reference remain constant. Sensitivity analysis is then used to compute the (possibly nonsmooth) Jacobian with respect to the augmented state vector. To showcase its simple yet general nature, the proposed method is applied to an inverted pendulum on cart.
comment: 7 pages, 1 figure
Science ouverte et collaborative pour l'élaboration d'un banc automatisé de caractérisation de pertes en commutation par opposition
The switching losses of power transistors are generally measured using the so-called double pulse method. Measuring the opposition of two switching cells is a complementary method that is more accurate but indirect. However, implementing this method can be more complex and requires calibration steps and comprehensive control, with the added issue of thermal management. In this context, we proposed to address this topic through open and collaborative science, first in the form of a two-day hackathon, followed by monthly open sessions. More than 20 participants contributed to the two-day hackathon, followed by monthly sessions for those wishing to continue working together. This enabled us to set up an automated bench, in open science, including the generation of switching commands, the configuration and control of measuring instruments, and the hardware part. Here we present and share our work and this open approach.
comment: Paper in french, presented at the french national electrical engineering conference SGE 2025
Robustness Analysis for Quantum Systems Controlled by Continuous-Time Pulses
Differential sensitivity techniques originally developed to study the robustness of energy landscape controllers are generalized to the important case of closed quantum systems subject to continuously varying controls. Vanishing sensitivity to parameter variation is shown to coincide with perfect fidelity, as was the case for time-invariant controls. Upper bounds on the magnitude of the differential sensitivity to any parameter variation are derived based simply on knowledge of the system Hamiltonian and the maximum size of the control inputs.
comment: 6 pages, 2 figures
Safe and Optimal N-Spacecraft Swarm Reconfiguration in Non-Keplerian Cislunar Orbits
This paper presents a novel fuel-optimal guidance and control methodology for spacecraft swarm reconfiguration in Restricted Multi-Body Problems (RMBPs) with a guarantee of passive safety, maintaining miss distance even under abrupt loss of control authority. A new set of constraints exploits a quasi-periodic structure of RMBPs to guarantee passive safety. Particularly, the condition for passive safety is expressed as simple geometric constraints by solving optimal control in Local Toroidal Coordinates, which is based on a local eigenspace of a quasi-periodic motion around the corresponding periodic orbit. The proposed formulation enables a significant simplification of problem structure, which is applicable to large-scale swarm reconfiguration in cislunar orbits. The method is demonstrated in the Circular Restricted Three-Body Problem, the Elliptic Restricted Three-Body Problem, and the Bi-Circular Restricted Four-Body Problem. Furthermore, the optimized control profiles are validated in the full-ephemeris dynamics model. By extending and generalizing well-known concepts of relative orbital elements within the restricted two-body problem to the three- and four-body problems, this paper lays the foundation for practical control schemes of relative motion in cislunar space.
comment: 41 pages, 19 figures. Submitted and accepted to Journal of Guidance, Control, and Dynamics
A Digital Twin for Diesel Engines: Operator-infused Physics-Informed Neural Networks with Transfer Learning for Engine Health Monitoring
Improving diesel engine efficiency, reducing emissions, and enabling robust health monitoring have been critical research topics in engine modelling. While recent advancements in the use of neural networks for system monitoring have shown promising results, such methods often focus on component-level analysis, lack generalizability, and physical interpretability. In this study, we propose a novel hybrid framework that combines physics-informed neural networks (PINNs) with deep operator networks (DeepONet) to enable accurate and computationally efficient parameter identification in mean-value diesel engine models. Our method leverages physics-based system knowledge in combination with data-driven training of neural networks to enhance model applicability. Incorporating offline-trained DeepONets to predict actuator dynamics significantly lowers the online computation cost when compared to the existing PINN framework. To address the re-training burden typical of PINNs under varying input conditions, we propose two transfer learning (TL) strategies: (i) a multi-stage TL scheme offering better runtime efficiency than full online training of the PINN model and (ii) a few-shot TL scheme that freezes a shared multi-head network body and computes physics-based derivatives required for model training outside the training loop. The second strategy offers a computationally inexpensive and physics-based approach for predicting engine dynamics and parameter identification, offering computational efficiency over the existing PINN framework. Compared to existing health monitoring methods, our framework combines the interpretability of physics-based models with the flexibility of deep learning, offering substantial gains in generalization, accuracy, and deployment efficiency for diesel engine diagnostics.
SwarmGPT: Combining Large Language Models with Safe Motion Planning for Drone Swarm Choreography
Drone swarm performances -- synchronized, expressive aerial displays set to music -- have emerged as a captivating application of modern robotics. Yet designing smooth, safe choreographies remains a complex task requiring expert knowledge. We present SwarmGPT, a language-based choreographer that leverages the reasoning power of large language models (LLMs) to streamline drone performance design. The LLM is augmented by a safety filter that ensures deployability by making minimal corrections when safety or feasibility constraints are violated. By decoupling high-level choreographic design from low-level motion planning, our system enables non-experts to iteratively refine choreographies using natural language without worrying about collisions or actuator limits. We validate our approach through simulations with swarms up to 200 drones and real-world experiments with up to 20 drones performing choreographies to diverse types of songs, demonstrating scalable, synchronized, and safe performances. Beyond entertainment, this work offers a blueprint for integrating foundation models into safety-critical swarm robotics applications.
comment: Accepted at RA-L 2025
Extending First-order Robotic Motion Planners to Second-order Robot Dynamics
This paper extends first-order motion planners to robots governed by second-order dynamics. Two control schemes are proposed based on the knowledge of a scalar function whose negative gradient aligns with a given first-order motion planner. When such a function is known, the first-order motion planner is combined with a damping velocity vector with a dynamic gain to extend the safety and convergence guarantees of the first-order motion planner to second-order systems. If no such function is available, we propose an alternative control scheme ensuring that the error between the robot's velocity and the first-order motion planner converges to zero. The theoretical developments are supported by simulation results demonstrating the effectiveness of the proposed approaches.
comment: 14 pages, 10 figures
Direction Estimation of Sound Sources Using Microphone Arrays and Signal Strength ICSE
Sound-tracking refers to the process of determining the direction from which a sound originates, making it a fundamental component of sound source localization. This capability is essential in a variety of applications, including security systems, acoustic monitoring, and speaker tracking, where accurately identifying the direction of a sound source enables real-time responses, efficient resource allocation, and improved situational awareness. While sound-tracking is closely related to localization, it specifically focuses on identifying the direction of the sound source rather than estimating its exact position in space. Despite its utility, sound-tracking systems face several challenges, such as maintaining directional accuracy and precision, along with the need for sophisticated hardware configurations and complex signal processing algorithms. This paper presents a sound-tracking method using three electret microphones. We estimate the direction of a sound source using a lightweight method that analyzes signals from three strategically placed microphones. By comparing the average power of the received signals, the system infers the most probable direction of the sound. The results indicate that the power level from each microphone effectively determines the sound source direction. Our system employs a straightforward and cost-effective hardware design, ensuring simplicity and affordability in implementation. It achieves a localization error of less than 6 degrees and a precision of 98%. Additionally, its effortless integration with various systems makes it versatile and adaptable. Consequently, this technique presents a robust and reliable solution for sound-tracking and localization, with potential applications spanning diverse domains such as security systems, smart homes, and acoustic monitoring.
comment: Accepted to the 32nd International Conference on Systems Engineering (ICSEng'2025)
Optimization via a Control-Centric Framework
Optimization plays a central role in intelligent systems and cyber-physical technologies, where speed and reliability of convergence directly impact performance. In control theory, optimization-centric methods are standard: controllers are designed by repeatedly solving optimization problems, as in linear quadratic regulation, $H_\infty$ control, and model predictive control. In contrast, this paper develops a control-centric framework for optimization itself, where algorithms are constructed directly from Lyapunov stability principles rather than being proposed first and analyzed afterward. A key element is the stationarity vector, which encodes first-order optimality conditions and enables Lyapunov-based convergence analysis. By pairing a Lyapunov function with a selectable decay law, we obtain continuous-time dynamics with guaranteed exponential, finite-time, fixed-time, or prescribed-time convergence. Within this framework, we introduce three feedback realizations of increasing restrictiveness: the Hessian-gradient, Newton, and gradient dynamics. Each realization shapes the decay of the stationarity vector to achieve the desired rate. These constructions unify unconstrained optimization, extend naturally to constrained problems via Lyapunov-consistent primal-dual dynamics, and broaden the results for minimax and generalized Nash equilibrium seeking problems beyond exponential stability. The framework provides systematic design tools for optimization algorithms in control and game-theoretic problems.
comment: This work has been submitted to the IEEE for possible publication. 12 pages, 3 figures
A view on learning robust goal-conditioned value functions: Interplay between RL and MPC
Reinforcement learning (RL) and model predictive control (MPC) offer a wealth of distinct approaches for automatic decision-making under uncertainty. Given the impact both fields have had independently across numerous domains, there is growing interest in combining the general-purpose learning capability of RL with the safety and robustness features of MPC. To this end, this paper presents a tutorial-style treatment of RL and MPC, treating them as alternative approaches to solving Markov decision processes. In our formulation, RL aims to learn a global value function through offline exploration in an uncertain environment, whereas MPC constructs a local value function through online optimization. This local-global perspective suggests new ways to design policies that combine robustness and goal-conditioned learning. Robustness is incorporated into the RL and MPC pipelines through a scenario-based approach. Goal-conditioned learning aims to alleviate the burden of engineering a reward function for RL. Combining the two leads to a single policy that unites a robust, high-level RL terminal value function with short-term, scenario-based MPC planning for reliable constraint satisfaction. This approach leverages the benefits of both RL and MPC, the effectiveness of which is demonstrated on classical control benchmarks.
comment: Postprint; 37 pages
Decentralized CBF-based Safety Filters for Collision Avoidance of Cooperative Missile Systems with Input Constraints
This paper presents a decentralized safety filter for collision avoidance in multi-agent aerospace interception scenarios. The approach leverages robust control barrier functions (RCBFs) to guarantee forward invariance of safety sets under bounded inputs and high-relative-degree dynamics. Each effector executes its nominal cooperative guidance command, while a local quadratic program (QP) modifies the input only when necessary. Event-triggered activation based on range and zero-effort miss (ZEM) criteria ensures scalability by restricting active constraints to relevant neighbors. To resolve feasibility issues from simultaneous constraints, a slack-variable relaxation scheme is introduced that prioritizes critical agents in a Pareto-optimal manner. Simulation results in many-on-many interception scenarios demonstrate that the proposed framework maintains collision-free operation with minimal deviation from nominal guidance, providing a computationally efficient and scalable solution for safety-critical multi-agent aerospace systems.
comment: 7 pages, 5 figures
A Control Allocation Algorithm for Hypersonic Glide Vehicles with Input Limitations
Hypersonic glide vehicles (HGVs) operate in challenging flight regimes characterized by strong nonlinearities in actuation and stringent physical constraints. These include state-dependent actuator limitations, asymmetric control bounds, and thermal loads that vary with maneuvering conditions. This paper introduces an iterative control allocation method to address these challenges in real time. The proposed algorithm searches for control inputs that achieve the desired moment commands while respecting constraints on input magnitude and rate. For slender HGV configurations, thermal loads and drag generation are strongly correlated-lower drag typically results in reduced surface heating. By embedding drag-sensitive soft constraints, the method improves energy efficiency and implicitly reduces surface temperatures, lowering the vehicle's infrared signature. These features are particularly advantageous for long-range military operations that require low observability. The approach is demonstrated using the DLR's Generic Hypersonic Glide Vehicle 2 (GHGV-2) simulation model. The results confirm the method's effectiveness in maintaining control authority under realistic, constrained flight conditions.
comment: 38 pages, 20 figures, submitted to the AIAA Journal of Guidance, Control, and Dynamics
Machine Learning Detection of Lithium Plating in Lithium-ion Cells: A Gaussian Process Approach
Lithium plating during fast charging is a critical degradation mechanism that accelerates capacity fade and can trigger catastrophic safety failures. Recent work has identified a distinctive dQ/dV peak above 4.0 V as a reliable signature of plating onset; however, conventional methods for computing dQ/dV rely on finite differencing with filtering, which amplifies sensor noise and introduces bias in peak location. In this paper, we propose a Gaussian Process (GP) framework for lithium plating detection by directly modeling the charge-voltage relationship Q(V) as a stochastic process with calibrated uncertainty. Leveraging the property that derivatives of GPs remain GPs, we infer dQ/dV analytically and probabilistically from the posterior, enabling robust detection without ad hoc smoothing. The framework provides three key benefits: (i) noise-aware inference with hyperparameters learned from data, (ii) closed-form derivatives with credible intervals for uncertainty quantification, and (iii) scalability to online variants suitable for embedded BMS. Experimental validation on Li-ion coin cells across a range of C-rates (0.2C-1C) and temperatures (0-40\deg C) demonstrates that the GP-based method reliably detects plating peaks under low-temperature, high-rate charging, while correctly reporting no peaks in baseline cases. The concurrence of GP-identified differential peaks, reduced charge throughput, and capacity fade measured via reference performance tests confirms the method's accuracy and robustness, establishing a practical pathway for real-time lithium plating detection.
comment: Submitted to American Control Conference 2026 - ACC 2026
Techno-economic analysis of self-sustainable thermophotovoltaic systems for grid-scale energy generation
To facilitate the widespread adoption of renewable energy, dispatchable, zero-emission power sources are essential for grid stability. This work performs a comprehensive techno-economic analysis of a self-sustainable thermophotovoltaic (TPV) system, an architecture that integrates solar charging to function as a standalone power generation asset. Using theory-based models for conventional air-bridge InGaAs and Si diode cells, our analysis reveals that while the system is not currently competitive from a pure levelized of storage cost (LCOS) perspective due to the high capital expenditure for thermal battery materials, its primary value lies in its competitive levelized cost of electricity (LCOE), which is comparable to that of conventional dispatchable generators such as gas turbines. Furthermore, we show that a full Si-based TPV system, utilizing a 50-{\mu}m-thick air-bridge cell for enhanced photon utilization, can also achieve an LCOE that is competitive with such conventional power sources at scales exceeding the gigawatt-hour level, despite its lower conversion efficiency relative to its InGaAs counterpart. This highlights a practical engineering pathway for leveraging the immense manufacturing scalability of Si, offering a lower-risk route to deployment compared to III-V materials. Ultimately, this work establishes the self-sustainable TPV architecture as a compelling pathway toward providing grid-scale, on-demand, zero-emission power.
comment: 27 pages, 6 figures, 1 table
Transferable Parasitic Estimation via Graph Contrastive Learning and Label Rebalancing in AMS Circuits
Graph representation learning on Analog-Mixed Signal (AMS) circuits is crucial for various downstream tasks, e.g., parasitic estimation. However, the scarcity of design data, the unbalanced distribution of labels, and the inherent diversity of circuit implementations pose significant challenges to learning robust and transferable circuit representations. To address these limitations, we propose CircuitGCL, a novel graph contrastive learning framework that integrates representation scattering and label rebalancing to enhance transferability across heterogeneous circuit graphs. CircuitGCL employs a self-supervised strategy to learn topology-invariant node embeddings through hyperspherical representation scattering, eliminating dependency on large-scale data. Simultaneously, balanced mean squared error (BMSE) and balanced softmax cross-entropy (BSCE) losses are introduced to mitigate label distribution disparities between circuits, enabling robust and transferable parasitic estimation. Evaluated on parasitic capacitance estimation (edge-level task) and ground capacitance classification (node-level task) across TSMC 28nm AMS designs, CircuitGCL outperforms all state-of-the-art (SOTA) methods, with the $R^2$ improvement of $33.64\% \sim 44.20\%$ for edge regression and F1-score gain of $0.9\times \sim 2.1\times$ for node classification. Our code is available at https://github.com/ShenShan123/CircuitGCL.
comment: Final version accepted by the International Conference on Computer-Aided Design (ICCAD) 2025. First two authors have equal contributions
Geometry of Distance Protection
Distance relays detect faults on transmission lines. They face uncertainty from the fault's location and resistance, as well as the current from the line's remote terminal. In this paper, we aggregate this uncertainty with the Minkowski sum. This allows us to explicitly model the power grid surrounding the relay's line, and in turn accommodate any mix of synchronous machines and inverter-based resources. To make the relay's task easier, inverters can inject perturbations, or auxiliary signals, such as negative-sequence current. We use Farkas' lemma to construct an optimization for designing inverter auxiliary signals.
A Real-Time System for Scheduling and Managing UAV Delivery in Urban Areas
As urban logistics demand continues to grow, UAV delivery has become a key solution to improve delivery efficiency, reduce traffic congestion, and lower logistics costs. However, to fully leverage the potential of UAV delivery networks, efficient swarm scheduling and management are crucial. In this paper, we propose a real-time scheduling and management system based on the ``Airport-Unloading Station" model, aiming to bridge the gap between high-level scheduling algorithms and low-level execution systems. This system, acting as middleware, accurately translates the requirements from the scheduling layer into specific execution instructions, ensuring that the scheduling algorithms perform effectively in real-world environments. Additionally, we implement three collaborative scheduling schemes involving autonomous ground vehicles (AGVs), unmanned aerial vehicles (UAVs), and ground staff to further optimize overall delivery efficiency. Through extensive experiments, this study demonstrates the rationality and feasibility of the proposed management system, providing practical solution for the commercial application of UAVs delivery in urban. Code: https://github.com/chengji253/UAVDeliverySystem
Coordinated Control of Deformation and Flight for Morphing Aircraft via Meta-Learning and Coupled State-Dependent Riccati Equations
In this paper, the coordinated control problem of deformation and flight for morphing aircraft (MA) is studied by using meta-learning (ML) and coupled state-dependent Riccati equations (CSDREs). Our method is built on two principal observations that dynamic models of MA under varying morphing conditions share a morphing condition independent representation function and that the specific morphing condition part lies in a set of linear coefficients. To that end, the domain adversarially invariant meta-learning (DAIML) is employed to learn the shared representation with offline flight data. Based on the learned representation function, the coordinated control of the deformation and flight for MA is formulated as a non-cooperative differential game. The state-dependent feedback control solutions can be derived by addressing a pair of CSDREs. For this purpose, Lyapunov iterations are extended to obtain the positive semidefinite (definite) stabilizing solutions of the CSDREs, and the convergence proof of the proposed algorithm is provided. Finally, a simulation study is carried out to validate the efficacy of the developed coordinated game control strategies.
Modeling and Simulation of an Active Car Suspension with a Robust LQR Controller under Road Disturbance, Parameter Uncertainty and White Noise
Vehicle suspension is important for passengers to travel comfortably and to be less exposed to effects such as vibration and shock. A good suspension system increases the road holding of vehicles, allows them to take turns safely, and reduces the risk of traffic accidents. A passive suspension system is the most widely used suspension system in vehicles due to its simple structure and low cost. Passive suspension systems do not have an actuator and therefore do not have a controller. Active suspension systems have an actuator and a controller. Although their structures are more complex and costly, they are safer. PID controller is widely used in active suspension systems due to its simple structure, reasonable cost, and easy adjustment of coefficients. In this study, a more robust LQR-controlled active suspension was designed than a passive suspension and a PID-controlled active suspension. Robustness analyses were performed for passive suspension, PID-controlled active suspension, and LQR-controlled active suspension. Suspension travel, sprung mass acceleration, and sprung mass motion simulations were performed for all three suspensions under road disturbance, under simultaneous road disturbance and parameter uncertainty and under road disturbance with white noise. A comparative analysis was performed by obtaining the rise time, overshoot, and settling time data of the suspensions under different conditions. It was observed that the LQR-controlled active suspension showed the fastest rise time, the least overshoot and had the shortest settling time. In this case, it was proven that the LQR controlled active suspension provided a more comfortable and safe ride compared to the other two suspension systems.
comment: 20 pages, 19 figures
Resilient Multi-Dimensional Consensus and Distributed Optimization against Agent-Based and Denial-of-Service Attacks
In this paper, we consider the resilient multi-dimensional consensus and distributed optimization problems of multi-agent systems (MASs) in the presence of both agent-based and denial-of-service (DoS) attacks. The considered agent-based attacks can cover malicious, Byzantine, and stubborn agents. The links between agents in the network can be blocked by DoS attacks, which may lead the digraph to be time-varying and even disconnected. The objective is to ensure that the remaining benign agents achieve consensus. To this end, an "auxiliary point"-based resilient control algorithm is proposed for MASs. Under the proposed algorithm, each healthy agent constructs a "safe kernel" utilizing the states of its in-neighbors and updates its state toward a specific point within this kernel at each iteration. If an agent cannot receive its neighbors' states owing to DoS attacks, it will use the states received immediately before the DoS period. Moreover, a resilient multi-dimensional distributed optimization (RMDO) algorithm is also proposed. Theoretical proofs and numerical examples are presented to demonstrate the effectiveness of the proposed algorithms.
Learning a Shape-adaptive Assist-as-needed Rehabilitation Policy from Therapist-informed Input
Therapist-in-the-loop robotic rehabilitation has shown great promise in enhancing rehabilitation outcomes by integrating the strengths of therapists and robotic systems. However, its broader adoption remains limited due to insufficient safe interaction and limited adaptation capability. This article proposes a novel telerobotics-mediated framework that enables therapists to intuitively and safely deliver assist-as-needed~(AAN) therapy based on two primary contributions. First, our framework encodes the therapist-informed corrective force into via-points in a latent space, allowing the therapist to provide only minimal assistance while encouraging patient maintaining own motion preferences. Second, a shape-adaptive ANN rehabilitation policy is learned to partially and progressively deform the reference trajectory for movement therapy based on encoded patient motion preferences and therapist-informed via-points. The effectiveness of the proposed shape-adaptive AAN strategy was validated on a telerobotic rehabilitation system using two representative tasks. The results demonstrate its practicality for remote AAN therapy and its superiority over two state-of-the-art methods in reducing corrective force and improving movement smoothness.
Hierarchical Analysis and Control of Epidemic Spreading over Networks using Dissipativity and Mesh Stability
Analyzing and controlling spreading processes are challenging problems due to the involved non-linear node (subsystem) dynamics, unknown disturbances, complex interconnections, and the large-scale and multi-level nature of the problems. The dissipativity concept provides a practical framework for addressing such concerns, thanks to the energy-based representation it offers for subsystems and the compositional properties it provides for the analysis and control of interconnected (networked) systems comprised of such subsystems. Therefore, in this paper, we utilize the dissipativity concept to analyze and control a spreading process that occurs over a hierarchy of nodes, groups, and a network (i.e., a spreading network). We start by generalizing several existing results on dissipativity-based topology design for networked systems. Next, we model the considered spreading network as a networked system and establish the dissipativity properties of its nodes. The generalized topology design method is then applied at multiple levels of the considered spreading network to formulate its analysis and control problems as Linear Matrix Inequality (LMI) problems. We identify and enforce localized necessary conditions to support the feasibility of the LMI problem solved at each subsequent hierarchical level of the spreading network. Consequently, the proposed method does not involve iterative multi-level optimization stages that are computationally inefficient. The proposed control solution ensures that the spreading network is not only stable but also dissipative and mesh-stable. Compared to conventional methods, such as threshold pruning and high-degree edge removal, our approach offers superior performance in terms of infection containment, control efficiency, and disturbance robustness. Extensive numerical results demonstrate the effectiveness of the proposed technique.
comment: To be submitted to Automatica
Connecting the Equinoctial Elements and Rodrigues Parameters: A New Set of Elements
A geometric interpretation of the equinoctial elements is given with a connection to orthogonal rotations and attitude dynamics in Euclidean 3-space. An identification is made between the equinoctial elements and classic Rodrigues parameters. A new set of equinoctial elements are developed using the modified Rodrigues parameters, thereby removing the coordinate singularity for retrograde equatorial orbits present in previous versions of these elements. A low-thrust trajectory optimization problem is set up using the new elements to numerically verify convergence for the two-point boundary problem, as compared to their predecessors.
comment: formatting corrected for better readability
Optimal Assignment and Motion Control in Two-Class Continuum Swarms
We consider optimal swarm control problems where two different classes of agents are present. Continuum idealizations of large-scale swarms are used where the dynamics describe the evolution of the spatially-distributed densities of each agent class. The problem formulation we adopt is motivated by applications where agents of one class are assigned to agents of the other class, which we refer to as demand and resource agents respectively. Assignments have costs related to the distances between mutually assigned agents, and the overall cost of an assignment is quantified by a Wasserstein distance between the densities of the two agent classes. When agents can move, the assignment cost can decrease at the expense of a physical motion cost, and this tradeoff sets up a nonlinear infinite-dimensional optimal control problem. We show that in one spatial dimension, this problem can be converted to an infinite-dimensional, but decoupled, linear-quadratic (LQ) tracking problem when expressed in terms of the quantile functions of the respective agent densities. Solutions are given in the general one-dimensional case, as well as in the special cases of constant and periodically time-varying demands.
comment: Extended version including periodic-demand case. 13 pages, 7 figures
Data-driven Model Predictive Control using MATLAB
This paper presents a comprehensive overview of data-driven model predictive control, highlighting state-of-the-art methodologies and their numerical implementation. The discussion begins with a brief review of conventional model predictive control (MPC), which discusses both linear MPC (LMPC) and nonlinear MPC (NMPC). This is followed by a section on data-driven LMPC, outlining fundamental concepts and the implementation of various approaches, including subspace predictive control and prediction error methods. Subsequently, the focus shifts to data-driven NMPC, emphasizing approaches based on neural network models. The paper concludes with a review of recent advancements in data-driven MPC and explores potential directions for future research.
comment: 22 pages, 8 figures
Systems and Control (EESS)
Robust reset control design for piezo-actuated nano-positioner in presence of hysteresis nonlinearity
In this paper, a robust nonlinear control scheme is designed for the motion control of a class of piezo-actuated nano-positioning systems using frequency-domain analysis. The hysteresis, the nonlinearity in the piezoelectric material, degrades the precision in tracking references with high frequency contents and different travel ranges. The hysteresis compensation by the inverse model, as the state-of-the-art solution, is not reliable alone. Therefore, a control framework with robustness against the remaining nonlinearity is needed. It is shown that there is an unavoidable limitation in robust linear control design to improve the performance. A robust control methodology based on a complex-order element is established to relax the limitation. Then, a constant-in-gain-lead-in-phase (CgLp) reset controller is utilized to realize the complex-order control. The control design is based on the sinusoidal input describing function (SIDF) and the higher-order SIDF (HOSIDF) tools. A constrained optimization problem is provided to tune the control parameters. The achieved improvements by the CgLp control is validated by the simulation.
Demystifying and Navigating AI Ethics in Power Electronics
Artificial intelligence (AI) is rapidly transforming power electronics, with AI-related publications in IEEE Power Electronics Society selected journals increasing more than fourfold from 2020 to 2025. However, the ethical dimensions of this transformation have received limited attention. This article underscores the urgent need for an ethical framework to guide responsible AI integration in power electronics, not only to prevent AI-related incidents but also to comply with legal and regulatory responsibilities. In this context, this article identifies four core pillars of AI ethics in power electronics: Security & Safety, Explainability & Transparency, Energy Sustainability, and Evolving Roles of Engineers. Each pillar is supported by practical and actionable insights to ensure that ethical principles are embedded in algorithm design, system deployment, and workforce development. The authors advocate for power electronics engineers to lead the ethical discourse, given their deep technical understanding of both AI systems and power conversion technologies. The paper concludes by calling on the IEEE Power Electronics Society to spearhead the establishment of ethical standards and best practices that ensure AI innovations are not only technically advanced but also trustworthy, safe, and sustainable.
Critical States Identiffcation in Power System via Lattice Partition and Its Application in Reliability Assessment
With the increasing complexity of power systems,accurately identifying critical states (the states corresponding to minimal cut sets) and assessing system reliability have become crucial tasks. In this paper, a mathematical lattice structure is employed to represent and partition the state space of power system. Based on this structure, a novel recursive method is proposed to efffciently identify critical states by leveraging lattice partitioning and Optimal Power Flow(OPF) calculations. This method not only enables the extension of failure system states,but also calculates the upper and lower bounds of the Loss of Load Probability (LOLP) in a progressively converging manner. Compared to traditional reliability assessment methods such as State Enumeration (SE) and Monte Carlo Simulation (MCS), this approach offers greater accuracy and efffciency. Experiments conducted on the RBTS and RTS79 systems demonstrate that the proposed method accurately identiffes all critical states up to a preset order, which are high-risk states. The contribution of these critical states to LOLP highlights their signiffcance in the system. Moreover, the proposed method achieves the analytical value with signiffcantly fewer OPF calculations in RBTS system, reaching acceptable precision of LOLP up to 100 times faster than SE in both the RBTS and RTS systems.
Grid-forming Control of Converter Infinite Bus System: Modeling by Data-driven Methods
This study explores data-driven modeling techniques to capture the dynamics of a grid-forming converter-based infinite bus system, critical for renewable-integrated power grids. Using sparse identification of nonlinear dynamics and deep symbolic regression, models were generated from synthetic data simulating key disturbances in active power, reactive power, and voltage references. Deep symbolic regression demonstrated more accuracy in capturing complex system dynamics, though it required substantially more computational time than sparse identification of nonlinear dynamics. These findings suggest that while deep symbolic regression offers high fidelity, sparse identification of nonlinear dynamics provides a more computationally efficient approach, balancing accuracy and runtime for real-time grid applications.
3C Resources Joint Allocation for Time-Deterministic Remote Sensing Image Backhaul in the Space-Ground Integrated Network
Low-Earth-orbit (LEO) satellites assist observation satellites (OSs) to compress and backhaul more time-determined images (TDI) has become a new paradigm, which is used to enhance the timeout caused by the limited computing resources of OSs. However, how to capture the time-varying and dynamic characteristics of multi-dimensional resources is challenging for efficient collaborative scheduling. Motivated by this factor, we design a highly succinct multi-dimensional resource time-expanded graph (MDR-TEG) modell. Specifically, by employing a slots division mechanism and introducing an external virtual node, the time-varying communication, caching, and computing (3C) resources are depicted in low complexity by the link weights within, between, and outside the slots. Based on the MDR-TEG, the maximizing successful transmission ratio of TDI (MSTR-TDI) is modeled as a mixed integer linear programming (MILP) problem. Which further relaxed decomposed into two tractable sub-problems: maximizing the successful transmission rate of images (MSTRI) and ensuring the timeliness problem (ETP). Subsequently, an efficient subgradient of relaxation computing constraint (SRCC) algorithm is proposed. The upper and lower bounds of MSTR-TDI are obtained by solving the two subproblems and the dual problem (DP), and the direction of the next iteration is obtained by feedback. Furthermore, arranging the sending sequences of images to improve the quality of the solution. The approximate optimal solution of MSTR-TDI is eventually obtained through repeated iterations. The simulation results verify the superiority of the proposed MDR-TEG model and the effectiveness of the SRCC.
Task-Level Insights from Eigenvalues across Sequence Models
Although softmax attention drives state-of-the-art performance for sequence models, its quadratic complexity limits scalability, motivating linear alternatives such as state space models (SSMs). While these alternatives improve efficiency, their fundamental differences in information processing remain poorly understood. In this work, we leverage the recently proposed dynamical systems framework to represent softmax, norm and linear attention as dynamical systems, enabling a structured comparison with SSMs by analyzing their respective eigenvalue spectra. Since eigenvalues capture essential aspects of dynamical system behavior, we conduct an extensive empirical analysis across diverse sequence models and benchmarks. We first show that eigenvalues influence essential aspects of memory and long-range dependency modeling, revealing spectral signatures that align with task requirements. Building on these insights, we then investigate how architectural modifications in sequence models impact both eigenvalue spectra and task performance. This correspondence further strengthens the position of eigenvalue analysis as a principled metric for interpreting, understanding, and ultimately improving the capabilities of sequence models.
MPA-DNN: Projection-Aware Unsupervised Learning for Multi-period DC-OPF
Ensuring both feasibility and efficiency in optimal power flow (OPF) operations has become increasingly important in modern power systems with high penetrations of renewable energy and energy storage. While deep neural networks (DNNs) have emerged as promising fast surrogates for OPF solvers, they often fail to satisfy critical operational constraints, especially those involving inter-temporal coupling, such as generator ramping limits and energy storage operations. To deal with these issues, we propose a Multi-Period Projection-Aware Deep Neural Network (MPA-DNN) that incorporates a projection layer for multi-period dispatch into the network. By doing so, our model enforces physical feasibility through the projection, enabling end-to-end learning of constraint-compliant dispatch trajectories without relying on labeled data. Experimental results demonstrate that the proposed method achieves near-optimal performance while strictly satisfying all constraints in varying load conditions.
Data-Driven Control Of Power Converters
The fundamental role of power converters is to efficiently manage and control the flow of electrical energy, ensuring compatibility between power sources and loads. All these applications of power converters need the design of an appropriate control law. Control of power converters is a challenging problem due to the presence of switching devices which are difficult to handle using traditional control approaches. The objective of this paper is to investigate the use of data-driven techniques, in particular the Virtual References Feedback Tuning (VRFT) method, in the context of power converters feedback control. This study considers a buck \pauline{mode} power converter circuit provided by the OwnTech foundation.
comment: Conference paper for the french national electrical engineering symposium, SGE 2025
Weighting Factors Tuning by Direct Feedback in Predictive Control of Multiphase Motors
Predictive Stator Current Control (PSCC) has been proposed for control of multi-phase drives. The flexibility offered by the use of a Cost Function has been used to deal with the increased number of phases. However, tuning of the Weighting Factors constitutes a problem. Intensive trial and error tests are usual in this context. Existing on-line selection methods, on the other hand, require large amounts of data and/or complex optimization procedures. The proposal of this paper is a closed-loop scheme that links Weighting Factors to performance indicators. In this way, optimal Weighting Factors are determined for each operating point. Also, changes in reference values for performance indicators are easily tackled. Unlike previous methods, the proposal carries very little computational burden. A case study is developed for a five-phase induction motor and assessed with real experimentation on a laboratory set-up.
Safety Analysis of eVTOL Operations based on STPA
Electric Vertical Take-Off and Landing (eVTOL) aircraft are expected to be quieter and more cost-effective than helicopters, offering major economic and social benefits through improved connectivity. Their adoption will require new ground infrastructure and airspace redesign, introducing risks involving multiple stakeholders (Regulators, eVTOL operators, Air navigation service providers, Vertiport operators, OEMs, Pilots, etc.). To assess these risks for the UK airspace, systems-thinking based System Theoretic Process Analysis (STPA) was conducted. To manage the large number of Unsafe Control Actions (UCAs) and requirements generated due to the complexity of the analysis, a novel extension to STPA for the prioritization of results was applied. 317 UCAs were identified in total out of which 110 high-priority UCAs were analyzed (Step-4), resulting in 377 causal factors and 432 requirements. These were prioritized to produce a targeted list of 124 distinct high-priority requirements, 56 of which were identified as gaps in existing aviation regulations, policies, or procedures.. These highlight opportunities for regulatory updates in areas such as organizational performance, certification processes, training, collision avoidance, energy management, and automation. The findings provide regulators with safety considerations that could shape new or updated regulations, compliance methods, and guidance materials for the safe deployment of eVTOLs.
Single vs Multi Vector Predictive Control of Five-phase Drives
The field of Finite State Model Predictive Control for multiphase drives has produced many contributions. Many variants of FSMPC exist, each aiming at some aspect such as complexity of the cost function, switching frequency, etc. Despite past efforts to compare different techniques, the field is still out of consensus regarding the relative merits of each one. This paper presents a new method to compare FSMPC variants. The method is based on analyzing the modulation, implicit or explicit, used by each variant. In the paper the method is used to compare single-vector state-of-the-art FSMPC with a multi-vector variant designed to cancel xy currents and simplify the cost function. The results show the strengths and weaknesses of each technique. Also, it is found that the trade-offs between figures, previously thought to concern just individual regimes, extend to the whole operating space and also can be pinpoint to each FSMPC variant. Finally, it is shown that the flexibility of the single-vector approach and its better DC-link usage makes it, arguably, superior over the multi-vector variant.
Robust Adaptive Boundary Control of a Thermal Process with Thermoelectric Actuators: Theory and Experimental Validation
A sliding-mode-based adaptive boundary control law is proposed for a class of uncertain thermal reaction-diffusion processes subject to matched disturbances. The disturbances are assumed to be bounded, but the corresponding bounds are unknown, thus motivating the use of adaptive control strategies. A boundary control law comprising a proportional and discontinuous term is proposed, wherein the magnitude of the discontinuous relay term is adjusted via a gradient-based adaptation algorithm. Depending on how the adaptation algorithm is parameterized, the adaptive gain can be either a nondecreasing function of time (monodirectional adaptation) or it can both increase and decrease (bidirectional adaptation). The convergence and stability properties of these two solutions are investigated by Lyapunov analyses, and two distinct stability results are derived, namely, asymptotic stability for the monodirectional adaptation and globally uniformly ultimately bounded solutions for the bidirectional adaptation. The proposed algorithms are then specified to address the control problem of stabilizing a desired temperature profile in a metal beam equipped with thermoelectric boundary actuators. Experiments are conducted to investigate the real-world performance of the proposed sliding-mode-based adaptive control, with a particular focus on comparing the monodirectional and bidirectional adaptation laws.
comment: Extended version of the preprint submitted to the journal Automatica
Antenna's Performance in Microwave Imaging of Stratified Media
Numerous types of antennas have been employed for microwave imaging of stratified media for ground penetrating radar (GPR), through-the-wall-radar imaging (TWRI), etc. This letter aims to investigate the impact of the different antennas with their characteristics on the image reconstruction of those media. Hence, three types of antennas, including horn antennas, open waveguide and Vivaldi antennas, are chosen as almost directional antennas, operating at X-band 8-12 GHz. The antenna's far-field and near-field characteristics are analyzed. A diffraction tomography (DT)-based algorithm is used to reconstruct the target location within the stratified media using monostatic and multistatic data. It is observed that the more directional antennas provide a better-reconstructed image with less shadowing image of the stratified media.
Sensing, Detection and Localization for Low Altitude UAV: A RF-Based Framework via Multiple BSs Collaboration
The rapid growth of the low-altitude economy has resulted in a significant increase in the number of Low, slow, and small (LLS) unmanned aerial vehicles (UAVs), raising critical challenges for secure airspace management and reliable trajectory planning. To address this, this paper proposes a cooperative radio-frequency (RF) detection and localization framework that leverages existing cellular base stations. The proposed approach features a robust scheme for LSS target identification, integrating a cell averaging-constant false alarm rate (CA-CFAR) detector with a micro-Doppler signature (MDS) based recognition method. Multi-station measurements are fused through a grid-based probabilistic algorithm combined with clustering techniques, effectively mitigating ghost targets and improving localization accuracy in multi-UAV scenarios. Furthermore, the Cramer-Rao lower bound (CRLB) is derived as a performance benchmark and reinforcement learning (RL)-based optimization is employed to balance localization accuracy against station resource usage. Simulations demonstrate that increasing from one to multiple BSs reduces the positioning error to near the CRLB, while practical experiments further verify the framework's effectiveness. Furthermore, our RL-based optimization can find solutions that maintain high accuracy while minimizing resource usage, highlighting its potential as a scalable solution for ensuring airspace safety in the emerging low-altitude economy.
MAKO: Meta-Adaptive Koopman Operators for Learning-based Model Predictive Control of Parametrically Uncertain Nonlinear Systems
In this work, we propose a meta-learning-based Koopman modeling and predictive control approach for nonlinear systems with parametric uncertainties. An adaptive deep meta-learning-based modeling approach, called Meta Adaptive Koopman Operator (MAKO), is proposed. Without knowledge of the parametric uncertainty, the proposed MAKO approach can learn a meta-model from a multi-modal dataset and efficiently adapt to new systems with previously unseen parameter settings by using online data. Based on the learned meta Koopman model, a predictive control scheme is developed, and the stability of the closed-loop system is ensured even in the presence of previously unseen parameter settings. Through extensive simulations, our proposed approach demonstrates superior performance in both modeling accuracy and control efficacy as compared to competitive baselines.
Trust Modeling and Estimation in Human-Autonomy Interactions
Advances in the control of autonomous systems have accompanied an expansion in the potential applications for autonomous robotic systems. The success of applications involving humans depends on the quality of interaction between the autonomous system and the human supervisor, which is particularly affected by the degree of trust that the supervisor places in the autonomous system. Absent from the literature are models of supervisor trust dynamics that can accommodate asymmetric responses to autonomous system performance and the intermittent nature of supervisor-autonomous system communication. This paper focuses on formulating an estimated model of supervisor trust that incorporates both of these features by employing a switched linear system structure with event-triggered sampling of the model input and output. Trust response data collected in a user study with 51 participants were then used identify parameters for a switched linear model-based observer of supervisor trust.
comment: 10 pages. 13 figures
Traffic-Aware Eco-Driving Control in CAVs via Learning-based Terminal Cost Model
Connected and Automated Vehicles (CAVs) offer significant potential for improving energy efficiency and lowering vehicle emissions through eco-driving technologies. Control algorithms in CAVs leverage look-ahead route information and Vehicle-to-Everything (V2X) communication to optimize vehicle performance. However, existing eco-driving strategies often neglect macroscopic traffic effects, such as upstream traffic jams, that occur outside the optimization horizon but significantly impact vehicle energy efficiency. This work presents a novel Neural Network (NN)-based methodology to approximate the terminal cost within a model predictive control (MPC) problem framework, explicitly incorporating upstream traffic dynamics. By incorporating traffic jams into the optimization process, the proposed traffic-aware approach yields more energy-efficient speed trajectories compared to traffic-agnostic methods, with minimal impact on travel time. The framework is scalable for real-time implementation while effectively addressing uncertainties from dynamic traffic conditions and macroscopic traffic events.
Direct Data-Driven Predictive Control for a Three-dimensional Cable-Driven Soft Robotic Arm
Soft robots offer significant advantages in safety and adaptability, yet achieving precise and dynamic control remains a major challenge due to their inherently complex and nonlinear dynamics. Recently, Data-enabled Predictive Control (DeePC) has emerged as a promising model-free approach that bypasses explicit system identification by directly leveraging input-output data. While DeePC has shown success in other domains, its application to soft robots remains underexplored, particularly for three-dimensional (3D) soft robotic systems. This paper addresses this gap by developing and experimentally validating an effective DeePC framework on a 3D, cable-driven soft arm. Specifically, we design and fabricate a soft robotic arm with a thick tubing backbone for stability, a dense silicone body with large cavities for strength and flexibility, and rigid endcaps for secure termination. Using this platform, we implement DeePC with singular value decomposition (SVD)-based dimension reduction for two key control tasks: fixed-point regulation and trajectory tracking in 3D space. Comparative experiments with a baseline model-based controller demonstrate DeePC's superior accuracy, robustness, and adaptability, highlighting its potential as a practical solution for dynamic control of soft robots.
A pilot cohort study of a microfluidic-based point-of-care bilirubin measurement system
Objective The concentration of bilirubin in blood or serum is useful for assessing liver function as well as monitoring treatment. This study evaluates the clinical performance of a novel point-of-care (PoC) device for the detection of bilirubin in serum. The PoC device incorporates an integrated miniature optoelectronic sensing module and a microfluidic test cartridge. Methods Patients' serum total bilirubin concentrations, ranging from 2 {\mu}mol/L to 480 {\mu}mol/L, were measured using the PoC device and the standard laboratory method (n=20). Bland-Altman analysis and regression analysis using Passing-Bablok method were used to benchmark the PoC device against the standard laboratory measurements. The diagnostic capability of the PoC device in categorising the serum samples within clinically relevant bilirubin concentration thresholds of 200, 300, and 450 {\mu}mol/L was assessed using receiver operating characteristic (ROC) analysis. Results The mean difference between the PoC device and the standard laboratory method was -5.6 {\mu}mol/L, with a 95% confidence interval (CI) of -45.1 {\mu}mol/L to 33.9 {\mu}mol/L. The coefficient of determination (R2) was 0.986. The PoC device achieved a detection sensitivity of 90% and specificity of 97% in categorising bilirubin concentrations within bands used in clinical decision-making. Conclusions This study demonstrates that the proposed PoC device is capable of measuring bilirubin levels in patient samples with clinically acceptable accuracy.
Cognitive Radio for Asymmetric Cellular Downlink with Multi-User MIMO
Cognitive radio (CR) is an important technique for improving spectral efficiency, letting a secondary system operate in a wireless spectrum when the primary system does not make use of it. While it has been widely explored over the past 25 years, many common assumptions are not aligned with the realities of 5G networks. In this paper, we consider the CR problem for the following setup: (i) infrastructure-based systems, where downlink transmissions might occur to receivers whose positions are not, or not exactly, known; (ii) multi-beam antennas at both primary and secondary base stations. We formulate a detailed protocol to determine when secondary transmissions into different beam directions can interfere with primary users at potential locations and create probability-based interference rules. We then analyze the "catastrophic interference" probability and the "missed transmission opportunity" probability, as well as the achievable throughput, as a function of the transmit powers of the primary and secondary base stations and the sensing window of the secondary base station. Results can serve to more realistically assess the spectral efficiency gains in 5G infrastructure-based cognitive systems.
Observation Matrix Design for Densifying MIMO Channel Estimation via 2D Ice Filling
In recent years, densifying multiple-input multiple-output (MIMO) has attracted much attention from the communication community. Thanks to the subwavelength antenna spacing, the strong correlations among densifying antennas provide sufficient prior knowledge about channel state information (CSI). This inspires the careful design of observation matrices (e.g., transmit precoders and receive combiners), that exploits the CSI prior knowledge, to boost channel estimation performance. Aligned with this vision, this work proposes to jointly design the combiners and precoders by maximizing the mutual information between the received pilots and densifying MIMO channels. A two-dimensional ice-filling (2DIF) algorithm is proposed to efficiently accomplish this objective. The algorithm is motivated by the fact that the eigenspace of MIMO channel covariance can be decoupled into two sub-eigenspaces, which are associated with the correlations of transmitter antennas and receiver antennas, respectively. By properly setting the precoder and the combiner as the eigenvectors from these two sub-eigenspaces, the 2DIF promises to generate near-optimal observation matrices. Moreover, we further extend the 2DIF method to the popular hybrid combining systems, where a two-stage 2DIF (TS-2DIF) algorithm is developed to handle the analog combining circuits realized by phase shifters. Simulation results demonstrate that, compared to the state-of-the-art schemes, the proposed 2DIF and TS-2DIF methods can achieve superior channel estimation accuracy.
comment: 17 pages, 8 figures
Computing Safe Control Inputs using Discrete-Time Matrix Control Barrier Functions via Convex Optimization
Control barrier functions (CBFs) have seen widespread success in providing forward invariance and safety guarantees for dynamical control systems. A crucial limitation of discrete-time formulations is that CBFs that are nonconcave in their argument require the solution of nonconvex optimization problems to compute safety-preserving control inputs, which inhibits real-time computation of control inputs guaranteeing forward invariance. This paper presents a novel method for computing safety-preserving control inputs for discrete-time systems with nonconvex safety sets, utilizing convex optimization and the recently developed class of matrix control barrier function techniques. The efficacy of our methods is demonstrated through numerical simulations on a bicopter system.
comment: 17 pages, 8 figures
Cyber-Physical Systems on the Megawatt Scale: The impact of battery control on grid frequency stability
Electric power systems are undergoing fundamental change. The shift to inverter-based generation challenges frequency stability, while growing digitalisation heightens vulnerability to errors and attacks. Here we identify an emerging risk at the intersection of cyber-physical coupling and control system design. We show that grid frequency time series worldwide exhibit a persistent one-minute oscillatory pattern, whose origin has remained largely unexplained. We trace this pattern back to the energy management systems of battery electric storage systems and demonstrate that the pattern amplitude has increased substantially in the Nordic and British grids. We argue that this effect is a potential burden for stability in future grids with low inertia and an increasing penetration with batteries and smart devices, though it can be mitigated by a revision of battery control algorithms.
comment: 19 pages, 23 figures
Latent-Feature-Informed Neural ODE Modeling for Lightweight Stability Evaluation of Black-box Grid-Tied Inverters
Stability evaluation of black-box grid-tied inverters is vital for grid reliability, yet identification techniques are both data-hungry and blocked by proprietary internals. {To solve this, this letter proposes a latent-feature-informed neural ordinary differential equation (LFI-NODE) modeling method that can achieve lightweight stability evaluation directly from trajectory data.} LFI-NODE parameterizes the entire system ODE with a single continuous-time neural network, allowing each new sample to refine a unified global model. It faithfully captures nonlinear large-signal dynamics to preserve uniform predictive accuracy as the inverter transitions between operating points. Meanwhile, latent perturbation features distilled from every trajectory steer the learning process and concurrently reveal the small-signal eigenstructure essential for rigorous stability analysis. Validated on a grid-forming inverter, {The LFI-NODE requires one to two orders of magnitude fewer training samples compared with traditional methods, collected from short time-domain trajectories instead of extensive frequency-domain measurements.} {Furthermore, the LFI-NODE requires only 48 short transients to achieve a trajectory prediction error at the hundredth level and an eigenvalue estimation error at the tenth level, outperforming benchmark methods by one to two orders of magnitude.} This makes LFI-NODE a practical and lightweight approach for achieving high-fidelity stability assessment of complex black-box power-electronic systems.
comment: 6 pages 8fugures
Designing Control Barrier Functions Using a Dynamic Backup Policy
This paper presents a systematic approach to construct control barrier functions for nonlinear control affine systems subject to arbitrary state and input constraints. Taking inspiration from the reference governor literature, the proposed method defines a family of backup policies, parametrized by the equilibrium manifold of the system. The control barrier function is defined on the augmented state-and-reference space: given a state-reference pair, the approach quantifies the distance to constraint violation at any time in the future, should the current backup policy reference remain constant. Sensitivity analysis is then used to compute the (possibly nonsmooth) Jacobian with respect to the augmented state vector. To showcase its simple yet general nature, the proposed method is applied to an inverted pendulum on cart.
comment: 7 pages, 1 figure
Science ouverte et collaborative pour l'élaboration d'un banc automatisé de caractérisation de pertes en commutation par opposition
The switching losses of power transistors are generally measured using the so-called double pulse method. Measuring the opposition of two switching cells is a complementary method that is more accurate but indirect. However, implementing this method can be more complex and requires calibration steps and comprehensive control, with the added issue of thermal management. In this context, we proposed to address this topic through open and collaborative science, first in the form of a two-day hackathon, followed by monthly open sessions. More than 20 participants contributed to the two-day hackathon, followed by monthly sessions for those wishing to continue working together. This enabled us to set up an automated bench, in open science, including the generation of switching commands, the configuration and control of measuring instruments, and the hardware part. Here we present and share our work and this open approach.
comment: Paper in french, presented at the french national electrical engineering conference SGE 2025
Robustness Analysis for Quantum Systems Controlled by Continuous-Time Pulses
Differential sensitivity techniques originally developed to study the robustness of energy landscape controllers are generalized to the important case of closed quantum systems subject to continuously varying controls. Vanishing sensitivity to parameter variation is shown to coincide with perfect fidelity, as was the case for time-invariant controls. Upper bounds on the magnitude of the differential sensitivity to any parameter variation are derived based simply on knowledge of the system Hamiltonian and the maximum size of the control inputs.
comment: 6 pages, 2 figures
Safe and Optimal N-Spacecraft Swarm Reconfiguration in Non-Keplerian Cislunar Orbits
This paper presents a novel fuel-optimal guidance and control methodology for spacecraft swarm reconfiguration in Restricted Multi-Body Problems (RMBPs) with a guarantee of passive safety, maintaining miss distance even under abrupt loss of control authority. A new set of constraints exploits a quasi-periodic structure of RMBPs to guarantee passive safety. Particularly, the condition for passive safety is expressed as simple geometric constraints by solving optimal control in Local Toroidal Coordinates, which is based on a local eigenspace of a quasi-periodic motion around the corresponding periodic orbit. The proposed formulation enables a significant simplification of problem structure, which is applicable to large-scale swarm reconfiguration in cislunar orbits. The method is demonstrated in the Circular Restricted Three-Body Problem, the Elliptic Restricted Three-Body Problem, and the Bi-Circular Restricted Four-Body Problem. Furthermore, the optimized control profiles are validated in the full-ephemeris dynamics model. By extending and generalizing well-known concepts of relative orbital elements within the restricted two-body problem to the three- and four-body problems, this paper lays the foundation for practical control schemes of relative motion in cislunar space.
comment: 41 pages, 19 figures. Submitted and accepted to Journal of Guidance, Control, and Dynamics
A Digital Twin for Diesel Engines: Operator-infused Physics-Informed Neural Networks with Transfer Learning for Engine Health Monitoring
Improving diesel engine efficiency, reducing emissions, and enabling robust health monitoring have been critical research topics in engine modelling. While recent advancements in the use of neural networks for system monitoring have shown promising results, such methods often focus on component-level analysis, lack generalizability, and physical interpretability. In this study, we propose a novel hybrid framework that combines physics-informed neural networks (PINNs) with deep operator networks (DeepONet) to enable accurate and computationally efficient parameter identification in mean-value diesel engine models. Our method leverages physics-based system knowledge in combination with data-driven training of neural networks to enhance model applicability. Incorporating offline-trained DeepONets to predict actuator dynamics significantly lowers the online computation cost when compared to the existing PINN framework. To address the re-training burden typical of PINNs under varying input conditions, we propose two transfer learning (TL) strategies: (i) a multi-stage TL scheme offering better runtime efficiency than full online training of the PINN model and (ii) a few-shot TL scheme that freezes a shared multi-head network body and computes physics-based derivatives required for model training outside the training loop. The second strategy offers a computationally inexpensive and physics-based approach for predicting engine dynamics and parameter identification, offering computational efficiency over the existing PINN framework. Compared to existing health monitoring methods, our framework combines the interpretability of physics-based models with the flexibility of deep learning, offering substantial gains in generalization, accuracy, and deployment efficiency for diesel engine diagnostics.
SwarmGPT: Combining Large Language Models with Safe Motion Planning for Drone Swarm Choreography
Drone swarm performances -- synchronized, expressive aerial displays set to music -- have emerged as a captivating application of modern robotics. Yet designing smooth, safe choreographies remains a complex task requiring expert knowledge. We present SwarmGPT, a language-based choreographer that leverages the reasoning power of large language models (LLMs) to streamline drone performance design. The LLM is augmented by a safety filter that ensures deployability by making minimal corrections when safety or feasibility constraints are violated. By decoupling high-level choreographic design from low-level motion planning, our system enables non-experts to iteratively refine choreographies using natural language without worrying about collisions or actuator limits. We validate our approach through simulations with swarms up to 200 drones and real-world experiments with up to 20 drones performing choreographies to diverse types of songs, demonstrating scalable, synchronized, and safe performances. Beyond entertainment, this work offers a blueprint for integrating foundation models into safety-critical swarm robotics applications.
comment: Accepted at RA-L 2025
Extending First-order Robotic Motion Planners to Second-order Robot Dynamics
This paper extends first-order motion planners to robots governed by second-order dynamics. Two control schemes are proposed based on the knowledge of a scalar function whose negative gradient aligns with a given first-order motion planner. When such a function is known, the first-order motion planner is combined with a damping velocity vector with a dynamic gain to extend the safety and convergence guarantees of the first-order motion planner to second-order systems. If no such function is available, we propose an alternative control scheme ensuring that the error between the robot's velocity and the first-order motion planner converges to zero. The theoretical developments are supported by simulation results demonstrating the effectiveness of the proposed approaches.
comment: 14 pages, 10 figures
Direction Estimation of Sound Sources Using Microphone Arrays and Signal Strength ICSE
Sound-tracking refers to the process of determining the direction from which a sound originates, making it a fundamental component of sound source localization. This capability is essential in a variety of applications, including security systems, acoustic monitoring, and speaker tracking, where accurately identifying the direction of a sound source enables real-time responses, efficient resource allocation, and improved situational awareness. While sound-tracking is closely related to localization, it specifically focuses on identifying the direction of the sound source rather than estimating its exact position in space. Despite its utility, sound-tracking systems face several challenges, such as maintaining directional accuracy and precision, along with the need for sophisticated hardware configurations and complex signal processing algorithms. This paper presents a sound-tracking method using three electret microphones. We estimate the direction of a sound source using a lightweight method that analyzes signals from three strategically placed microphones. By comparing the average power of the received signals, the system infers the most probable direction of the sound. The results indicate that the power level from each microphone effectively determines the sound source direction. Our system employs a straightforward and cost-effective hardware design, ensuring simplicity and affordability in implementation. It achieves a localization error of less than 6 degrees and a precision of 98%. Additionally, its effortless integration with various systems makes it versatile and adaptable. Consequently, this technique presents a robust and reliable solution for sound-tracking and localization, with potential applications spanning diverse domains such as security systems, smart homes, and acoustic monitoring.
comment: Accepted to the 32nd International Conference on Systems Engineering (ICSEng'2025)
Optimization via a Control-Centric Framework
Optimization plays a central role in intelligent systems and cyber-physical technologies, where speed and reliability of convergence directly impact performance. In control theory, optimization-centric methods are standard: controllers are designed by repeatedly solving optimization problems, as in linear quadratic regulation, $H_\infty$ control, and model predictive control. In contrast, this paper develops a control-centric framework for optimization itself, where algorithms are constructed directly from Lyapunov stability principles rather than being proposed first and analyzed afterward. A key element is the stationarity vector, which encodes first-order optimality conditions and enables Lyapunov-based convergence analysis. By pairing a Lyapunov function with a selectable decay law, we obtain continuous-time dynamics with guaranteed exponential, finite-time, fixed-time, or prescribed-time convergence. Within this framework, we introduce three feedback realizations of increasing restrictiveness: the Hessian-gradient, Newton, and gradient dynamics. Each realization shapes the decay of the stationarity vector to achieve the desired rate. These constructions unify unconstrained optimization, extend naturally to constrained problems via Lyapunov-consistent primal-dual dynamics, and broaden the results for minimax and generalized Nash equilibrium seeking problems beyond exponential stability. The framework provides systematic design tools for optimization algorithms in control and game-theoretic problems.
comment: This work has been submitted to the IEEE for possible publication. 12 pages, 3 figures
A view on learning robust goal-conditioned value functions: Interplay between RL and MPC
Reinforcement learning (RL) and model predictive control (MPC) offer a wealth of distinct approaches for automatic decision-making under uncertainty. Given the impact both fields have had independently across numerous domains, there is growing interest in combining the general-purpose learning capability of RL with the safety and robustness features of MPC. To this end, this paper presents a tutorial-style treatment of RL and MPC, treating them as alternative approaches to solving Markov decision processes. In our formulation, RL aims to learn a global value function through offline exploration in an uncertain environment, whereas MPC constructs a local value function through online optimization. This local-global perspective suggests new ways to design policies that combine robustness and goal-conditioned learning. Robustness is incorporated into the RL and MPC pipelines through a scenario-based approach. Goal-conditioned learning aims to alleviate the burden of engineering a reward function for RL. Combining the two leads to a single policy that unites a robust, high-level RL terminal value function with short-term, scenario-based MPC planning for reliable constraint satisfaction. This approach leverages the benefits of both RL and MPC, the effectiveness of which is demonstrated on classical control benchmarks.
comment: Postprint; 37 pages
Decentralized CBF-based Safety Filters for Collision Avoidance of Cooperative Missile Systems with Input Constraints
This paper presents a decentralized safety filter for collision avoidance in multi-agent aerospace interception scenarios. The approach leverages robust control barrier functions (RCBFs) to guarantee forward invariance of safety sets under bounded inputs and high-relative-degree dynamics. Each effector executes its nominal cooperative guidance command, while a local quadratic program (QP) modifies the input only when necessary. Event-triggered activation based on range and zero-effort miss (ZEM) criteria ensures scalability by restricting active constraints to relevant neighbors. To resolve feasibility issues from simultaneous constraints, a slack-variable relaxation scheme is introduced that prioritizes critical agents in a Pareto-optimal manner. Simulation results in many-on-many interception scenarios demonstrate that the proposed framework maintains collision-free operation with minimal deviation from nominal guidance, providing a computationally efficient and scalable solution for safety-critical multi-agent aerospace systems.
comment: 7 pages, 5 figures
A Control Allocation Algorithm for Hypersonic Glide Vehicles with Input Limitations
Hypersonic glide vehicles (HGVs) operate in challenging flight regimes characterized by strong nonlinearities in actuation and stringent physical constraints. These include state-dependent actuator limitations, asymmetric control bounds, and thermal loads that vary with maneuvering conditions. This paper introduces an iterative control allocation method to address these challenges in real time. The proposed algorithm searches for control inputs that achieve the desired moment commands while respecting constraints on input magnitude and rate. For slender HGV configurations, thermal loads and drag generation are strongly correlated-lower drag typically results in reduced surface heating. By embedding drag-sensitive soft constraints, the method improves energy efficiency and implicitly reduces surface temperatures, lowering the vehicle's infrared signature. These features are particularly advantageous for long-range military operations that require low observability. The approach is demonstrated using the DLR's Generic Hypersonic Glide Vehicle 2 (GHGV-2) simulation model. The results confirm the method's effectiveness in maintaining control authority under realistic, constrained flight conditions.
comment: 38 pages, 20 figures, submitted to the AIAA Journal of Guidance, Control, and Dynamics
Machine Learning Detection of Lithium Plating in Lithium-ion Cells: A Gaussian Process Approach
Lithium plating during fast charging is a critical degradation mechanism that accelerates capacity fade and can trigger catastrophic safety failures. Recent work has identified a distinctive dQ/dV peak above 4.0 V as a reliable signature of plating onset; however, conventional methods for computing dQ/dV rely on finite differencing with filtering, which amplifies sensor noise and introduces bias in peak location. In this paper, we propose a Gaussian Process (GP) framework for lithium plating detection by directly modeling the charge-voltage relationship Q(V) as a stochastic process with calibrated uncertainty. Leveraging the property that derivatives of GPs remain GPs, we infer dQ/dV analytically and probabilistically from the posterior, enabling robust detection without ad hoc smoothing. The framework provides three key benefits: (i) noise-aware inference with hyperparameters learned from data, (ii) closed-form derivatives with credible intervals for uncertainty quantification, and (iii) scalability to online variants suitable for embedded BMS. Experimental validation on Li-ion coin cells across a range of C-rates (0.2C-1C) and temperatures (0-40\deg C) demonstrates that the GP-based method reliably detects plating peaks under low-temperature, high-rate charging, while correctly reporting no peaks in baseline cases. The concurrence of GP-identified differential peaks, reduced charge throughput, and capacity fade measured via reference performance tests confirms the method's accuracy and robustness, establishing a practical pathway for real-time lithium plating detection.
comment: Submitted to American Control Conference 2026 - ACC 2026
Techno-economic analysis of self-sustainable thermophotovoltaic systems for grid-scale energy generation
To facilitate the widespread adoption of renewable energy, dispatchable, zero-emission power sources are essential for grid stability. This work performs a comprehensive techno-economic analysis of a self-sustainable thermophotovoltaic (TPV) system, an architecture that integrates solar charging to function as a standalone power generation asset. Using theory-based models for conventional air-bridge InGaAs and Si diode cells, our analysis reveals that while the system is not currently competitive from a pure levelized of storage cost (LCOS) perspective due to the high capital expenditure for thermal battery materials, its primary value lies in its competitive levelized cost of electricity (LCOE), which is comparable to that of conventional dispatchable generators such as gas turbines. Furthermore, we show that a full Si-based TPV system, utilizing a 50-{\mu}m-thick air-bridge cell for enhanced photon utilization, can also achieve an LCOE that is competitive with such conventional power sources at scales exceeding the gigawatt-hour level, despite its lower conversion efficiency relative to its InGaAs counterpart. This highlights a practical engineering pathway for leveraging the immense manufacturing scalability of Si, offering a lower-risk route to deployment compared to III-V materials. Ultimately, this work establishes the self-sustainable TPV architecture as a compelling pathway toward providing grid-scale, on-demand, zero-emission power.
comment: 27 pages, 6 figures, 1 table
Transferable Parasitic Estimation via Graph Contrastive Learning and Label Rebalancing in AMS Circuits
Graph representation learning on Analog-Mixed Signal (AMS) circuits is crucial for various downstream tasks, e.g., parasitic estimation. However, the scarcity of design data, the unbalanced distribution of labels, and the inherent diversity of circuit implementations pose significant challenges to learning robust and transferable circuit representations. To address these limitations, we propose CircuitGCL, a novel graph contrastive learning framework that integrates representation scattering and label rebalancing to enhance transferability across heterogeneous circuit graphs. CircuitGCL employs a self-supervised strategy to learn topology-invariant node embeddings through hyperspherical representation scattering, eliminating dependency on large-scale data. Simultaneously, balanced mean squared error (BMSE) and balanced softmax cross-entropy (BSCE) losses are introduced to mitigate label distribution disparities between circuits, enabling robust and transferable parasitic estimation. Evaluated on parasitic capacitance estimation (edge-level task) and ground capacitance classification (node-level task) across TSMC 28nm AMS designs, CircuitGCL outperforms all state-of-the-art (SOTA) methods, with the $R^2$ improvement of $33.64\% \sim 44.20\%$ for edge regression and F1-score gain of $0.9\times \sim 2.1\times$ for node classification. Our code is available at https://github.com/ShenShan123/CircuitGCL.
comment: Final version accepted by the International Conference on Computer-Aided Design (ICCAD) 2025. First two authors have equal contributions
Geometry of Distance Protection
Distance relays detect faults on transmission lines. They face uncertainty from the fault's location and resistance, as well as the current from the line's remote terminal. In this paper, we aggregate this uncertainty with the Minkowski sum. This allows us to explicitly model the power grid surrounding the relay's line, and in turn accommodate any mix of synchronous machines and inverter-based resources. To make the relay's task easier, inverters can inject perturbations, or auxiliary signals, such as negative-sequence current. We use Farkas' lemma to construct an optimization for designing inverter auxiliary signals.
A Real-Time System for Scheduling and Managing UAV Delivery in Urban Areas
As urban logistics demand continues to grow, UAV delivery has become a key solution to improve delivery efficiency, reduce traffic congestion, and lower logistics costs. However, to fully leverage the potential of UAV delivery networks, efficient swarm scheduling and management are crucial. In this paper, we propose a real-time scheduling and management system based on the ``Airport-Unloading Station" model, aiming to bridge the gap between high-level scheduling algorithms and low-level execution systems. This system, acting as middleware, accurately translates the requirements from the scheduling layer into specific execution instructions, ensuring that the scheduling algorithms perform effectively in real-world environments. Additionally, we implement three collaborative scheduling schemes involving autonomous ground vehicles (AGVs), unmanned aerial vehicles (UAVs), and ground staff to further optimize overall delivery efficiency. Through extensive experiments, this study demonstrates the rationality and feasibility of the proposed management system, providing practical solution for the commercial application of UAVs delivery in urban. Code: https://github.com/chengji253/UAVDeliverySystem
Coordinated Control of Deformation and Flight for Morphing Aircraft via Meta-Learning and Coupled State-Dependent Riccati Equations
In this paper, the coordinated control problem of deformation and flight for morphing aircraft (MA) is studied by using meta-learning (ML) and coupled state-dependent Riccati equations (CSDREs). Our method is built on two principal observations that dynamic models of MA under varying morphing conditions share a morphing condition independent representation function and that the specific morphing condition part lies in a set of linear coefficients. To that end, the domain adversarially invariant meta-learning (DAIML) is employed to learn the shared representation with offline flight data. Based on the learned representation function, the coordinated control of the deformation and flight for MA is formulated as a non-cooperative differential game. The state-dependent feedback control solutions can be derived by addressing a pair of CSDREs. For this purpose, Lyapunov iterations are extended to obtain the positive semidefinite (definite) stabilizing solutions of the CSDREs, and the convergence proof of the proposed algorithm is provided. Finally, a simulation study is carried out to validate the efficacy of the developed coordinated game control strategies.
Modeling and Simulation of an Active Car Suspension with a Robust LQR Controller under Road Disturbance, Parameter Uncertainty and White Noise
Vehicle suspension is important for passengers to travel comfortably and to be less exposed to effects such as vibration and shock. A good suspension system increases the road holding of vehicles, allows them to take turns safely, and reduces the risk of traffic accidents. A passive suspension system is the most widely used suspension system in vehicles due to its simple structure and low cost. Passive suspension systems do not have an actuator and therefore do not have a controller. Active suspension systems have an actuator and a controller. Although their structures are more complex and costly, they are safer. PID controller is widely used in active suspension systems due to its simple structure, reasonable cost, and easy adjustment of coefficients. In this study, a more robust LQR-controlled active suspension was designed than a passive suspension and a PID-controlled active suspension. Robustness analyses were performed for passive suspension, PID-controlled active suspension, and LQR-controlled active suspension. Suspension travel, sprung mass acceleration, and sprung mass motion simulations were performed for all three suspensions under road disturbance, under simultaneous road disturbance and parameter uncertainty and under road disturbance with white noise. A comparative analysis was performed by obtaining the rise time, overshoot, and settling time data of the suspensions under different conditions. It was observed that the LQR-controlled active suspension showed the fastest rise time, the least overshoot and had the shortest settling time. In this case, it was proven that the LQR controlled active suspension provided a more comfortable and safe ride compared to the other two suspension systems.
comment: 20 pages, 19 figures
Resilient Multi-Dimensional Consensus and Distributed Optimization against Agent-Based and Denial-of-Service Attacks
In this paper, we consider the resilient multi-dimensional consensus and distributed optimization problems of multi-agent systems (MASs) in the presence of both agent-based and denial-of-service (DoS) attacks. The considered agent-based attacks can cover malicious, Byzantine, and stubborn agents. The links between agents in the network can be blocked by DoS attacks, which may lead the digraph to be time-varying and even disconnected. The objective is to ensure that the remaining benign agents achieve consensus. To this end, an "auxiliary point"-based resilient control algorithm is proposed for MASs. Under the proposed algorithm, each healthy agent constructs a "safe kernel" utilizing the states of its in-neighbors and updates its state toward a specific point within this kernel at each iteration. If an agent cannot receive its neighbors' states owing to DoS attacks, it will use the states received immediately before the DoS period. Moreover, a resilient multi-dimensional distributed optimization (RMDO) algorithm is also proposed. Theoretical proofs and numerical examples are presented to demonstrate the effectiveness of the proposed algorithms.
Learning a Shape-adaptive Assist-as-needed Rehabilitation Policy from Therapist-informed Input
Therapist-in-the-loop robotic rehabilitation has shown great promise in enhancing rehabilitation outcomes by integrating the strengths of therapists and robotic systems. However, its broader adoption remains limited due to insufficient safe interaction and limited adaptation capability. This article proposes a novel telerobotics-mediated framework that enables therapists to intuitively and safely deliver assist-as-needed~(AAN) therapy based on two primary contributions. First, our framework encodes the therapist-informed corrective force into via-points in a latent space, allowing the therapist to provide only minimal assistance while encouraging patient maintaining own motion preferences. Second, a shape-adaptive ANN rehabilitation policy is learned to partially and progressively deform the reference trajectory for movement therapy based on encoded patient motion preferences and therapist-informed via-points. The effectiveness of the proposed shape-adaptive AAN strategy was validated on a telerobotic rehabilitation system using two representative tasks. The results demonstrate its practicality for remote AAN therapy and its superiority over two state-of-the-art methods in reducing corrective force and improving movement smoothness.
Hierarchical Analysis and Control of Epidemic Spreading over Networks using Dissipativity and Mesh Stability
Analyzing and controlling spreading processes are challenging problems due to the involved non-linear node (subsystem) dynamics, unknown disturbances, complex interconnections, and the large-scale and multi-level nature of the problems. The dissipativity concept provides a practical framework for addressing such concerns, thanks to the energy-based representation it offers for subsystems and the compositional properties it provides for the analysis and control of interconnected (networked) systems comprised of such subsystems. Therefore, in this paper, we utilize the dissipativity concept to analyze and control a spreading process that occurs over a hierarchy of nodes, groups, and a network (i.e., a spreading network). We start by generalizing several existing results on dissipativity-based topology design for networked systems. Next, we model the considered spreading network as a networked system and establish the dissipativity properties of its nodes. The generalized topology design method is then applied at multiple levels of the considered spreading network to formulate its analysis and control problems as Linear Matrix Inequality (LMI) problems. We identify and enforce localized necessary conditions to support the feasibility of the LMI problem solved at each subsequent hierarchical level of the spreading network. Consequently, the proposed method does not involve iterative multi-level optimization stages that are computationally inefficient. The proposed control solution ensures that the spreading network is not only stable but also dissipative and mesh-stable. Compared to conventional methods, such as threshold pruning and high-degree edge removal, our approach offers superior performance in terms of infection containment, control efficiency, and disturbance robustness. Extensive numerical results demonstrate the effectiveness of the proposed technique.
comment: To be submitted to Automatica
Connecting the Equinoctial Elements and Rodrigues Parameters: A New Set of Elements
A geometric interpretation of the equinoctial elements is given with a connection to orthogonal rotations and attitude dynamics in Euclidean 3-space. An identification is made between the equinoctial elements and classic Rodrigues parameters. A new set of equinoctial elements are developed using the modified Rodrigues parameters, thereby removing the coordinate singularity for retrograde equatorial orbits present in previous versions of these elements. A low-thrust trajectory optimization problem is set up using the new elements to numerically verify convergence for the two-point boundary problem, as compared to their predecessors.
comment: formatting corrected for better readability
Optimal Assignment and Motion Control in Two-Class Continuum Swarms
We consider optimal swarm control problems where two different classes of agents are present. Continuum idealizations of large-scale swarms are used where the dynamics describe the evolution of the spatially-distributed densities of each agent class. The problem formulation we adopt is motivated by applications where agents of one class are assigned to agents of the other class, which we refer to as demand and resource agents respectively. Assignments have costs related to the distances between mutually assigned agents, and the overall cost of an assignment is quantified by a Wasserstein distance between the densities of the two agent classes. When agents can move, the assignment cost can decrease at the expense of a physical motion cost, and this tradeoff sets up a nonlinear infinite-dimensional optimal control problem. We show that in one spatial dimension, this problem can be converted to an infinite-dimensional, but decoupled, linear-quadratic (LQ) tracking problem when expressed in terms of the quantile functions of the respective agent densities. Solutions are given in the general one-dimensional case, as well as in the special cases of constant and periodically time-varying demands.
comment: Extended version including periodic-demand case. 13 pages, 7 figures
Data-driven Model Predictive Control using MATLAB
This paper presents a comprehensive overview of data-driven model predictive control, highlighting state-of-the-art methodologies and their numerical implementation. The discussion begins with a brief review of conventional model predictive control (MPC), which discusses both linear MPC (LMPC) and nonlinear MPC (NMPC). This is followed by a section on data-driven LMPC, outlining fundamental concepts and the implementation of various approaches, including subspace predictive control and prediction error methods. Subsequently, the focus shifts to data-driven NMPC, emphasizing approaches based on neural network models. The paper concludes with a review of recent advancements in data-driven MPC and explores potential directions for future research.
comment: 22 pages, 8 figures
Multiagent Systems
Opponent Shaping in LLM Agents
Large Language Models (LLMs) are increasingly being deployed as autonomous agents in real-world environments. As these deployments scale, multi-agent interactions become inevitable, making it essential to understand strategic behavior in such systems. A central open question is whether LLM agents, like reinforcement learning agents, can shape the learning dynamics and influence the behavior of others through interaction alone. In this paper, we present the first investigation of opponent shaping (OS) with LLM-based agents. Existing OS algorithms cannot be directly applied to LLMs, as they require higher-order derivatives, face scalability constraints, or depend on architectural components that are absent in transformers. To address this gap, we introduce ShapeLLM, an adaptation of model-free OS methods tailored for transformer-based agents. Using ShapeLLM, we examine whether LLM agents can influence co-players' learning dynamics across diverse game-theoretic environments. We demonstrate that LLM agents can successfully guide opponents toward exploitable equilibria in competitive games (Iterated Prisoner's Dilemma, Matching Pennies, and Chicken) and promote coordination and improve collective welfare in cooperative games (Iterated Stag Hunt and a cooperative version of the Prisoner's Dilemma). Our findings show that LLM agents can both shape and be shaped through interaction, establishing opponent shaping as a key dimension of multi-agent LLM research.
comment: 29 pages, 15 figures, 15 tables
Bayesian Decision Making around Experts
Complex learning agents are increasingly deployed alongside existing experts, such as human operators or previously trained agents. However, it remains unclear how should learners optimally incorporate certain forms of expert data, which may differ in structure from the learner's own action-outcome experiences. We study this problem in the context of Bayesian multi-armed bandits, considering: (i) offline settings, where the learner receives a dataset of outcomes from the expert's optimal policy before interaction, and (ii) simultaneous settings, where the learner must choose at each step whether to update its beliefs based on its own experience, or based on the outcome simultaneously achieved by an expert. We formalize how expert data influences the learner's posterior, and prove that pretraining on expert outcomes tightens information-theoretic regret bounds by the mutual information between the expert data and the optimal action. For the simultaneous setting, we propose an information-directed rule where the learner processes the data source that maximizes their one-step information gain about the optimal action. Finally, we propose strategies for how the learner can infer when to trust the expert and when not to, safeguarding the learner for the cases where the expert is ineffective or compromised. By quantifying the value of expert data, our framework provides practical, information-theoretic algorithms for agents to intelligently decide when to learn from others.
Climate Surrogates for Scalable Multi-Agent Reinforcement Learning: A Case Study with CICERO-SCM
Climate policy studies require models that capture the combined effects of multiple greenhouse gases on global temperature, but these models are computationally expensive and difficult to embed in reinforcement learning. We present a multi-agent reinforcement learning (MARL) framework that integrates a high-fidelity, highly efficient climate surrogate directly in the environment loop, enabling regional agents to learn climate policies under multi-gas dynamics. As a proof of concept, we introduce a recurrent neural network architecture pretrained on ($20{,}000$) multi-gas emission pathways to surrogate the climate model CICERO-SCM. The surrogate model attains near-simulator accuracy with global-mean temperature RMSE $\approx 0.0004 \mathrm{K}$ and approximately $1000\times$ faster one-step inference. When substituted for the original simulator in a climate-policy MARL setting, it accelerates end-to-end training by $>\!100\times$. We show that the surrogate and simulator converge to the same optimal policies and propose a methodology to assess this property in cases where using the simulator is intractable. Our work allows to bypass the core computational bottleneck without sacrificing policy fidelity, enabling large-scale multi-agent experiments across alternative climate-policy regimes with multi-gas dynamics and high-fidelity climate response.
Network Topology and Information Efficiency of Multi-Agent Systems: Study based on MARL
Multi-agent systems (MAS) solve complex problems through coordinated autonomous entities with individual decision-making capabilities. While Multi-Agent Reinforcement Learning (MARL) enables these agents to learn intelligent strategies, it faces challenges of non-stationarity and partial observability. Communications among agents offer a solution, but questions remain about its optimal structure and evaluation. This paper explores two underexamined aspects: communication topology and information efficiency. We demonstrate that directed and sequential topologies improve performance while reducing communication overhead across both homogeneous and heterogeneous tasks. Additionally, we introduce two metrics -- Information Entropy Efficiency Index (IEI) and Specialization Efficiency Index (SEI) -- to evaluate message compactness and role differentiation. Incorporating these metrics into training objectives improves success rates and convergence speed. Our findings highlight that designing adaptive communication topologies with information-efficient messaging is essential for effective coordination in complex MAS.
Multimodal Safety Evaluation in Generative Agent Social Simulations
Can generative agents be trusted in multimodal environments? Despite advances in large language and vision-language models that enable agents to act autonomously and pursue goals in rich settings, their ability to reason about safety, coherence, and trust across modalities remains limited. We introduce a reproducible simulation framework for evaluating agents along three dimensions: (1) safety improvement over time, including iterative plan revisions in text-visual scenarios; (2) detection of unsafe activities across multiple categories of social situations; and (3) social dynamics, measured as interaction counts and acceptance ratios of social exchanges. Agents are equipped with layered memory, dynamic planning, multimodal perception, and are instrumented with SocialMetrics, a suite of behavioral and structural metrics that quantifies plan revisions, unsafe-to-safe conversions, and information diffusion across networks. Experiments show that while agents can detect direct multimodal contradictions, they often fail to align local revisions with global safety, reaching only a 55 percent success rate in correcting unsafe plans. Across eight simulation runs with three models - Claude, GPT-4o mini, and Qwen-VL - five agents achieved average unsafe-to-safe conversion rates of 75, 55, and 58 percent, respectively. Overall performance ranged from 20 percent in multi-risk scenarios with GPT-4o mini to 98 percent in localized contexts such as fire/heat with Claude. Notably, 45 percent of unsafe actions were accepted when paired with misleading visuals, showing a strong tendency to overtrust images. These findings expose critical limitations in current architectures and provide a reproducible platform for studying multimodal safety, coherence, and social dynamics.
What Is Your Agent's GPA? A Framework for Evaluating Agent Goal-Plan-Action Alignment
We introduce the Agent GPA (Goal-Plan-Action) framework: an evaluation paradigm based on an agent's operational loop of setting goals, devising plans, and executing actions. The framework includes five evaluation metrics: Goal Fulfillment, Logical Consistency, Execution Efficiency, Plan Quality, and Plan Adherence. Logical Consistency checks that an agent's actions are consistent with its prior actions. Execution Efficiency checks whether the agent executes in the most efficient way to achieve its goal. Plan Quality checks whether an agent's plans are aligned with its goals; Plan Adherence checks if an agent's actions are aligned with its plan; and Goal Fulfillment checks that agent's final outcomes match the stated goals. Our experimental results on two benchmark datasets - the public TRAIL/GAIA dataset and an internal dataset for a production-grade data agent - show that this framework (a) provides a systematic way to cover a broad range of agent failures, including all agent errors on the TRAIL/GAIA benchmark dataset; (b) supports LLM-judges that exhibit strong agreement with human annotation, covering 80% to over 95% errors; and (c) localizes errors with 86% agreement to enable targeted improvement of agent performance.
A Hybrid Agent-Based and System Dynamics Framework for Modelling Project Execution and Technology Maturity in Early-Stage R&D
This paper presents a hybrid approach to predict the evolution of technological maturity in R and D projects, using the oil and gas sector as an example. Integrating System Dynamics (SD) and Agent Based Modelling (ABM) allows the proposed multi level framework to capture uncertainties in work effort, team size, and project duration, which influence technological progress. While AB SD hybrid models are established in other fields, their use in R and D remains limited. The model combines system level feedback structures governing work phases, rework cycles, and duration with decentralised agents such as team members, tasks, and controllers, whose interactions generate emergent project dynamics. A base case scenario analysed early stage innovation projects with 15 parallel tasks over 156 weeks. A comparative sequential scenario showed an 88 percent reduction in rework duration. A second scenario assessed mixed parallel sequential task structures with varying team sizes. In parallel configurations, increasing team size reduced project duration and improved task completion, with optimal results for teams of four to five members. These findings align with empirical evidence showing that moderate team expansion enhances coordination efficiency without excessive communication overhead. However, larger teams may decrease performance due to communication complexity and management delays. Overall, the model outputs and framework align with expert understanding, supporting their validity as quantitative tools for analysing resource allocation, scheduling efficiency, and technology maturity progression.
What Is Your Agent's GPA? A Framework for Evaluating Agent Goal-Plan-Action Alignment
We introduce the Agent GPA (Goal-Plan-Action) framework: an evaluation paradigm based on an agent's operational loop of setting goals, devising plans, and executing actions. The framework includes five evaluation metrics: Goal Fulfillment, Logical Consistency, Execution Efficiency, Plan Quality, and Plan Adherence. Logical Consistency checks that an agent's actions are consistent with its prior actions. Execution Efficiency checks whether the agent executes in the most efficient way to achieve its goal. Plan Quality checks whether an agent's plans are aligned with its goals; Plan Adherence checks if an agent's actions are aligned with its plan; and Goal Fulfillment checks that agent's final outcomes match the stated goals. Our experimental results on two benchmark datasets - the public TRAIL/GAIA dataset and an internal dataset for a production-grade data agent - show that this framework (a) provides a systematic way to cover a broad range of agent failures, including all agent errors on the TRAIL/GAIA benchmark dataset; (b) supports LLM-judges that exhibit strong agreement with human annotation, covering 80% to over 95% errors; and (c) localizes errors with 86% agreement to enable targeted improvement of agent performance.
MARLIN: Multi-Agent Reinforcement Learning with Murmuration Intelligence and LLM Guidance for Reservoir Management
As climate change intensifies extreme weather events, water disasters pose growing threats to global communities, making adaptive reservoir management critical for protecting vulnerable populations and ensuring water security. Modern water resource management faces unprecedented challenges from cascading uncertainties propagating through interconnected reservoir networks. These uncertainties, rooted in physical water transfer losses and environmental variability, make precise control difficult. For example, sending 10 tons downstream may yield only 8-12 tons due to evaporation and seepage. Traditional centralized optimization approaches suffer from exponential computational complexity and cannot effectively handle such real-world uncertainties, while existing multi-agent reinforcement learning (MARL) methods fail to achieve effective coordination under uncertainty. To address these challenges, we present MARLIN, a decentralized reservoir management framework inspired by starling murmurations intelligence. Integrating bio-inspired alignment, separation, and cohesion rules with MARL, MARLIN enables individual reservoirs to make local decisions while achieving emergent global coordination. In addition, a LLM provides real-time reward shaping signals, guiding agents to adapt to environmental changes and human-defined preferences. Experiments on real-world USGS data show that MARLIN improves uncertainty handling by 23\%, cuts computation by 35\%, and accelerates flood response by 68\%, exhibiting super-linear coordination, with complexity scaling 5.4x from 400 to 10,000 nodes. These results demonstrate MARLIN's potential for disaster prevention and protecting communities through intelligent, scalable water resource management.
Multi-Turn Human-LLM Interaction Through the Lens of a Two-Way Intelligibility Protocol NeurIPS 2025
Our interest is in the design of software systems involving a human-expert interacting -- using natural language -- with a large language model (LLM) on data analysis tasks. For complex problems, it is possible that LLMs can harness human expertise and creativity to find solutions that were otherwise elusive. On one level, this interaction takes place through multiple turns of prompts from the human and responses from the LLM. Here we investigate a more structured approach based on an abstract protocol described in [3] for interaction between agents. The protocol is motivated by a notion of "two-way intelligibility" and is modelled by a pair of communicating finite-state machines. We provide an implementation of the protocol, and provide empirical evidence of using the implementation to mediate interactions between an LLM and a human-agent in two areas of scientific interest (radiology and drug design). We conduct controlled experiments with a human proxy (a database), and uncontrolled experiments with human subjects. The results provide evidence in support of the protocol's capability of capturing one- and two-way intelligibility in human-LLM interaction; and for the utility of two-way intelligibility in the design of human-machine systems. Our code is available at https://github.com/karannb/interact.
comment: Multi-Turn Interactions in Large Language Models (MTI-LLM) Workshop at NeurIPS 2025
Paper2Video: Automatic Video Generation from Scientific Papers
Academic presentation videos have become an essential medium for research communication, yet producing them remains highly labor-intensive, often requiring hours of slide design, recording, and editing for a short 2 to 10 minutes video. Unlike natural video, presentation video generation involves distinctive challenges: inputs from research papers, dense multi-modal information (text, figures, tables), and the need to coordinate multiple aligned channels such as slides, subtitles, speech, and human talker. To address these challenges, we introduce Paper2Video, the first benchmark of 101 research papers paired with author-created presentation videos, slides, and speaker metadata. We further design four tailored evaluation metrics--Meta Similarity, PresentArena, PresentQuiz, and IP Memory--to measure how videos convey the paper's information to the audience. Building on this foundation, we propose PaperTalker, the first multi-agent framework for academic presentation video generation. It integrates slide generation with effective layout refinement by a novel effective tree search visual choice, cursor grounding, subtitling, speech synthesis, and talking-head rendering, while parallelizing slide-wise generation for efficiency. Experiments on Paper2Video demonstrate that the presentation videos produced by our approach are more faithful and informative than existing baselines, establishing a practical step toward automated and ready-to-use academic video generation. Our dataset, agent, and code are available at https://github.com/showlab/Paper2Video.
comment: Project Page: https://showlab.github.io/Paper2Video/
Neuro-Symbolic Agents with Modal Logic for Autonomous Diagnostics
The development of intelligent agents, particularly those powered by language models (LMs), has shown the critical role in various environments that require intelligent and autonomous decision. Environments are not passive testing grounds and they represent the data required for agents to learn and exhibit very challenging conditions that require adaptive, complex and autonomous capacity to make decisions. While the paradigm of scaling models and datasets has led to remarkable emergent capabilities, we argue that scaling the structure, fidelity, and logical consistency of agent reasoning within these environments is a crucial, yet underexplored, dimension of AI research. This paper introduces a neuro-symbolic multi-agent architecture where the belief states of individual agents are formally represented as Kripke models. This foundational choice enables them to reason about known concepts of \emph{possibility} and \emph{necessity} using the formal language of modal logic. In this work, we use of immutable, domain-specific knowledge to make infere information, which is encoded as logical constraints essential for proper diagnosis. In the proposed model, we show constraints that actively guide the hypothesis generation of LMs, effectively preventing them from reaching physically or logically untenable conclusions. In a high-fidelity simulated particle accelerator environment, our system successfully diagnoses complex, cascading failures by combining the powerful semantic intuition of LMs with the rigorous, verifiable validation of modal logic and a factual world model and showcasing a viable path toward more robust, reliable, and verifiable autonomous agents.
comment: 10 pages, 1 figure, Scaling Environments for Agents (SEA) Workshop at NeuralIPS
FG-PE: Factor-graph Approach for Multi-robot Pursuit-Evasion
With the increasing use of robots in daily life, there is a growing need to provide robust collaboration protocols for robots to tackle more complicated and dynamic problems effectively. This paper presents a novel, factor graph-based approach to address the pursuit-evasion problem, enabling accurate estimation, planning, and tracking of an evader by multiple pursuers working together. It is assumed that there are multiple pursuers and only one evader in this scenario. The proposed method significantly improves the accuracy of evader estimation and tracking, allowing pursuers to capture the evader in the shortest possible time and distance compared to existing techniques. In addition to these primary objectives, the proposed approach effectively minimizes uncertainty while remaining robust, even when communication issues lead to some messages being dropped or lost. Through a series of comprehensive experiments, this paper demonstrates that the proposed algorithm consistently outperforms traditional pursuit-evasion methods across several key performance metrics, such as the time required to capture the evader and the average distance traveled by the pursuers. Additionally, the proposed method is tested in real-world hardware experiments, further validating its effectiveness and applicability.
Position Paper: Towards Open Complex Human-AI Agents Collaboration Systems for Problem Solving and Knowledge Management
We propose a technology-agnostic, collaboration-ready stance for Human-AI Agents Collaboration Systems (HAACS) that closes long-standing gaps in prior stages (automation; flexible autonomy; agentic multi-agent collectives). Reading empirical patterns through a seven-dimension collaboration spine and human-agent contrasts, we identify missing pieces: principled budgeting of initiative, instantaneous and auditable reconfiguration, a system-wide knowledge backbone with an epistemic promotion gate, capacity-aware human interfaces; and, as a prerequisite to all of the above, unified definitions of agent and formal collaborative dynamics. We respond with (i) a boundary-centric ontology of agenthood synthesized with cybernetics; (ii) a Petri net family (colored and interpreted) that models ownership, cross-boundary interaction, concurrency, guards, and rates with collaboration transitions; and (iii) a three-level orchestration (meta, agent, execution) that governs behavior families via guard flips. On the knowledge side, we ground collaborative learning in Conversation Theory and SECI with teach-back gates and an evolving backbone; on the problem-solving side, we coordinate routine MEA-style control with practice-guided open-ended discovery. The result is the Hierarchical Exploration-Exploitation Net (HE2-Net): a policy-controlled stance that splits provisional from validated assets, promotes only after tests and peer checks, and budgets concurrent probing while keeping reuse fast and safe. We show interoperability with emerging agent protocols without ad hoc glue and sketch bio-cybernetic extensions (autopoiesis, autogenesis, evolving boundaries, synergetics, etc). Altogether, the framework keeps humans central to setting aims, justifying knowledge, and steering theory-practice dynamics, while scaling agents as reliable collaborators within audited governance.
comment: polish the structures, flows and connections, while complement the diagrams
Using utility graphs to search for Pareto-optimal outcomes in complex, interdependent issue negotiations
This paper studies how utility graphs decomposition algorithms can be used to effectively search for Pareto-efficient outcomes in complex automated negotiation. We propose a number of algorithms that can efficiently handle high-dimensional utility graphs, and test them on a variety of utility graph topologies, generated based on state of the art methods for analysing complex graphs. We show that we can achieve exponential speed-up, for many structures, even for very large utility graphs. To our knowledge, our approach can handle the largest utility spaces to date for complex interdependent negotiations, in terms of number of issues. Moreover, we examine the performance of our algorithms across two different types of elicitation queries from the literature: value and comparison queries, thus making a connection between automated negotiation and the preference elicitation literature.
comment: Authors' pre-print (16 pages)
ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning
Reasoning-based large language models have excelled in mathematics and programming, yet their potential in knowledge-intensive medical question answering remains underexplored and insufficiently validated in clinical contexts. To bridge this gap, we introduce ReasonMed, the largest medical reasoning dataset to date, comprising 370k high-quality examples distilled from 1.75 million initial reasoning paths generated by complementary LLMs and curated through a cost-efficient easy-medium-difficult (EMD) pipeline. ReasonMed is built through a multi-agent generation, verification, and refinement process, in which an Error Refiner improves reasoning paths by correcting error-prone steps identified by a verifier. Using ReasonMed, we investigate effective strategies for training medical reasoning models and find that integrating detailed CoT reasoning with concise answer summaries yields the most robust fine-tuning results. Models trained on ReasonMed set a new benchmark: ReasonMed-7B surpasses the prior best sub-10B models by 4.17% and even exceeds LLaMA3.1-70B on PubMedQA by 4.60%. When scaled to ReasonMed-14B, it remains highly competitive, underscoring consistent scaling potential. The codes and datasets are available at https://github.com/YuSun-Work/ReasonMed.
comment: 28 pages, 6 figures, 7 tables
Multiple Memory Systems for Enhancing the Long-term Memory of Agent
An agent powered by large language models have achieved impressive results, but effectively handling the vast amounts of historical data generated during interactions remains a challenge. The current approach is to design a memory module for the agent to process these data. However, existing methods, such as MemoryBank and A-MEM, have poor quality of stored memory content, which affects recall performance and response quality. In order to better construct high-quality long-term memory content, we have designed a multiple memory system (MMS) inspired by cognitive psychology theory. The system processes short-term memory to multiple long-term memory fragments, and constructs retrieval memory units and contextual memory units based on these fragments, with a one-to-one correspondence between the two. During the retrieval phase, MMS will match the most relevant retrieval memory units based on the user's query. Then, the corresponding contextual memory units is obtained as the context for the response stage to enhance knowledge, thereby effectively utilizing historical data. Experiments on LoCoMo dataset compared our method with three others, proving its effectiveness. Ablation studies confirmed the rationality of our memory units. We also analyzed the robustness regarding the number of selected memory segments and the storage overhead, demonstrating its practical value.
What Do Agents Think One Another Want? Level-2 Inverse Games for Inferring Agents' Estimates of Others' Objectives
Effectively interpreting strategic interactions among multiple agents requires us to infer each agent's objective from limited information. Existing inverse game-theoretic approaches frame this challenge in terms of a "level-1" inference problem, in which we take the perspective of a third-party observer and assume that individual agents share complete knowledge of one another's objectives. However, this assumption breaks down in decentralized, real-world scenarios like urban driving and bargaining, in which agents may act based on conflicting views of one another's objectives. We demonstrate the necessity of inferring agents' different estimates of each other's objectives through empirical examples, and by theoretically characterizing the prediction error of level-1 inference on fictitious gameplay data from linear-quadratic games. To address this fundamental issue, we propose a framework for level-2 inference to address the question: "What does each agent believe about other agents' objectives?" We prove that the level-2 inference problem is non-convex even in benign settings like linear-quadratic games, and we develop an efficient gradient-based approach for identifying local solutions. Experiments on a synthetic urban driving example show that our approach uncovers nuanced misalignments that level-1 methods miss.
comment: 8 pages + references + appendix + supplement
Learn the Ropes, Then Trust the Wins: Self-imitation with Progressive Exploration for Agentic Reinforcement Learning
Reinforcement learning (RL) is the dominant paradigm for sharpening strategic tool use capabilities of LLMs on long-horizon, sparsely-rewarded agent tasks, yet it faces a fundamental challenge of exploration-exploitation trade-off. Existing studies stimulate exploration through the lens of policy entropy, but such mechanical entropy maximization is prone to RL training instability due to the multi-turn distribution shifting. In this paper, we target the progressive exploration-exploitation balance under the guidance of the agent own experiences without succumbing to either entropy collapsing or runaway divergence. We propose SPEAR, a curriculum-based self-imitation learning (SIL) recipe for training agentic LLMs. It extends the vanilla SIL framework, where a replay buffer stores self-generated promising trajectories for off-policy update, by gradually steering the policy evolution within a well-balanced range of entropy across stages. Specifically, our approach incorporates a curriculum to manage the exploration process, utilizing intrinsic rewards to foster skill-level exploration and facilitating action-level exploration through SIL. At first, the auxiliary tool call reward plays a critical role in the accumulation of tool-use skills, enabling broad exposure to the unfamiliar distributions of the environment feedback with an upward entropy trend. As training progresses, self-imitation gets strengthened to exploit existing successful patterns from replayed experiences for comparative action-level exploration, accelerating solution iteration without unbounded entropy growth. To further stabilize training, we recalibrate the advantages of experiences in the replay buffer to address the potential policy drift. Reugularizations such as the clipping of tokens with high covariance between probability and advantage are introduced to the trajectory-level entropy control to curb over-confidence.
comment: 26 pages, 11 figures
MAViS: A Multi-Agent Framework for Long-Sequence Video Storytelling
Despite recent advances, long-sequence video generation frameworks still suffer from significant limitations: poor assistive capability, suboptimal visual quality, and limited expressiveness. To mitigate these limitations, we propose MAViS, a multi-agent collaborative framework designed to assist in long-sequence video storytelling by efficiently translating ideas into visual narratives. MAViS orchestrates specialized agents across multiple stages, including script writing, shot designing, character modeling, keyframe generation, video animation, and audio generation. In each stage, agents operate under the 3E Principle -- Explore, Examine, and Enhance -- to ensure the completeness of intermediate outputs. Considering the capability limitations of current generative models, we propose the Script Writing Guidelines to optimize compatibility between scripts and generative tools. Experimental results demonstrate that MAViS achieves state-of-the-art performance in assistive capability, visual quality, and video expressiveness. Its modular framework further enables scalability with diverse generative models and tools. With just a brief idea description, MAViS enables users to rapidly explore diverse visual storytelling and creative directions for sequential video generation by efficiently producing high-quality, complete long-sequence videos. To the best of our knowledge, MAViS is the only framework that provides multimodal design output -- videos with narratives and background music.
comment: Video Generation Agent
MAHL: Multi-Agent LLM-Guided Hierarchical Chiplet Design with Adaptive Debugging
As program workloads (e.g., AI) increase in size and algorithmic complexity, the primary challenge lies in their high dimensionality, encompassing computing cores, array sizes, and memory hierarchies. To overcome these obstacles, innovative approaches are required. Agile chip design has already benefited from machine learning integration at various stages, including logic synthesis, placement, and routing. With Large Language Models (LLMs) recently demonstrating impressive proficiency in Hardware Description Language (HDL) generation, it is promising to extend their abilities to 2.5D integration, an advanced technique that saves area overhead and development costs. However, LLM-driven chiplet design faces challenges such as flatten design, high validation cost and imprecise parameter optimization, which limit its chiplet design capability. To address this, we propose MAHL, a hierarchical LLM-based chiplet design generation framework that features six agents which collaboratively enable AI algorithm-hardware mapping, including hierarchical description generation, retrieval-augmented code generation, diverseflow-based validation, and multi-granularity design space exploration. These components together enhance the efficient generation of chiplet design with optimized Power, Performance and Area (PPA). Experiments show that MAHL not only significantly improves the generation accuracy of simple RTL design, but also increases the generation accuracy of real-world chiplet design, evaluated by Pass@5, from 0 to 0.72 compared to conventional LLMs under the best-case scenario. Compared to state-of-the-art CLARIE (expert-based), MAHL achieves comparable or even superior PPA results under certain optimization objectives.
Systems and Control (CS)
Reliability of Single-Level Equality-Constrained Inverse Optimal Control
Inverse optimal control (IOC) allows the retrieval of optimal cost function weights, or behavioral parameters, from human motion. The literature on IOC uses methods that are either based on a slow bilevel process or a fast but noise-sensitive minimization of optimality condition violation. Assuming equality-constrained optimal control models of human motion, this article presents a faster but robust approach to solving IOC using a single-level reformulation of the bilevel method and yields equivalent results. Through numerical experiments in simulation, we analyze the robustness to noise of the proposed single-level reformulation to the bilevel IOC formulation with a human-like planar reaching task that is used across recent studies. The approach shows resilience to very large levels of noise and reduces the computation time of the IOC on this task by a factor of 15 when compared to a classical bilevel implementation.
comment: 8 pages, 3 figures
Learning to Mitigate Post-Outage Load Surges: A Data-Driven Framework for Electrifying and Decarbonizing Grids
Electrification and decarbonization are transforming power system demand and recovery dynamics, yet their implications for post-outage load surges remain poorly understood. Here we analyze a metropolitan-scale heterogeneous dataset for Indianapolis comprising 30,046 feeder-level outages between 2020 and 2024, linked to smart meters and submetering, to quantify the causal impact of electric vehicles (EVs), heat pumps (HPs) and distributed energy resources (DERs) on restoration surges. Statistical analysis and causal forest inference demonstrate that rising penetrations of all three assets significantly increase surge ratios, with effects strongly modulated by restoration timing, outage duration and weather conditions. We develop a component-aware multi-task Transformer estimator that disaggregates EV, HP and DER contributions, and apply it to project historical outages under counterfactual 2035 adoption pathways. In a policy-aligned pathway, evening restorations emerge as the binding reliability constraint, with exceedance probabilities of 0.057 when 30\% of system load is restored within the first 15 minutes. Mitigation measures, probabilistic EV restarts, short thermostat offsets and accelerated DER reconnection, reduce exceedance to 0.019 and eliminate it entirely when 20\% or less of system load is restored. These results demonstrate that transition-era surges are asset-driven and causally linked to electrification and decarbonization, but can be effectively managed through integrated operational strategies.
Underground Power Distribution System Restoration Using Inverter Based Resources
Underground power distribution systems (PDSs) are increasingly deployed in urban areas. The integration of smart devices including smart switchgears, pad-mounted distribution transformers and inverter-based resources (IBRs) enhance system resilience, however simultaneously introducing unique challenges. The challenges include inrush currents caused by trapped charges in underground cables, ferroresonance in distribution transformers during energization, and three-phase load imbalance resulting from single-phase underground laterals. To address these issues, this paper proposes an underground PDS restoration framework using IBRs. Firstly, an underground cable energization model is developed to quantify inrush current by analyzing voltage differences across both switchgear terminals. Secondly, a distribution transformer energization model is proposed to evaluate ferroresonance using Q-factor constraints based on underground cable capacitance and damping resistance. Thirdly, a phase-swapping model is proposed to improve load balancing by dynamically reassigning lateral-phase connections through smart switchgears. The proposed models are further integrated into a mixed-integer nonlinear programming (MINLP) formulation to maximize the total weighted restored load while constraining inrush currents, ferroresonance, and phase imbalance. To address the nonlinearity induced by impedance matrix reordering during phase swapping, a permutation-based linearization technique is proposed. Finally, case studies on an underground PDS established based on IEEE 123-Node Test Feeder validate the effectiveness of the proposed strategy in improving uderground PDS restoration performance.
Quantum memory optimisation using finite-horizon, decoherence time and discounted mean-square performance criteria
This paper is concerned with open quantum memory systems for approximately retaining quantum information, such as initial dynamic variables or quantum states to be stored over a bounded time interval. In the Heisenberg picture of quantum dynamics, the deviation of the system variables from their initial values lends itself to closed-form computation in terms of tractable moment dynamics for open quantum harmonic oscillators and finite-level quantum systems governed by linear or quasi-linear Hudson-Parthasarathy quantum stochastic differential equations, respectively. This tractability is used in a recently proposed optimality criterion for varying the system parameters so as to maximise the memory decoherence time when the mean-square deviation achieves a given critical threshold. The memory decoherence time maximisation approach is extended beyond the previously considered low-threshold asymptotic approximation and to Schr\"{o}dinger type mean-square deviation functionals for the reduced system state governed by the Lindblad master equation. We link this approach with the minimisation of the mean-square deviation functionals at a finite time horizon and with their discounted version which quantifies the averaged performance of the quantum system as a temporary memory under a Poisson flow of storage requests.
comment: 8 pages, 1 figure, submitted to IFAC World Congress 2026
CPU- and GPU-Based Parallelization of the Robust Reference Governor
Constraint management is a central challenge in modern control systems. A solution is the Reference Governor (RG), which is an add-on strategy to pre-stabilized feedback control systems to enforce state and input constraints by shaping the reference command. While robust formulations of RG exist for linear systems, their extension to nonlinear systems is often computationally intractable. This paper develops a scenario-based robust RG formulation for nonlinear systems and investigates its parallel implementation on multi-core CPUs and CUDA-enabled GPUs. We analyze the computational structure of the algorithm, identify parallelization opportunities, and implement the resulting schemes on modern parallel hardware. Benchmarking on a nonlinear hydrogen fuel cell model demonstrates order-of-magnitude speedups (by as much as three orders of magnitude) compared to sequential implementations.
A Control Allocation Algorithm for Hypersonic Glide Vehicles with Input Limitations
Hypersonic glide vehicles (HGVs) operate in challenging flight regimes characterized by strong nonlinearities in actuation and stringent physical constraints. These include state-dependent actuator limitations, asymmetric control bounds, and thermal loads that vary with maneuvering conditions. This paper introduces an iterative control allocation method to address these challenges in real time. The proposed algorithm searches for control inputs that achieve the desired moment commands while respecting constraints on input magnitude and rate. For slender HGV configurations, thermal loads and drag generation are strongly correlated-lower drag typically results in reduced surface heating. By embedding drag-sensitive soft constraints, the method improves energy efficiency and implicitly reduces surface temperatures, lowering the vehicle's infrared signature. These features are particularly advantageous for long-range military operations that require low observability. The approach is demonstrated using the DLR's Generic Hypersonic Glide Vehicle 2 (GHGV-2) simulation model. The results confirm the method's effectiveness in maintaining control authority under realistic, constrained flight conditions.
comment: 38 pages, 20 figures, submitted to the AIAA Journal of Guidance, Control, and Dynamics
Satellite Navigation and Control using Physics-Informed Artificial Potential Field and Sliding Mode Controller
Increase in the number of space exploration missions has led to the accumulation of space debris, posing risk of collision with the operational satellites. Addressing this challenge is crucial for the sustainability of space operations. To plan a safe trajectory in the presence of moving space debris, an integrated approach of artificial potential field and sliding mode controller is proposed and implemented in this paper. The relative 6-DOF kinematics and dynamics of the spacecraft is modelled in the framework of geometric mechanics with the relative configuration expressed through exponential coordinates. Various collision avoidance guidance algorithms have been proposed in the literature but the Artificial Potential Field guidance algorithm is computationally efficient and enables real-time path adjustments to avoid collision with obstacles. However, it is prone to issues such as local minima. In literature, local minima issue is typically avoided by either redefining the potential function such as adding vorticity or by employing search techniques which are computationally expensive. To address these challenges, a physics-informed APF is proposed in this paper where Hamiltonian mechanics is used instead of the traditional Newtonian mechanics-based approach. In this approach, instead of relying on attractive and repulsive forces for path planning, the Hamiltonian approach uses the potential field to define a path of minimum potential. Additionally, to track the desired trajectory planned by the guidance algorithm within a fixed-time frame, a non-singular fixed-time sliding mode controller (FTSMC) is used. The proposed fixed-time sliding surface not only ensures fixed-time convergence of system states but also guarantees the global stability of the closed-loop system without singularity. The simulation results presented support the claims made.
SecuLEx: a Secure Limit Exchange Market for Dynamic Operating Envelopes
Distributed energy resources (DERs) are transforming power networks, challenging traditional operational methods, and requiring new coordination mechanisms. To address this challenge, this paper introduces SecuLEx (Secure Limit Exchange), a new market-based paradigm to allocate power injection and withdrawal limits that guarantee network security during time periods, called dynamic operating envelopes (DOEs). Under this paradigm, distribution system operators (DSOs) assign initial DOEs to customers. These limits can be exchanged afterward through a market, allowing customers to reallocate them according to their needs while ensuring network operational constraints. We formalize SecuLEx and illustrate DOE allocation and market exchanges on a small-scale low-voltage (LV) network, demonstrating that both procedures are computationally tractable. In this example, SecuLEx reduces renewable curtailment and improves grid utilization and social welfare compared to traditional approaches.
Closed-loop control of sloshing fuel in a spinning spacecraft
New-generation space missions require satellites to carry substantial amounts of liquid propellant, making it essential to analyse the coupled control-structure-propellant dynamics in detail. While Computational Fluid Dynamics (CFD) offers high-fidelity predictions, its computational cost limits its use in iterative design. Equivalent Mechanical Models (EMMs) provide a faster alternative, though their predictive performance, especially in closed-loop scenarios, remains largely unexplored. This work presents a comparative analysis of a spacecraft under feedback control, using both CFD and a reduced-order sloshing model. Results show good agreement, validating the simplified model for the manoeuvrer considered. This validation enables efficient sensitivity and stability studies, offering a practical tool for early-stage spacecraft design.
General formulation of an analytic, Lipschitz continuous control allocation for thrust-vectored controlled rigid-bodies
This study introduces a systematic and scalable method for arbitrary rigid-bodies equipped with vectorized thrusters. Two novel solutions are proposed: a closed-form, Lipschitz continuous mapping that ensures smooth actuator orientation references, and a convex optimization formulation capable of handling practical actuator constraints such as thrust saturation and angular rate limits. Both methods leverage the null-space structure of the allocation mapping to perform singularity avoidance while generating sub-optimal yet practical solutions. The effectiveness and generality of the proposed framework are demonstrated through numerical simulations on a 3DOF marine vessel and a 6DOF aerial quadcopter.
Optimizing BCI Rehabilitation Protocols for Stroke: Exploring Task Design and Training Duration
Stroke is a leading cause of long-term disability and the second most common cause of death worldwide. Although acute treatments have advanced, recovery remains challenging and limited. Brain-computer interfaces (BCIs) have emerged as a promising tool for post-stroke rehabilitation by promoting neuroplasticity. However, clinical outcomes remain variable, and optimal protocols have yet to be established. This study explores strategies to optimize BCI-based rehabilitation by comparing motor imagery of affected hand movement versus rest, instead of the conventional left-versus-right motor imagery. This alternative aims to simplify the task and address the weak contralateral activation commonly observed in stroke patients. Two datasets, one from healthy individuals and one from stroke patients, were used to evaluate the proposed approach. The results showed improved performance using both FBCSP and EEGNet. Additionally, we investigated the impact of session duration and found that shorter training sessions produced better BCI performance than longer sessions.
comment: 4 pages, 4 figures, accepted for 8th IEEE ENBENG Conference
A Stable, Accurate and Well-Conditioned Time-Domain PMCHWT Formulation
This paper introduces a new boundary element formulation for transient electromagnetic scattering by homogeneous dielectric objects based on the time-domain PMCHWT equation. To address dense-mesh breakdown, a multiplicative Calderon preconditioner utilizing a modified static electric field integral operator is employed. Large-timestep breakdown and late-time instability are simultaneously resolved by rescaling the Helmholtz components leveraging the quasi-Helmholtz projectors and using temporal differentiation and integration as rescaling operators. This rescaling also balances the loop and star components at large timesteps, improving solution accuracy. The resulting discrete system is solved using a marching-on-in-time scheme and iterative solvers. Numerical experiments for simply- and multiply-connected dielectric scatterers, including highly non-smooth geometries, corroborate the accuracy, stability, and efficiency of the proposed approach.
comment: 12 pages, 5 figures
Multi-level informed optimization via decomposed Kriging for large design problems under uncertainty
Engineering design involves demanding models encompassing many decision variables and uncontrollable parameters. In addition, unavoidable aleatoric and epistemic uncertainties can be very impactful and add further complexity. The state-of-the-art adopts two steps, uncertainty quantification and design optimization, to optimize systems under uncertainty by means of robust or stochastic metrics. However, conventional scenario-based, surrogate-assisted, and mathematical programming methods are not sufficiently scalable to be affordable and precise in large and complex cases. Here, a multi-level approach is proposed to accurately optimize resource-intensive, high-dimensional, and complex engineering problems under uncertainty with minimal resources. A non-intrusive, fast-scaling, Kriging-based surrogate is developed to map the combined design/parameter domain efficiently. Multiple surrogates are adaptively updated by hierarchical and orthogonal decomposition to leverage the fewer and most uncertainty-informed data. The proposed method is statistically compared to the state-of-the-art via an analytical testbed and is shown to be concurrently faster and more accurate by orders of magnitude.
comment: 34 pages, 18 figures
Topology optimization of nonlinear forced response curves via reduction on spectral submanifolds
Forced response curves (FRCs) of nonlinear systems can exhibit complex behaviors, including hardening/softening behavior and bifurcations. Although topology optimization holds great potential for tuning these nonlinear dynamic responses, its use in high-dimensional systems is limited by the high cost of repeated response and sensitivity analyses. To address this challenge, we employ the spectral submanifolds (SSMs) reduction theory, which reformulates the periodic response as the equilibria of an associated reduced-order model (ROM). This enables efficient and analytic evaluation of both response amplitudes and their sensitivities. Based on the SSM-based ROM, we formulate optimization problems that optimize the peak amplitude, the hardening/softening behavior, and the distance between two saddle-node bifurcations for an FRC. The proposed method is applied to the design of nonlinear MEMS devices, achieving targeted performance optimization. This framework provides a practical and efficient strategy for incorporating nonlinear dynamic effects into the topology optimization of structures.
comment: 26 pages, 12 figures. Submitted to Nonlinear Dynamics
Multi-Level Multi-Fidelity Methods for Path Integral and Safe Control
Sampling-based approaches are widely used in systems without analytic models to estimate risk or find optimal control. However, gathering sufficient data in such scenarios can be prohibitively costly. On the other hand, in many situations, low-fidelity models or simulators are available from which samples can be obtained at low cost. In this paper, we propose an efficient approach for risk quantification and path integral control that leverages such data from multiple models with heterogeneous sampling costs. A key technical novelty of our approach is the integration of Multi-level Monte Carlo (MLMC) and Multi-fidelity Monte Carlo (MFMC) that enable data from different time and state representations (system models) to be jointly used to reduce variance and improve sampling efficiency. We also provide theoretical analysis of the proposed method and show that our estimator is unbiased and consistent under mild conditions. Finally, we demonstrate via numerical simulation that the proposed method has improved computation (sampling costs) vs. accuracy trade-offs for risk quantification and path integral control.
Space Logistics Analysis and Incentive Design for Commercialization of Orbital Debris Remediation
As orbital debris continues to become a higher priority for the space industry, there is a need to explore how partnerships between the public and private space sector may aid in addressing this issue. This research develops a space logistics framework for planning orbital debris remediation missions, providing a quantitative basis for partnerships that are mutually beneficial between space operators and debris remediators. By integrating network-based space logistics and game theory, we illuminate the high-level costs of remediating orbital debris, and the surplus that stands to be shared as a result. These findings indicate significant progress toward the continued development of a safe, sustainable, and profitable space economy.
comment: 28 pages, 14 figures, Journal of Spacecraft and Rockets (Articles in Advance)
EB-MBD: Emerging-Barrier Model-Based Diffusion for Safe Trajectory Optimization in Highly Constrained Environments
We propose enforcing constraints on Model-Based Diffusion by introducing emerging barrier functions inspired by interior point methods. We show that constraints on Model-Based Diffusion can lead to catastrophic performance degradation, even on simple 2D systems due to sample inefficiency in the Monte Carlo approximation of the score function. We introduce Emerging-Barrier Model-Based Diffusion (EB-MBD) which uses progressively introduced barrier constraints to avoid these problems, significantly improving solution quality, without the need for computationally expensive operations such as projections. We analyze the sampling liveliness of samples each iteration to inform barrier parameter scheduling choice. We demonstrate results for 2D collision avoidance and a 3D underwater manipulator system and show that our method achieves lower cost solutions than Model-Based Diffusion, and requires orders of magnitude less computation time than projection based methods.
Some Reflections on Sliding Mode Designs in Control Systems: An Example of Adaptive Tracking Control for Simple Mechanical Systems With Friction Without Measurement of Velocity
The objective of this note is to share some reflections of the authors regarding the use of sliding mode designs in control systems. We believe the abundant, and ever increasing, appearance of this kind of works on our scientific publications deserves some critical evaluation of their actual role, relevance and pertinence. First, we discuss the procedure followed by most of these designs -- illustrated with examples from the literature. Second, we bring to the readers attention several aspects of the control problem, central in classical designs, which are disregarded in the sliding mode literature. Finally, to illustrate with an specific example our previous considerations, we compare the performance of two adaptive tracking controllers for a simple one degree of freedom mechanical systems with unknown parameters and static and Coulomb friction -- that do not rely on the measurement of velocity.
Optimal Control with Lyapunov Stability Guarantees for Space Applications
This paper investigates the infinite horizon optimal control problem (OCP) for space applications characterized by nonlinear dynamics. The proposed approach divides the problem into a finite horizon OCP with a regularized terminal cost, guiding the system towards a terminal set, and an infinite horizon linear regulation phase within this set. This strategy guarantees global asymptotic stability under specific assumptions. Our method maintains the system's fully nonlinear dynamics until it reaches the terminal set, where the system dynamics is linearized. As the terminal set converges to the origin, the difference in optimal cost incurred reduces to zero, guaranteeing an efficient and stable solution. The approach is tested through simulations on three problems: spacecraft attitude control, rendezvous maneuver, and soft landing. In spacecraft attitude control, we focus on achieving precise orientation and stabilization. For rendezvous maneuvers, we address the navigation of a chaser to meet a target spacecraft. For the soft landing problem, we ensure a controlled descent and touchdown on a planetary surface. We provide numerical results confirming the effectiveness of the proposed method in managing these nonlinear dynamics problems, offering robust solutions essential for successful space missions.
Joint Detection, Channel Estimation and Interference Nulling for Terrestrial-Satellite Downlink Co-Existence in the Upper Mid-Band
The upper mid-band FR3 spectrum (7-24 GHz) has garnered significant interest for future cellular services. However, utilizing a large portion of this band requires careful interference coordination with incumbent satellite systems. This paper investigates interference from high-power terrestrial base stations (TN-BSs) to satellite downlink receivers. A central challenge is that the victim receivers, i.e., ground-based non-terrestrial user equipment (NTN-UEs) such as satellite customer premises equipment, must first be detected and their channels estimated before the TN-BS can effectively place nulls in their directions. We explore a potential solution where NTN-UEs periodically transmit preambles or beacon signals that TN-BSs can use for detection and channel estimation. The performance of this nulling approach is analyzed in a simplified scenario with a single victim, revealing the interplay between path loss and estimation quality in determining nulling performance. To further validate the method, we conduct a detailed multi-user site-specific ray-tracing (RT) simulation in a rural environment. The results show that the proposed nulling approach is effective under realistic parameters, even with high densities of victim units, although TN-BS may require a substantial number of antennas.
comment: Accepted for publication in the Proceedings of GlobeCom 2025
Whole Body Model Predictive Control for Spin-Aware Quadrupedal Table Tennis ICRA 2026
Developing table tennis robots that mirror human speed, accuracy, and ability to predict and respond to the full range of ball spins remains a significant challenge for legged robots. To demonstrate these capabilities we present a system to play dynamic table tennis for quadrupedal robots that integrates high speed perception, trajectory prediction, and agile control. Our system uses external cameras for high-speed ball localization, physical models with learned residuals to infer spin and predict trajectories, and a novel model predictive control (MPC) formulation for agile full-body control. Notably, a continuous set of stroke strategies emerge automatically from different ball return objectives using this control paradigm. We demonstrate our system in the real world on a Spot quadruped, evaluate accuracy of each system component, and exhibit coordination through the system's ability to aim and return balls with varying spin types. As a further demonstration, the system is able to rally with human players.
comment: Submitted to appear in IEEE ICRA 2026
When to Reason: Semantic Router for vLLM NeurIPS 2025
Large Language Models (LLMs) demonstrate substantial accuracy gains when augmented with reasoning modes such as chain-of-thought and inference-time scaling. However, reasoning also incurs significant costs in inference latency and token usage, with environmental and financial impacts, which are unnecessary for many simple prompts. We present a semantic router that classifies queries based on their reasoning requirements and selectively applies reasoning only when beneficial. Our approach achieves a 10.2 percentage point improvement in accuracy on the MMLU-Pro benchmark while reducing response latency by 47.1% and token consumption by 48.5% compared to direct inference with vLLM. These results demonstrate that semantic routing offers an effective mechanism for striking a balance between accuracy and efficiency in open-source LLM serving systems
comment: 5 pages, excluding references and appendix. To be appeared at Workshop on ML for Systems at NeurIPS 2025, December 6, 2025 https://mlforsystems.org/
Identification and optimal control strategies for the transversal splitting of ultra--cold Bose gases
Splitting a Bose--Einstein condensate (BEC) is a key operation in fundamental physics experiments and emerging quantum technologies, where precise preparation of well--defined initial states requires fast yet coherent control of the condensate's nonlinear dynamics. This work formulates the BEC splitting process as an optimal feedforward control problem based on a physically interpretable, reduced--order model identified from limited experimental data. We introduce a systematic calibration strategy that combines optimal experiment selection and constrained nonlinear parameter estimation, enabling accurate system identification with minimal experimental overhead. Using this calibrated model, we compute energy--optimal trajectories via indirect optimal control to realize shortcuts to adiabaticity (STAs), achieving rapid transitions to the ground state of a double--well potential while suppressing excitations. Experiments confirm that the proposed control framework yields high--fidelity state transfers across multiple configurations, demonstrating its robustness and scalability for quantum control applications.
Resilient Multi-Dimensional Consensus and Distributed Optimization against Agent-Based and Denial-of-Service Attacks
In this paper, we consider the resilient multi-dimensional consensus and distributed optimization problems of multi-agent systems (MASs) in the presence of both agent-based and denial-of-service (DoS) attacks. The considered agent-based attacks can cover malicious, Byzantine, and stubborn agents. The links between agents in the network can be blocked by DoS attacks, which may lead the digraph to be time-varying and even disconnected. The objective is to ensure that the remaining benign agents achieve consensus. To this end, an "auxiliary point"-based resilient control algorithm is proposed for MASs. Under the proposed algorithm, each healthy agent constructs a "safe kernel" utilizing the states of its in-neighbors and updates its state toward a specific point within this kernel at each iteration. If an agent cannot receive its neighbors' states owing to DoS attacks, it will use the states received immediately before the DoS period. Moreover, a resilient multi-dimensional distributed optimization (RMDO) algorithm is also proposed. Theoretical proofs and numerical examples are presented to demonstrate the effectiveness of the proposed algorithms.
Optimization via a Control-Centric Framework
Optimization plays a central role in intelligent systems and cyber-physical technologies, where the speed and reliability of convergence directly impact performance. In control theory, optimization-centric methods are standard: controllers are designed by repeatedly solving optimization problems, as in linear quadratic regulation, $H_\infty$ control, and model predictive control. In contrast, this paper develops a control-centric framework for optimization itself, where algorithms are constructed directly from Lyapunov stability principles rather than being proposed first and analyzed afterward. A key element is the stationarity vector, which encodes first-order optimality conditions and enables Lyapunov-based convergence analysis. By pairing a Lyapunov function with a selectable decay law, we obtain continuous-time dynamics with guaranteed exponential, finite-time, fixed-time, or prescribed-time convergence. Within this framework, we introduce three feedback realizations of increasing restrictiveness: the Hessian-gradient, Newton, and gradient dynamics. Each realization shapes the decay of the stationarity vector to achieve the desired rate. These constructions unify unconstrained optimization, extend naturally to constrained problems via Lyapunov-consistent primal-dual dynamics, and broaden the results for minimax and generalized Nash equilibrium seeking problems beyond exponential stability. The framework provides systematic design tools for optimization algorithms in control and game-theoretic problems.
comment: This work has been submitted to the IEEE for possible publication. 12 pages, 3 figures
Optimal control of continuous-time symmetric systems with unknown dynamics and noisy measurements
An iterative learning algorithm is presented for continuous-time linear-quadratic optimal control problems where the system is externally symmetric with unknown dynamics. Both finite-horizon and infinite-horizon problems are considered. It is shown that the proposed algorithm is globally convergent to the optimal solution and has some advantages over adaptive dynamic programming, including being unbiased under noisy measurements and having a relatively low computational burden. Numerical experiments show the effectiveness of the results.
Product-oriented Product-Process-Resource Asset Network and its Representation in AutomationML for Asset Administration Shell
Current products, especially in the automotive sector, pose complex technical systems having a multi-disciplinary mechatronic nature. Industrial standards supporting system engineering and production typically (i) address the production phase only, but do not cover the complete product life cycle, and (ii) focus on production processes and resources rather than the products themselves. The presented approach is motivated by incorporating the impacts of the end-of-life phase of the product life cycle into the engineering phase. This paper proposes a modeling approach coming up from the Product-Process-Resource (PPR) modeling paradigm. It combines requirements on (i) respecting the product structure as a basis for the model, and (ii) incorporates repairing, remanufacturing, or upcycling within cyber-physical production systems. The proposed model called PoPAN should accompany the product during the entire life cycle as a digital shadow encapsulated within the Asset Administration Shell of a product. To facilitate the adoption of the proposed paradigm, the paper also proposes serialization of the model in the AutomationML data format. The model is demonstrated on a use-case for disassembling electric vehicle batteries to support their remanufacturing for stationary battery applications.
comment: \copyright 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
Hierarchical Reinforcement Learning with Low-Level MPC for Multi-Agent Control
Achieving safe and coordinated behavior in dynamic, constraint-rich environments remains a major challenge for learning-based control. Pure end-to-end learning often suffers from poor sample efficiency and limited reliability, while model-based methods depend on predefined references and struggle to generalize. We propose a hierarchical framework that combines tactical decision-making via reinforcement learning (RL) with low-level execution through Model Predictive Control (MPC). For the case of multi-agent systems this means that high-level policies select abstract targets from structured regions of interest (ROIs), while MPC ensures dynamically feasible and safe motion. Tested on a predator-prey benchmark, our approach outperforms end-to-end and shielding-based RL baselines in terms of reward, safety, and consistency, underscoring the benefits of combining structured learning with model-based control.
Revisiting Functional Derivatives in Multi-object Tracking
Probability generating functionals (PGFLs) are efficient and powerful tools for tracking independent objects in clutter. It was shown that PGFLs could be used for the elegant derivation of practical multi-object tracking algorithms, e.g., the probability hypothesis density (PHD) filter. However, derivations using PGFLs use the so-called functional derivatives whose definitions usually appear too complicated or heuristic, involving Dirac delta ``functions''. This paper begins by comparing different definitions of functional derivatives and exploring their relationships and implications for practical applications. It then proposes a rigorous definition of the functional derivative, utilizing straightforward yet precise mathematics for clarity. Key properties of the functional derivative are revealed and discussed.
comment: submitted to IEEE Transactions on Information Theory
Carleman-Fourier linearization of nonlinear real dynamical systems with quasi-periodic fields
This paper presents Carleman-Fourier linearization for analyzing nonlinear real dynamical systems with periodic vector fields. Using Fourier basis functions, this novel framework transforms such dynamical systems into equivalent infinite-dimensional linear dynamical systems. In this paper, we establish the exponential convergence of the primary block in the finite-section approximation of this linearized system to the state vector of the original nonlinear system. To showcase the efficacy of our approach, we apply it to the Kuramoto model, a prominent model for coupled oscillators. The results demonstrate promising accuracy in approximating the original system's behavior.
comment: Discrete and Continuous Dynamical Systems Series B, accepted
Foundation Models for Structural Health Monitoring
Structural Health Monitoring (SHM) is a critical task for ensuring the safety and reliability of civil infrastructures, typically realized on bridges and viaducts by means of vibration monitoring. In this paper, we propose for the first time the use of Transformer neural networks, with a Masked Auto-Encoder architecture, as Foundation Models for SHM. We demonstrate the ability of these models to learn generalizable representations from multiple large datasets through self-supervised pre-training, which, coupled with task-specific fine-tuning, allows them to outperform state-of-the-art traditional methods on diverse tasks, including Anomaly Detection (AD) and Traffic Load Estimation (TLE). We then extensively explore model size versus accuracy trade-offs and experiment with Knowledge Distillation (KD) to improve the performance of smaller Transformers, enabling their embedding directly into the SHM edge nodes. We showcase the effectiveness of our foundation models using data from three operational viaducts. For AD, we achieve a near-perfect 99.9% accuracy with a monitoring time span of just 15 windows. In contrast, a state-of-the-art method based on Principal Component Analysis (PCA) obtains its first good result (95.03% accuracy), only considering 120 windows. On two different TLE tasks, our models obtain state-of-the-art performance on multiple evaluation metrics (R$^2$ score, MAE% and MSE%). On the first benchmark, we achieve an R$^2$ score of 0.97 and 0.90 for light and heavy vehicle traffic, respectively, while the best previous approach (a Random Forest) stops at 0.91 and 0.84. On the second one, we achieve an R$^2$ score of 0.54 versus the 0.51 of the best competitor method, a Long-Short Term Memory network.
comment: 17 pages, 6 tables, 9 figures
A Long-Duration Autonomy Approach to Connected and Automated Vehicles
In this article, we present a long-duration autonomy approach for the control of connected and automated vehicles (CAVs) operating in a transportation network. In particular, we focus on the performance of CAVs at traffic bottlenecks, including roundabouts, merging roadways, and intersections. We take a principled approach based on optimal control, and derive a reactive controller with guarantees on safety, performance, and energy efficiency. We guarantee safety through high order control barrier functions (HOCBFs), which we ``lift'' to first order CBFs using time-optimal motion primitives. This yields a set of first-order CBFs that are compatible with the control bounds. We demonstrate the performance of our approach in simulation and compare it to an optimal control-based approach.
comment: 8 pages, 3 figures
MARLIN: Multi-Agent Reinforcement Learning with Murmuration Intelligence and LLM Guidance for Reservoir Management
As climate change intensifies extreme weather events, water disasters pose growing threats to global communities, making adaptive reservoir management critical for protecting vulnerable populations and ensuring water security. Modern water resource management faces unprecedented challenges from cascading uncertainties propagating through interconnected reservoir networks. These uncertainties, rooted in physical water transfer losses and environmental variability, make precise control difficult. For example, sending 10 tons downstream may yield only 8-12 tons due to evaporation and seepage. Traditional centralized optimization approaches suffer from exponential computational complexity and cannot effectively handle such real-world uncertainties, while existing multi-agent reinforcement learning (MARL) methods fail to achieve effective coordination under uncertainty. To address these challenges, we present MARLIN, a decentralized reservoir management framework inspired by starling murmurations intelligence. Integrating bio-inspired alignment, separation, and cohesion rules with MARL, MARLIN enables individual reservoirs to make local decisions while achieving emergent global coordination. In addition, a LLM provides real-time reward shaping signals, guiding agents to adapt to environmental changes and human-defined preferences. Experiments on real-world USGS data show that MARLIN improves uncertainty handling by 23\%, cuts computation by 35\%, and accelerates flood response by 68\%, exhibiting super-linear coordination, with complexity scaling 5.4x from 400 to 10,000 nodes. These results demonstrate MARLIN's potential for disaster prevention and protecting communities through intelligent, scalable water resource management.
A Survey of Foundation Models for IoT: Taxonomy and Criteria-Based Analysis
Foundation models have gained growing interest in the IoT domain due to their reduced reliance on labeled data and strong generalizability across tasks, which address key limitations of traditional machine learning approaches. However, most existing foundation model based methods are developed for specific IoT tasks, making it difficult to compare approaches across IoT domains and limiting guidance for applying them to new tasks. This survey aims to bridge this gap by providing a comprehensive overview of current methodologies and organizing them around four shared performance objectives by different domains: efficiency, context-awareness, safety, and security & privacy. For each objective, we review representative works, summarize commonly-used techniques and evaluation metrics. This objective-centric organization enables meaningful cross-domain comparisons and offers practical insights for selecting and designing foundation model based solutions for new IoT tasks. We conclude with key directions for future research to guide both practitioners and researchers in advancing the use of foundation models in IoT applications.
comment: Accepted by CCF Transactions on Pervasive Computing and Interaction (CCF TPCI)
Fast Online Adaptive Neural MPC via Meta-Learning
Data-driven model predictive control (MPC) has demonstrated significant potential for improving robot control performance in the presence of model uncertainties. However, existing approaches often require extensive offline data collection and computationally intensive training, limiting their ability to adapt online. To address these challenges, this paper presents a fast online adaptive MPC framework that leverages neural networks integrated with Model-Agnostic Meta-Learning (MAML). Our approach focuses on few-shot adaptation of residual dynamics - capturing the discrepancy between nominal and true system behavior - using minimal online data and gradient steps. By embedding these meta-learned residual models into a computationally efficient L4CasADi-based MPC pipeline, the proposed method enables rapid model correction, enhances predictive accuracy, and improves real-time control performance. We validate the framework through simulation studies on a Van der Pol oscillator, a Cart-Pole system, and a 2D quadrotor. Results show significant gains in adaptation speed and prediction accuracy over both nominal MPC and nominal MPC augmented with a freshly initialized neural network, underscoring the effectiveness of our approach for real-time adaptive robot control.
Characterizing and Optimizing Real-Time Optimal Control for Embedded SoCs
Resource-limited robots face significant challenges in executing computationally intensive tasks, such as locomotion and manipulation, particularly for real-time optimal control algorithms like Model Predictive Control (MPC). This paper provides a comprehensive design space exploration to identify optimal hardware computation architectures for these demanding model-based control algorithms. We profile and optimize representative architectural designs, including general-purpose scalar CPUs, vector processors, and specialized accelerators. By characterizing kernel-level benchmarks and end-to-end robotic scenarios, including a hardware-in-the-loop evaluation on a fabricated RISC-V multi-core vector SoC, we present a quantitative comparison of performance, area, and utilization across distinct architectural design points. Our findings demonstrate that targeted architectural modifications, coupled with deep software and system optimizations, enable up to 3.71x speedups for MPC, resulting in up to 27% system-level power reductions while completing robotic tasks. Finally, we propose a code generation flow designed to simplify the complex engineering effort required for mapping robotic workloads onto specialized architectures.
Hybrid Feedback Control for Global Navigation with Locally Optimal Obstacle Avoidance in n-Dimensional Spaces
We present a hybrid feedback control framework for autonomous robot navigation in n-dimensional Euclidean spaces cluttered with spherical obstacles. The proposed approach ensures safe and global navigation towards a target location by dynamically switching between two operational modes: motion-to-destination and locally optimal obstacle-avoidance. It produces continuous velocity inputs, ensures collision-free trajectories and generates locally optimal obstacle avoidance maneuvers. Unlike existing methods, the proposed framework is compatible with range sensors, enabling navigation in both a priori known and unknown environments. Extensive simulations in 2D and 3D settings, complemented by experimental validation on a TurtleBot 4 platform, confirm the efficacy and robustness of the approach. Our results demonstrate shorter paths and smoother trajectories compared to state-of-the-art methods, while maintaining computational efficiency and real-world feasibility.
Safe Autonomous Environmental Contact for Soft Robots using Control Barrier Functions
Robots built from soft materials will inherently apply lower environmental forces than their rigid counterparts, and therefore may be more suitable in sensitive settings with unintended contact. However, these robots' applied forces result from both their design and their control system in closed-loop, and therefore, ensuring bounds on these forces requires controller synthesis for safety as well. This article introduces the first feedback controller for a soft manipulator that formally meets a safety specification with respect to environmental contact. In our proof-of-concept setting, the robot's environment has known geometry and is deformable with a known elastic modulus. Our approach maps a bound on applied forces to a safe set of positions of the robot's tip via predicted deformations of the environment. Then, a quadratic program with Control Barrier Functions in its constraints is used to supervise a nominal feedback signal, verifiably maintaining the robot's tip within this safe set. Hardware experiments on a multi-segment soft pneumatic robot demonstrate that the proposed framework successfully maintains a positive safety margin. This framework represents a fundamental shift in perspective on control and safety for soft robots, implementing a formally verifiable logic specification on their pose and contact forces.
comment: 8 pages, 9 figures
Observability for Nonlinear Systems: Connecting Variational Dynamics, Lyapunov Exponents, and Empirical Gramians
Observability quantification is a key problem in dynamic network sciences. While it has been thoroughly studied for linear systems, observability quantification for nonlinear networks is less intuitive and more cumbersome. One common approach to quantify observability for nonlinear systems is via the Empirical Gramian (Empr-Gram) -- a generalized form of the Gramian of linear systems. In this paper, we produce three new results. First, we establish that a variational form of discrete-time autonomous nonlinear systems yields a so-called Variational Gramian (Var-Gram) that is equivalent to the classic Empr-Gram; the former being easier to compute than the latter. Via Lyapunov exponents derived from Lyapunov's direct method, the paper's second result derives connections between existing observability measures and Var-Gram. The third result demonstrates the applicability of these new notions for sensor selection/placement in nonlinear systems. Numerical case studies demonstrate these three developments and their merits.
Systems and Control (EESS)
Reliability of Single-Level Equality-Constrained Inverse Optimal Control
Inverse optimal control (IOC) allows the retrieval of optimal cost function weights, or behavioral parameters, from human motion. The literature on IOC uses methods that are either based on a slow bilevel process or a fast but noise-sensitive minimization of optimality condition violation. Assuming equality-constrained optimal control models of human motion, this article presents a faster but robust approach to solving IOC using a single-level reformulation of the bilevel method and yields equivalent results. Through numerical experiments in simulation, we analyze the robustness to noise of the proposed single-level reformulation to the bilevel IOC formulation with a human-like planar reaching task that is used across recent studies. The approach shows resilience to very large levels of noise and reduces the computation time of the IOC on this task by a factor of 15 when compared to a classical bilevel implementation.
comment: 8 pages, 3 figures
Learning to Mitigate Post-Outage Load Surges: A Data-Driven Framework for Electrifying and Decarbonizing Grids
Electrification and decarbonization are transforming power system demand and recovery dynamics, yet their implications for post-outage load surges remain poorly understood. Here we analyze a metropolitan-scale heterogeneous dataset for Indianapolis comprising 30,046 feeder-level outages between 2020 and 2024, linked to smart meters and submetering, to quantify the causal impact of electric vehicles (EVs), heat pumps (HPs) and distributed energy resources (DERs) on restoration surges. Statistical analysis and causal forest inference demonstrate that rising penetrations of all three assets significantly increase surge ratios, with effects strongly modulated by restoration timing, outage duration and weather conditions. We develop a component-aware multi-task Transformer estimator that disaggregates EV, HP and DER contributions, and apply it to project historical outages under counterfactual 2035 adoption pathways. In a policy-aligned pathway, evening restorations emerge as the binding reliability constraint, with exceedance probabilities of 0.057 when 30\% of system load is restored within the first 15 minutes. Mitigation measures, probabilistic EV restarts, short thermostat offsets and accelerated DER reconnection, reduce exceedance to 0.019 and eliminate it entirely when 20\% or less of system load is restored. These results demonstrate that transition-era surges are asset-driven and causally linked to electrification and decarbonization, but can be effectively managed through integrated operational strategies.
Underground Power Distribution System Restoration Using Inverter Based Resources
Underground power distribution systems (PDSs) are increasingly deployed in urban areas. The integration of smart devices including smart switchgears, pad-mounted distribution transformers and inverter-based resources (IBRs) enhance system resilience, however simultaneously introducing unique challenges. The challenges include inrush currents caused by trapped charges in underground cables, ferroresonance in distribution transformers during energization, and three-phase load imbalance resulting from single-phase underground laterals. To address these issues, this paper proposes an underground PDS restoration framework using IBRs. Firstly, an underground cable energization model is developed to quantify inrush current by analyzing voltage differences across both switchgear terminals. Secondly, a distribution transformer energization model is proposed to evaluate ferroresonance using Q-factor constraints based on underground cable capacitance and damping resistance. Thirdly, a phase-swapping model is proposed to improve load balancing by dynamically reassigning lateral-phase connections through smart switchgears. The proposed models are further integrated into a mixed-integer nonlinear programming (MINLP) formulation to maximize the total weighted restored load while constraining inrush currents, ferroresonance, and phase imbalance. To address the nonlinearity induced by impedance matrix reordering during phase swapping, a permutation-based linearization technique is proposed. Finally, case studies on an underground PDS established based on IEEE 123-Node Test Feeder validate the effectiveness of the proposed strategy in improving uderground PDS restoration performance.
Quantum memory optimisation using finite-horizon, decoherence time and discounted mean-square performance criteria
This paper is concerned with open quantum memory systems for approximately retaining quantum information, such as initial dynamic variables or quantum states to be stored over a bounded time interval. In the Heisenberg picture of quantum dynamics, the deviation of the system variables from their initial values lends itself to closed-form computation in terms of tractable moment dynamics for open quantum harmonic oscillators and finite-level quantum systems governed by linear or quasi-linear Hudson-Parthasarathy quantum stochastic differential equations, respectively. This tractability is used in a recently proposed optimality criterion for varying the system parameters so as to maximise the memory decoherence time when the mean-square deviation achieves a given critical threshold. The memory decoherence time maximisation approach is extended beyond the previously considered low-threshold asymptotic approximation and to Schr\"{o}dinger type mean-square deviation functionals for the reduced system state governed by the Lindblad master equation. We link this approach with the minimisation of the mean-square deviation functionals at a finite time horizon and with their discounted version which quantifies the averaged performance of the quantum system as a temporary memory under a Poisson flow of storage requests.
comment: 8 pages, 1 figure, submitted to IFAC World Congress 2026
CPU- and GPU-Based Parallelization of the Robust Reference Governor
Constraint management is a central challenge in modern control systems. A solution is the Reference Governor (RG), which is an add-on strategy to pre-stabilized feedback control systems to enforce state and input constraints by shaping the reference command. While robust formulations of RG exist for linear systems, their extension to nonlinear systems is often computationally intractable. This paper develops a scenario-based robust RG formulation for nonlinear systems and investigates its parallel implementation on multi-core CPUs and CUDA-enabled GPUs. We analyze the computational structure of the algorithm, identify parallelization opportunities, and implement the resulting schemes on modern parallel hardware. Benchmarking on a nonlinear hydrogen fuel cell model demonstrates order-of-magnitude speedups (by as much as three orders of magnitude) compared to sequential implementations.
A Control Allocation Algorithm for Hypersonic Glide Vehicles with Input Limitations
Hypersonic glide vehicles (HGVs) operate in challenging flight regimes characterized by strong nonlinearities in actuation and stringent physical constraints. These include state-dependent actuator limitations, asymmetric control bounds, and thermal loads that vary with maneuvering conditions. This paper introduces an iterative control allocation method to address these challenges in real time. The proposed algorithm searches for control inputs that achieve the desired moment commands while respecting constraints on input magnitude and rate. For slender HGV configurations, thermal loads and drag generation are strongly correlated-lower drag typically results in reduced surface heating. By embedding drag-sensitive soft constraints, the method improves energy efficiency and implicitly reduces surface temperatures, lowering the vehicle's infrared signature. These features are particularly advantageous for long-range military operations that require low observability. The approach is demonstrated using the DLR's Generic Hypersonic Glide Vehicle 2 (GHGV-2) simulation model. The results confirm the method's effectiveness in maintaining control authority under realistic, constrained flight conditions.
comment: 38 pages, 20 figures, submitted to the AIAA Journal of Guidance, Control, and Dynamics
Satellite Navigation and Control using Physics-Informed Artificial Potential Field and Sliding Mode Controller
Increase in the number of space exploration missions has led to the accumulation of space debris, posing risk of collision with the operational satellites. Addressing this challenge is crucial for the sustainability of space operations. To plan a safe trajectory in the presence of moving space debris, an integrated approach of artificial potential field and sliding mode controller is proposed and implemented in this paper. The relative 6-DOF kinematics and dynamics of the spacecraft is modelled in the framework of geometric mechanics with the relative configuration expressed through exponential coordinates. Various collision avoidance guidance algorithms have been proposed in the literature but the Artificial Potential Field guidance algorithm is computationally efficient and enables real-time path adjustments to avoid collision with obstacles. However, it is prone to issues such as local minima. In literature, local minima issue is typically avoided by either redefining the potential function such as adding vorticity or by employing search techniques which are computationally expensive. To address these challenges, a physics-informed APF is proposed in this paper where Hamiltonian mechanics is used instead of the traditional Newtonian mechanics-based approach. In this approach, instead of relying on attractive and repulsive forces for path planning, the Hamiltonian approach uses the potential field to define a path of minimum potential. Additionally, to track the desired trajectory planned by the guidance algorithm within a fixed-time frame, a non-singular fixed-time sliding mode controller (FTSMC) is used. The proposed fixed-time sliding surface not only ensures fixed-time convergence of system states but also guarantees the global stability of the closed-loop system without singularity. The simulation results presented support the claims made.
SecuLEx: a Secure Limit Exchange Market for Dynamic Operating Envelopes
Distributed energy resources (DERs) are transforming power networks, challenging traditional operational methods, and requiring new coordination mechanisms. To address this challenge, this paper introduces SecuLEx (Secure Limit Exchange), a new market-based paradigm to allocate power injection and withdrawal limits that guarantee network security during time periods, called dynamic operating envelopes (DOEs). Under this paradigm, distribution system operators (DSOs) assign initial DOEs to customers. These limits can be exchanged afterward through a market, allowing customers to reallocate them according to their needs while ensuring network operational constraints. We formalize SecuLEx and illustrate DOE allocation and market exchanges on a small-scale low-voltage (LV) network, demonstrating that both procedures are computationally tractable. In this example, SecuLEx reduces renewable curtailment and improves grid utilization and social welfare compared to traditional approaches.
Closed-loop control of sloshing fuel in a spinning spacecraft
New-generation space missions require satellites to carry substantial amounts of liquid propellant, making it essential to analyse the coupled control-structure-propellant dynamics in detail. While Computational Fluid Dynamics (CFD) offers high-fidelity predictions, its computational cost limits its use in iterative design. Equivalent Mechanical Models (EMMs) provide a faster alternative, though their predictive performance, especially in closed-loop scenarios, remains largely unexplored. This work presents a comparative analysis of a spacecraft under feedback control, using both CFD and a reduced-order sloshing model. Results show good agreement, validating the simplified model for the manoeuvrer considered. This validation enables efficient sensitivity and stability studies, offering a practical tool for early-stage spacecraft design.
General formulation of an analytic, Lipschitz continuous control allocation for thrust-vectored controlled rigid-bodies
This study introduces a systematic and scalable method for arbitrary rigid-bodies equipped with vectorized thrusters. Two novel solutions are proposed: a closed-form, Lipschitz continuous mapping that ensures smooth actuator orientation references, and a convex optimization formulation capable of handling practical actuator constraints such as thrust saturation and angular rate limits. Both methods leverage the null-space structure of the allocation mapping to perform singularity avoidance while generating sub-optimal yet practical solutions. The effectiveness and generality of the proposed framework are demonstrated through numerical simulations on a 3DOF marine vessel and a 6DOF aerial quadcopter.
Optimizing BCI Rehabilitation Protocols for Stroke: Exploring Task Design and Training Duration
Stroke is a leading cause of long-term disability and the second most common cause of death worldwide. Although acute treatments have advanced, recovery remains challenging and limited. Brain-computer interfaces (BCIs) have emerged as a promising tool for post-stroke rehabilitation by promoting neuroplasticity. However, clinical outcomes remain variable, and optimal protocols have yet to be established. This study explores strategies to optimize BCI-based rehabilitation by comparing motor imagery of affected hand movement versus rest, instead of the conventional left-versus-right motor imagery. This alternative aims to simplify the task and address the weak contralateral activation commonly observed in stroke patients. Two datasets, one from healthy individuals and one from stroke patients, were used to evaluate the proposed approach. The results showed improved performance using both FBCSP and EEGNet. Additionally, we investigated the impact of session duration and found that shorter training sessions produced better BCI performance than longer sessions.
comment: 4 pages, 4 figures, accepted for 8th IEEE ENBENG Conference
A Stable, Accurate and Well-Conditioned Time-Domain PMCHWT Formulation
This paper introduces a new boundary element formulation for transient electromagnetic scattering by homogeneous dielectric objects based on the time-domain PMCHWT equation. To address dense-mesh breakdown, a multiplicative Calderon preconditioner utilizing a modified static electric field integral operator is employed. Large-timestep breakdown and late-time instability are simultaneously resolved by rescaling the Helmholtz components leveraging the quasi-Helmholtz projectors and using temporal differentiation and integration as rescaling operators. This rescaling also balances the loop and star components at large timesteps, improving solution accuracy. The resulting discrete system is solved using a marching-on-in-time scheme and iterative solvers. Numerical experiments for simply- and multiply-connected dielectric scatterers, including highly non-smooth geometries, corroborate the accuracy, stability, and efficiency of the proposed approach.
comment: 12 pages, 5 figures
Multi-level informed optimization via decomposed Kriging for large design problems under uncertainty
Engineering design involves demanding models encompassing many decision variables and uncontrollable parameters. In addition, unavoidable aleatoric and epistemic uncertainties can be very impactful and add further complexity. The state-of-the-art adopts two steps, uncertainty quantification and design optimization, to optimize systems under uncertainty by means of robust or stochastic metrics. However, conventional scenario-based, surrogate-assisted, and mathematical programming methods are not sufficiently scalable to be affordable and precise in large and complex cases. Here, a multi-level approach is proposed to accurately optimize resource-intensive, high-dimensional, and complex engineering problems under uncertainty with minimal resources. A non-intrusive, fast-scaling, Kriging-based surrogate is developed to map the combined design/parameter domain efficiently. Multiple surrogates are adaptively updated by hierarchical and orthogonal decomposition to leverage the fewer and most uncertainty-informed data. The proposed method is statistically compared to the state-of-the-art via an analytical testbed and is shown to be concurrently faster and more accurate by orders of magnitude.
comment: 34 pages, 18 figures
Topology optimization of nonlinear forced response curves via reduction on spectral submanifolds
Forced response curves (FRCs) of nonlinear systems can exhibit complex behaviors, including hardening/softening behavior and bifurcations. Although topology optimization holds great potential for tuning these nonlinear dynamic responses, its use in high-dimensional systems is limited by the high cost of repeated response and sensitivity analyses. To address this challenge, we employ the spectral submanifolds (SSMs) reduction theory, which reformulates the periodic response as the equilibria of an associated reduced-order model (ROM). This enables efficient and analytic evaluation of both response amplitudes and their sensitivities. Based on the SSM-based ROM, we formulate optimization problems that optimize the peak amplitude, the hardening/softening behavior, and the distance between two saddle-node bifurcations for an FRC. The proposed method is applied to the design of nonlinear MEMS devices, achieving targeted performance optimization. This framework provides a practical and efficient strategy for incorporating nonlinear dynamic effects into the topology optimization of structures.
comment: 26 pages, 12 figures. Submitted to Nonlinear Dynamics
Multi-Level Multi-Fidelity Methods for Path Integral and Safe Control
Sampling-based approaches are widely used in systems without analytic models to estimate risk or find optimal control. However, gathering sufficient data in such scenarios can be prohibitively costly. On the other hand, in many situations, low-fidelity models or simulators are available from which samples can be obtained at low cost. In this paper, we propose an efficient approach for risk quantification and path integral control that leverages such data from multiple models with heterogeneous sampling costs. A key technical novelty of our approach is the integration of Multi-level Monte Carlo (MLMC) and Multi-fidelity Monte Carlo (MFMC) that enable data from different time and state representations (system models) to be jointly used to reduce variance and improve sampling efficiency. We also provide theoretical analysis of the proposed method and show that our estimator is unbiased and consistent under mild conditions. Finally, we demonstrate via numerical simulation that the proposed method has improved computation (sampling costs) vs. accuracy trade-offs for risk quantification and path integral control.
Space Logistics Analysis and Incentive Design for Commercialization of Orbital Debris Remediation
As orbital debris continues to become a higher priority for the space industry, there is a need to explore how partnerships between the public and private space sector may aid in addressing this issue. This research develops a space logistics framework for planning orbital debris remediation missions, providing a quantitative basis for partnerships that are mutually beneficial between space operators and debris remediators. By integrating network-based space logistics and game theory, we illuminate the high-level costs of remediating orbital debris, and the surplus that stands to be shared as a result. These findings indicate significant progress toward the continued development of a safe, sustainable, and profitable space economy.
comment: 28 pages, 14 figures, Journal of Spacecraft and Rockets (Articles in Advance)
EB-MBD: Emerging-Barrier Model-Based Diffusion for Safe Trajectory Optimization in Highly Constrained Environments
We propose enforcing constraints on Model-Based Diffusion by introducing emerging barrier functions inspired by interior point methods. We show that constraints on Model-Based Diffusion can lead to catastrophic performance degradation, even on simple 2D systems due to sample inefficiency in the Monte Carlo approximation of the score function. We introduce Emerging-Barrier Model-Based Diffusion (EB-MBD) which uses progressively introduced barrier constraints to avoid these problems, significantly improving solution quality, without the need for computationally expensive operations such as projections. We analyze the sampling liveliness of samples each iteration to inform barrier parameter scheduling choice. We demonstrate results for 2D collision avoidance and a 3D underwater manipulator system and show that our method achieves lower cost solutions than Model-Based Diffusion, and requires orders of magnitude less computation time than projection based methods.
Some Reflections on Sliding Mode Designs in Control Systems: An Example of Adaptive Tracking Control for Simple Mechanical Systems With Friction Without Measurement of Velocity
The objective of this note is to share some reflections of the authors regarding the use of sliding mode designs in control systems. We believe the abundant, and ever increasing, appearance of this kind of works on our scientific publications deserves some critical evaluation of their actual role, relevance and pertinence. First, we discuss the procedure followed by most of these designs -- illustrated with examples from the literature. Second, we bring to the readers attention several aspects of the control problem, central in classical designs, which are disregarded in the sliding mode literature. Finally, to illustrate with an specific example our previous considerations, we compare the performance of two adaptive tracking controllers for a simple one degree of freedom mechanical systems with unknown parameters and static and Coulomb friction -- that do not rely on the measurement of velocity.
Optimal Control with Lyapunov Stability Guarantees for Space Applications
This paper investigates the infinite horizon optimal control problem (OCP) for space applications characterized by nonlinear dynamics. The proposed approach divides the problem into a finite horizon OCP with a regularized terminal cost, guiding the system towards a terminal set, and an infinite horizon linear regulation phase within this set. This strategy guarantees global asymptotic stability under specific assumptions. Our method maintains the system's fully nonlinear dynamics until it reaches the terminal set, where the system dynamics is linearized. As the terminal set converges to the origin, the difference in optimal cost incurred reduces to zero, guaranteeing an efficient and stable solution. The approach is tested through simulations on three problems: spacecraft attitude control, rendezvous maneuver, and soft landing. In spacecraft attitude control, we focus on achieving precise orientation and stabilization. For rendezvous maneuvers, we address the navigation of a chaser to meet a target spacecraft. For the soft landing problem, we ensure a controlled descent and touchdown on a planetary surface. We provide numerical results confirming the effectiveness of the proposed method in managing these nonlinear dynamics problems, offering robust solutions essential for successful space missions.
Joint Detection, Channel Estimation and Interference Nulling for Terrestrial-Satellite Downlink Co-Existence in the Upper Mid-Band
The upper mid-band FR3 spectrum (7-24 GHz) has garnered significant interest for future cellular services. However, utilizing a large portion of this band requires careful interference coordination with incumbent satellite systems. This paper investigates interference from high-power terrestrial base stations (TN-BSs) to satellite downlink receivers. A central challenge is that the victim receivers, i.e., ground-based non-terrestrial user equipment (NTN-UEs) such as satellite customer premises equipment, must first be detected and their channels estimated before the TN-BS can effectively place nulls in their directions. We explore a potential solution where NTN-UEs periodically transmit preambles or beacon signals that TN-BSs can use for detection and channel estimation. The performance of this nulling approach is analyzed in a simplified scenario with a single victim, revealing the interplay between path loss and estimation quality in determining nulling performance. To further validate the method, we conduct a detailed multi-user site-specific ray-tracing (RT) simulation in a rural environment. The results show that the proposed nulling approach is effective under realistic parameters, even with high densities of victim units, although TN-BS may require a substantial number of antennas.
comment: Accepted for publication in the Proceedings of GlobeCom 2025
Whole Body Model Predictive Control for Spin-Aware Quadrupedal Table Tennis ICRA 2026
Developing table tennis robots that mirror human speed, accuracy, and ability to predict and respond to the full range of ball spins remains a significant challenge for legged robots. To demonstrate these capabilities we present a system to play dynamic table tennis for quadrupedal robots that integrates high speed perception, trajectory prediction, and agile control. Our system uses external cameras for high-speed ball localization, physical models with learned residuals to infer spin and predict trajectories, and a novel model predictive control (MPC) formulation for agile full-body control. Notably, a continuous set of stroke strategies emerge automatically from different ball return objectives using this control paradigm. We demonstrate our system in the real world on a Spot quadruped, evaluate accuracy of each system component, and exhibit coordination through the system's ability to aim and return balls with varying spin types. As a further demonstration, the system is able to rally with human players.
comment: Submitted to appear in IEEE ICRA 2026
When to Reason: Semantic Router for vLLM NeurIPS 2025
Large Language Models (LLMs) demonstrate substantial accuracy gains when augmented with reasoning modes such as chain-of-thought and inference-time scaling. However, reasoning also incurs significant costs in inference latency and token usage, with environmental and financial impacts, which are unnecessary for many simple prompts. We present a semantic router that classifies queries based on their reasoning requirements and selectively applies reasoning only when beneficial. Our approach achieves a 10.2 percentage point improvement in accuracy on the MMLU-Pro benchmark while reducing response latency by 47.1% and token consumption by 48.5% compared to direct inference with vLLM. These results demonstrate that semantic routing offers an effective mechanism for striking a balance between accuracy and efficiency in open-source LLM serving systems
comment: 5 pages, excluding references and appendix. To be appeared at Workshop on ML for Systems at NeurIPS 2025, December 6, 2025 https://mlforsystems.org/
Identification and optimal control strategies for the transversal splitting of ultra--cold Bose gases
Splitting a Bose--Einstein condensate (BEC) is a key operation in fundamental physics experiments and emerging quantum technologies, where precise preparation of well--defined initial states requires fast yet coherent control of the condensate's nonlinear dynamics. This work formulates the BEC splitting process as an optimal feedforward control problem based on a physically interpretable, reduced--order model identified from limited experimental data. We introduce a systematic calibration strategy that combines optimal experiment selection and constrained nonlinear parameter estimation, enabling accurate system identification with minimal experimental overhead. Using this calibrated model, we compute energy--optimal trajectories via indirect optimal control to realize shortcuts to adiabaticity (STAs), achieving rapid transitions to the ground state of a double--well potential while suppressing excitations. Experiments confirm that the proposed control framework yields high--fidelity state transfers across multiple configurations, demonstrating its robustness and scalability for quantum control applications.
Resilient Multi-Dimensional Consensus and Distributed Optimization against Agent-Based and Denial-of-Service Attacks
In this paper, we consider the resilient multi-dimensional consensus and distributed optimization problems of multi-agent systems (MASs) in the presence of both agent-based and denial-of-service (DoS) attacks. The considered agent-based attacks can cover malicious, Byzantine, and stubborn agents. The links between agents in the network can be blocked by DoS attacks, which may lead the digraph to be time-varying and even disconnected. The objective is to ensure that the remaining benign agents achieve consensus. To this end, an "auxiliary point"-based resilient control algorithm is proposed for MASs. Under the proposed algorithm, each healthy agent constructs a "safe kernel" utilizing the states of its in-neighbors and updates its state toward a specific point within this kernel at each iteration. If an agent cannot receive its neighbors' states owing to DoS attacks, it will use the states received immediately before the DoS period. Moreover, a resilient multi-dimensional distributed optimization (RMDO) algorithm is also proposed. Theoretical proofs and numerical examples are presented to demonstrate the effectiveness of the proposed algorithms.
Optimization via a Control-Centric Framework
Optimization plays a central role in intelligent systems and cyber-physical technologies, where the speed and reliability of convergence directly impact performance. In control theory, optimization-centric methods are standard: controllers are designed by repeatedly solving optimization problems, as in linear quadratic regulation, $H_\infty$ control, and model predictive control. In contrast, this paper develops a control-centric framework for optimization itself, where algorithms are constructed directly from Lyapunov stability principles rather than being proposed first and analyzed afterward. A key element is the stationarity vector, which encodes first-order optimality conditions and enables Lyapunov-based convergence analysis. By pairing a Lyapunov function with a selectable decay law, we obtain continuous-time dynamics with guaranteed exponential, finite-time, fixed-time, or prescribed-time convergence. Within this framework, we introduce three feedback realizations of increasing restrictiveness: the Hessian-gradient, Newton, and gradient dynamics. Each realization shapes the decay of the stationarity vector to achieve the desired rate. These constructions unify unconstrained optimization, extend naturally to constrained problems via Lyapunov-consistent primal-dual dynamics, and broaden the results for minimax and generalized Nash equilibrium seeking problems beyond exponential stability. The framework provides systematic design tools for optimization algorithms in control and game-theoretic problems.
comment: This work has been submitted to the IEEE for possible publication. 12 pages, 3 figures
Optimal control of continuous-time symmetric systems with unknown dynamics and noisy measurements
An iterative learning algorithm is presented for continuous-time linear-quadratic optimal control problems where the system is externally symmetric with unknown dynamics. Both finite-horizon and infinite-horizon problems are considered. It is shown that the proposed algorithm is globally convergent to the optimal solution and has some advantages over adaptive dynamic programming, including being unbiased under noisy measurements and having a relatively low computational burden. Numerical experiments show the effectiveness of the results.
Product-oriented Product-Process-Resource Asset Network and its Representation in AutomationML for Asset Administration Shell
Current products, especially in the automotive sector, pose complex technical systems having a multi-disciplinary mechatronic nature. Industrial standards supporting system engineering and production typically (i) address the production phase only, but do not cover the complete product life cycle, and (ii) focus on production processes and resources rather than the products themselves. The presented approach is motivated by incorporating the impacts of the end-of-life phase of the product life cycle into the engineering phase. This paper proposes a modeling approach coming up from the Product-Process-Resource (PPR) modeling paradigm. It combines requirements on (i) respecting the product structure as a basis for the model, and (ii) incorporates repairing, remanufacturing, or upcycling within cyber-physical production systems. The proposed model called PoPAN should accompany the product during the entire life cycle as a digital shadow encapsulated within the Asset Administration Shell of a product. To facilitate the adoption of the proposed paradigm, the paper also proposes serialization of the model in the AutomationML data format. The model is demonstrated on a use-case for disassembling electric vehicle batteries to support their remanufacturing for stationary battery applications.
comment: \copyright 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
Hierarchical Reinforcement Learning with Low-Level MPC for Multi-Agent Control
Achieving safe and coordinated behavior in dynamic, constraint-rich environments remains a major challenge for learning-based control. Pure end-to-end learning often suffers from poor sample efficiency and limited reliability, while model-based methods depend on predefined references and struggle to generalize. We propose a hierarchical framework that combines tactical decision-making via reinforcement learning (RL) with low-level execution through Model Predictive Control (MPC). For the case of multi-agent systems this means that high-level policies select abstract targets from structured regions of interest (ROIs), while MPC ensures dynamically feasible and safe motion. Tested on a predator-prey benchmark, our approach outperforms end-to-end and shielding-based RL baselines in terms of reward, safety, and consistency, underscoring the benefits of combining structured learning with model-based control.
Revisiting Functional Derivatives in Multi-object Tracking
Probability generating functionals (PGFLs) are efficient and powerful tools for tracking independent objects in clutter. It was shown that PGFLs could be used for the elegant derivation of practical multi-object tracking algorithms, e.g., the probability hypothesis density (PHD) filter. However, derivations using PGFLs use the so-called functional derivatives whose definitions usually appear too complicated or heuristic, involving Dirac delta ``functions''. This paper begins by comparing different definitions of functional derivatives and exploring their relationships and implications for practical applications. It then proposes a rigorous definition of the functional derivative, utilizing straightforward yet precise mathematics for clarity. Key properties of the functional derivative are revealed and discussed.
comment: submitted to IEEE Transactions on Information Theory
Carleman-Fourier linearization of nonlinear real dynamical systems with quasi-periodic fields
This paper presents Carleman-Fourier linearization for analyzing nonlinear real dynamical systems with periodic vector fields. Using Fourier basis functions, this novel framework transforms such dynamical systems into equivalent infinite-dimensional linear dynamical systems. In this paper, we establish the exponential convergence of the primary block in the finite-section approximation of this linearized system to the state vector of the original nonlinear system. To showcase the efficacy of our approach, we apply it to the Kuramoto model, a prominent model for coupled oscillators. The results demonstrate promising accuracy in approximating the original system's behavior.
comment: Discrete and Continuous Dynamical Systems Series B, accepted
Foundation Models for Structural Health Monitoring
Structural Health Monitoring (SHM) is a critical task for ensuring the safety and reliability of civil infrastructures, typically realized on bridges and viaducts by means of vibration monitoring. In this paper, we propose for the first time the use of Transformer neural networks, with a Masked Auto-Encoder architecture, as Foundation Models for SHM. We demonstrate the ability of these models to learn generalizable representations from multiple large datasets through self-supervised pre-training, which, coupled with task-specific fine-tuning, allows them to outperform state-of-the-art traditional methods on diverse tasks, including Anomaly Detection (AD) and Traffic Load Estimation (TLE). We then extensively explore model size versus accuracy trade-offs and experiment with Knowledge Distillation (KD) to improve the performance of smaller Transformers, enabling their embedding directly into the SHM edge nodes. We showcase the effectiveness of our foundation models using data from three operational viaducts. For AD, we achieve a near-perfect 99.9% accuracy with a monitoring time span of just 15 windows. In contrast, a state-of-the-art method based on Principal Component Analysis (PCA) obtains its first good result (95.03% accuracy), only considering 120 windows. On two different TLE tasks, our models obtain state-of-the-art performance on multiple evaluation metrics (R$^2$ score, MAE% and MSE%). On the first benchmark, we achieve an R$^2$ score of 0.97 and 0.90 for light and heavy vehicle traffic, respectively, while the best previous approach (a Random Forest) stops at 0.91 and 0.84. On the second one, we achieve an R$^2$ score of 0.54 versus the 0.51 of the best competitor method, a Long-Short Term Memory network.
comment: 17 pages, 6 tables, 9 figures
A Long-Duration Autonomy Approach to Connected and Automated Vehicles
In this article, we present a long-duration autonomy approach for the control of connected and automated vehicles (CAVs) operating in a transportation network. In particular, we focus on the performance of CAVs at traffic bottlenecks, including roundabouts, merging roadways, and intersections. We take a principled approach based on optimal control, and derive a reactive controller with guarantees on safety, performance, and energy efficiency. We guarantee safety through high order control barrier functions (HOCBFs), which we ``lift'' to first order CBFs using time-optimal motion primitives. This yields a set of first-order CBFs that are compatible with the control bounds. We demonstrate the performance of our approach in simulation and compare it to an optimal control-based approach.
comment: 8 pages, 3 figures
MARLIN: Multi-Agent Reinforcement Learning with Murmuration Intelligence and LLM Guidance for Reservoir Management
As climate change intensifies extreme weather events, water disasters pose growing threats to global communities, making adaptive reservoir management critical for protecting vulnerable populations and ensuring water security. Modern water resource management faces unprecedented challenges from cascading uncertainties propagating through interconnected reservoir networks. These uncertainties, rooted in physical water transfer losses and environmental variability, make precise control difficult. For example, sending 10 tons downstream may yield only 8-12 tons due to evaporation and seepage. Traditional centralized optimization approaches suffer from exponential computational complexity and cannot effectively handle such real-world uncertainties, while existing multi-agent reinforcement learning (MARL) methods fail to achieve effective coordination under uncertainty. To address these challenges, we present MARLIN, a decentralized reservoir management framework inspired by starling murmurations intelligence. Integrating bio-inspired alignment, separation, and cohesion rules with MARL, MARLIN enables individual reservoirs to make local decisions while achieving emergent global coordination. In addition, a LLM provides real-time reward shaping signals, guiding agents to adapt to environmental changes and human-defined preferences. Experiments on real-world USGS data show that MARLIN improves uncertainty handling by 23\%, cuts computation by 35\%, and accelerates flood response by 68\%, exhibiting super-linear coordination, with complexity scaling 5.4x from 400 to 10,000 nodes. These results demonstrate MARLIN's potential for disaster prevention and protecting communities through intelligent, scalable water resource management.
A Survey of Foundation Models for IoT: Taxonomy and Criteria-Based Analysis
Foundation models have gained growing interest in the IoT domain due to their reduced reliance on labeled data and strong generalizability across tasks, which address key limitations of traditional machine learning approaches. However, most existing foundation model based methods are developed for specific IoT tasks, making it difficult to compare approaches across IoT domains and limiting guidance for applying them to new tasks. This survey aims to bridge this gap by providing a comprehensive overview of current methodologies and organizing them around four shared performance objectives by different domains: efficiency, context-awareness, safety, and security & privacy. For each objective, we review representative works, summarize commonly-used techniques and evaluation metrics. This objective-centric organization enables meaningful cross-domain comparisons and offers practical insights for selecting and designing foundation model based solutions for new IoT tasks. We conclude with key directions for future research to guide both practitioners and researchers in advancing the use of foundation models in IoT applications.
comment: Accepted by CCF Transactions on Pervasive Computing and Interaction (CCF TPCI)
Fast Online Adaptive Neural MPC via Meta-Learning
Data-driven model predictive control (MPC) has demonstrated significant potential for improving robot control performance in the presence of model uncertainties. However, existing approaches often require extensive offline data collection and computationally intensive training, limiting their ability to adapt online. To address these challenges, this paper presents a fast online adaptive MPC framework that leverages neural networks integrated with Model-Agnostic Meta-Learning (MAML). Our approach focuses on few-shot adaptation of residual dynamics - capturing the discrepancy between nominal and true system behavior - using minimal online data and gradient steps. By embedding these meta-learned residual models into a computationally efficient L4CasADi-based MPC pipeline, the proposed method enables rapid model correction, enhances predictive accuracy, and improves real-time control performance. We validate the framework through simulation studies on a Van der Pol oscillator, a Cart-Pole system, and a 2D quadrotor. Results show significant gains in adaptation speed and prediction accuracy over both nominal MPC and nominal MPC augmented with a freshly initialized neural network, underscoring the effectiveness of our approach for real-time adaptive robot control.
Characterizing and Optimizing Real-Time Optimal Control for Embedded SoCs
Resource-limited robots face significant challenges in executing computationally intensive tasks, such as locomotion and manipulation, particularly for real-time optimal control algorithms like Model Predictive Control (MPC). This paper provides a comprehensive design space exploration to identify optimal hardware computation architectures for these demanding model-based control algorithms. We profile and optimize representative architectural designs, including general-purpose scalar CPUs, vector processors, and specialized accelerators. By characterizing kernel-level benchmarks and end-to-end robotic scenarios, including a hardware-in-the-loop evaluation on a fabricated RISC-V multi-core vector SoC, we present a quantitative comparison of performance, area, and utilization across distinct architectural design points. Our findings demonstrate that targeted architectural modifications, coupled with deep software and system optimizations, enable up to 3.71x speedups for MPC, resulting in up to 27% system-level power reductions while completing robotic tasks. Finally, we propose a code generation flow designed to simplify the complex engineering effort required for mapping robotic workloads onto specialized architectures.
Hybrid Feedback Control for Global Navigation with Locally Optimal Obstacle Avoidance in n-Dimensional Spaces
We present a hybrid feedback control framework for autonomous robot navigation in n-dimensional Euclidean spaces cluttered with spherical obstacles. The proposed approach ensures safe and global navigation towards a target location by dynamically switching between two operational modes: motion-to-destination and locally optimal obstacle-avoidance. It produces continuous velocity inputs, ensures collision-free trajectories and generates locally optimal obstacle avoidance maneuvers. Unlike existing methods, the proposed framework is compatible with range sensors, enabling navigation in both a priori known and unknown environments. Extensive simulations in 2D and 3D settings, complemented by experimental validation on a TurtleBot 4 platform, confirm the efficacy and robustness of the approach. Our results demonstrate shorter paths and smoother trajectories compared to state-of-the-art methods, while maintaining computational efficiency and real-world feasibility.
Safe Autonomous Environmental Contact for Soft Robots using Control Barrier Functions
Robots built from soft materials will inherently apply lower environmental forces than their rigid counterparts, and therefore may be more suitable in sensitive settings with unintended contact. However, these robots' applied forces result from both their design and their control system in closed-loop, and therefore, ensuring bounds on these forces requires controller synthesis for safety as well. This article introduces the first feedback controller for a soft manipulator that formally meets a safety specification with respect to environmental contact. In our proof-of-concept setting, the robot's environment has known geometry and is deformable with a known elastic modulus. Our approach maps a bound on applied forces to a safe set of positions of the robot's tip via predicted deformations of the environment. Then, a quadratic program with Control Barrier Functions in its constraints is used to supervise a nominal feedback signal, verifiably maintaining the robot's tip within this safe set. Hardware experiments on a multi-segment soft pneumatic robot demonstrate that the proposed framework successfully maintains a positive safety margin. This framework represents a fundamental shift in perspective on control and safety for soft robots, implementing a formally verifiable logic specification on their pose and contact forces.
comment: 8 pages, 9 figures
Observability for Nonlinear Systems: Connecting Variational Dynamics, Lyapunov Exponents, and Empirical Gramians
Observability quantification is a key problem in dynamic network sciences. While it has been thoroughly studied for linear systems, observability quantification for nonlinear networks is less intuitive and more cumbersome. One common approach to quantify observability for nonlinear systems is via the Empirical Gramian (Empr-Gram) -- a generalized form of the Gramian of linear systems. In this paper, we produce three new results. First, we establish that a variational form of discrete-time autonomous nonlinear systems yields a so-called Variational Gramian (Var-Gram) that is equivalent to the classic Empr-Gram; the former being easier to compute than the latter. Via Lyapunov exponents derived from Lyapunov's direct method, the paper's second result derives connections between existing observability measures and Var-Gram. The third result demonstrates the applicability of these new notions for sensor selection/placement in nonlinear systems. Numerical case studies demonstrate these three developments and their merits.
Robotics
BLAZER: Bootstrapping LLM-based Manipulation Agents with Zero-Shot Data Generation
Scaling data and models has played a pivotal role in the remarkable progress of computer vision and language. Inspired by these domains, recent efforts in robotics have similarly focused on scaling both data and model size to develop more generalizable and robust policies. However, unlike vision and language, robotics lacks access to internet-scale demonstrations across diverse robotic tasks and environments. As a result, the scale of existing datasets typically suffers from the need for manual data collection and curation. To address this problem, here we propose BLAZER, a framework that learns manipulation policies from automatically generated training data. We build on the zero-shot capabilities of LLM planners and automatically generate demonstrations for diverse manipulation tasks in simulation. Successful examples are then used to finetune an LLM and to improve its planning capabilities without human supervision. Notably, while BLAZER training requires access to the simulator's state, we demonstrate direct transfer of acquired skills to sensor-based manipulation. Through extensive experiments, we show BLAZER to significantly improve zero-shot manipulation in both simulated and real environments. Moreover, BLAZER improves on tasks outside of its training pool and enables downscaling of LLM models. Our code and data will be made publicly available on the project page.
comment: 11 pages, 8 figures
Scalable Offline Metrics for Autonomous Driving IROS 2025
Real-World evaluation of perception-based planning models for robotic systems, such as autonomous vehicles, can be safely and inexpensively conducted offline, i.e., by computing model prediction error over a pre-collected validation dataset with ground-truth annotations. However, extrapolating from offline model performance to online settings remains a challenge. In these settings, seemingly minor errors can compound and result in test-time infractions or collisions. This relationship is understudied, particularly across diverse closed-loop metrics and complex urban maneuvers. In this work, we revisit this undervalued question in policy evaluation through an extensive set of experiments across diverse conditions and metrics. Based on analysis in simulation, we find an even worse correlation between offline and online settings than reported by prior studies, casting doubts on the validity of current evaluation practices and metrics for driving policies. Next, we bridge the gap between offline and online evaluation. We investigate an offline metric based on epistemic uncertainty, which aims to capture events that are likely to cause errors in closed-loop settings. The resulting metric achieves over 13% improvement in correlation compared to previous offline metrics. We further validate the generalization of our findings beyond the simulation environment in real-world settings, where even greater gains are observed.
comment: Accepted at IROS 2025 (IEEE/RSJ International Conference on Intelligent Robots and Systems)
NovaFlow: Zero-Shot Manipulation via Actionable Flow from Generated Videos
Enabling robots to execute novel manipulation tasks zero-shot is a central goal in robotics. Most existing methods assume in-distribution tasks or rely on fine-tuning with embodiment-matched data, limiting transfer across platforms. We present NovaFlow, an autonomous manipulation framework that converts a task description into an actionable plan for a target robot without any demonstrations. Given a task description, NovaFlow synthesizes a video using a video generation model and distills it into 3D actionable object flow using off-the-shelf perception modules. From the object flow, it computes relative poses for rigid objects and realizes them as robot actions via grasp proposals and trajectory optimization. For deformable objects, this flow serves as a tracking objective for model-based planning with a particle-based dynamics model. By decoupling task understanding from low-level control, NovaFlow naturally transfers across embodiments. We validate on rigid, articulated, and deformable object manipulation tasks using a table-top Franka arm and a Spot quadrupedal mobile robot, and achieve effective zero-shot execution without demonstrations or embodiment-specific training. Project website: https://novaflow.lhy.xyz/.
ResAD: Normalized Residual Trajectory Modeling for End-to-End Autonomous Driving
End-to-end autonomous driving (E2EAD) systems, which learn to predict future trajectories directly from sensor data, are fundamentally challenged by the inherent spatio-temporal imbalance of trajectory data. This imbalance creates a significant optimization burden, causing models to learn spurious correlations instead of causal inference, while also prioritizing uncertain, distant predictions, thereby compromising immediate safety. To address these issues, we propose ResAD, a novel Normalized Residual Trajectory Modeling framework. Instead of predicting the future trajectory directly, our approach reframes the learning task to predict the residual deviation from a deterministic inertial reference. The inertial reference serves as a counterfactual, forcing the model to move beyond simple pattern recognition and instead identify the underlying causal factors (e.g., traffic rules, obstacles) that necessitate deviations from a default, inertially-guided path. To deal with the optimization imbalance caused by uncertain, long-term horizons, ResAD further incorporates Point-wise Normalization of the predicted residual. It re-weights the optimization objective, preventing large-magnitude errors associated with distant, uncertain waypoints from dominating the learning signal. Extensive experiments validate the effectiveness of our framework. On the NAVSIM benchmark, ResAD achieves a state-of-the-art PDMS of 88.6 using a vanilla diffusion policy with only two denoising steps, demonstrating that our approach significantly simplifies the learning task and improves model performance. The code will be released to facilitate further research.
DexNDM: Closing the Reality Gap for Dexterous In-Hand Rotation via Joint-Wise Neural Dynamics Model
Achieving generalized in-hand object rotation remains a significant challenge in robotics, largely due to the difficulty of transferring policies from simulation to the real world. The complex, contact-rich dynamics of dexterous manipulation create a "reality gap" that has limited prior work to constrained scenarios involving simple geometries, limited object sizes and aspect ratios, constrained wrist poses, or customized hands. We address this sim-to-real challenge with a novel framework that enables a single policy, trained in simulation, to generalize to a wide variety of objects and conditions in the real world. The core of our method is a joint-wise dynamics model that learns to bridge the reality gap by effectively fitting limited amount of real-world collected data and then adapting the sim policy's actions accordingly. The model is highly data-efficient and generalizable across different whole-hand interaction distributions by factorizing dynamics across joints, compressing system-wide influences into low-dimensional variables, and learning each joint's evolution from its own dynamic profile, implicitly capturing these net effects. We pair this with a fully autonomous data collection strategy that gathers diverse, real-world interaction data with minimal human intervention. Our complete pipeline demonstrates unprecedented generality: a single policy successfully rotates challenging objects with complex shapes (e.g., animals), high aspect ratios (up to 5.33), and small sizes, all while handling diverse wrist orientations and rotation axes. Comprehensive real-world evaluations and a teleoperation application for complex tasks validate the effectiveness and robustness of our approach. Website: https://meowuu7.github.io/DexNDM/
comment: Project Website: https://meowuu7.github.io/DexNDM/ Video: https://youtu.be/tU2Mv8vWftU
Dream to Recall: Imagination-Guided Experience Retrieval for Memory-Persistent Vision-and-Language Navigation
Vision-and-Language Navigation (VLN) requires agents to follow natural language instructions through environments, with memory-persistent variants demanding progressive improvement through accumulated experience. Existing approaches for memory-persistent VLN face critical limitations: they lack effective memory access mechanisms, instead relying on entire memory incorporation or fixed-horizon lookup, and predominantly store only environmental observations while neglecting navigation behavioral patterns that encode valuable decision-making strategies. We present Memoir, which employs imagination as a retrieval mechanism grounded by explicit memory: a world model imagines future navigation states as queries to selectively retrieve relevant environmental observations and behavioral histories. The approach comprises: 1) a language-conditioned world model that imagines future states serving dual purposes: encoding experiences for storage and generating retrieval queries; 2) Hybrid Viewpoint-Level Memory that anchors both observations and behavioral patterns to viewpoints, enabling hybrid retrieval; and 3) an experience-augmented navigation model that integrates retrieved knowledge through specialized encoders. Extensive evaluation across diverse memory-persistent VLN benchmarks with 10 distinctive testing scenarios demonstrates Memoir's effectiveness: significant improvements across all scenarios, with 5.4% SPL gains on IR2R over the best memory-persistent baseline, accompanied by 8.3x training speedup and 74% inference memory reduction. The results validate that predictive retrieval of both environmental and behavioral memories enables more effective navigation, with analysis indicating substantial headroom (73.3% vs 93.4% upper bound) for this imagination-guided paradigm. Code at https://github.com/xyz9911/Memoir.
comment: 14 pages, 6 figures, 13 tables
R2RGEN: Real-to-Real 3D Data Generation for Spatially Generalized Manipulation
Towards the aim of generalized robotic manipulation, spatial generalization is the most fundamental capability that requires the policy to work robustly under different spatial distribution of objects, environment and agent itself. To achieve this, substantial human demonstrations need to be collected to cover different spatial configurations for training a generalized visuomotor policy via imitation learning. Prior works explore a promising direction that leverages data generation to acquire abundant spatially diverse data from minimal source demonstrations. However, most approaches face significant sim-to-real gap and are often limited to constrained settings, such as fixed-base scenarios and predefined camera viewpoints. In this paper, we propose a real-to-real 3D data generation framework (R2RGen) that directly augments the pointcloud observation-action pairs to generate real-world data. R2RGen is simulator- and rendering-free, thus being efficient and plug-and-play. Specifically, given a single source demonstration, we introduce an annotation mechanism for fine-grained parsing of scene and trajectory. A group-wise augmentation strategy is proposed to handle complex multi-object compositions and diverse task constraints. We further present camera-aware processing to align the distribution of generated data with real-world 3D sensor. Empirically, R2RGen substantially enhances data efficiency on extensive experiments and demonstrates strong potential for scaling and application on mobile manipulation.
comment: Project page: https://r2rgen.github.io/
Have We Scene It All? Scene Graph-Aware Deep Point Cloud Compression
Efficient transmission of 3D point cloud data is critical for advanced perception in centralized and decentralized multi-agent robotic systems, especially nowadays with the growing reliance on edge and cloud-based processing. However, the large and complex nature of point clouds creates challenges under bandwidth constraints and intermittent connectivity, often degrading system performance. We propose a deep compression framework based on semantic scene graphs. The method decomposes point clouds into semantically coherent patches and encodes them into compact latent representations with semantic-aware encoders conditioned by Feature-wise Linear Modulation (FiLM). A folding-based decoder, guided by latent features and graph node attributes, enables structurally accurate reconstruction. Experiments on the SemanticKITTI and nuScenes datasets show that the framework achieves state-of-the-art compression rates, reducing data size by up to 98% while preserving both structural and semantic fidelity. In addition, it supports downstream applications such as multi-robot pose graph optimization and map merging, achieving trajectory accuracy and map alignment comparable to those obtained with raw LiDAR scans.
comment: Accepted for publication in IEEE Robotics and Automation Letters (RA-L). 8 pages, 6 figures
DexMan: Learning Bimanual Dexterous Manipulation from Human and Generated Videos
We present DexMan, an automated framework that converts human visual demonstrations into bimanual dexterous manipulation skills for humanoid robots in simulation. Operating directly on third-person videos of humans manipulating rigid objects, DexMan eliminates the need for camera calibration, depth sensors, scanned 3D object assets, or ground-truth hand and object motion annotations. Unlike prior approaches that consider only simplified floating hands, it directly controls a humanoid robot and leverages novel contact-based rewards to improve policy learning from noisy hand-object poses estimated from in-the-wild videos. DexMan achieves state-of-the-art performance in object pose estimation on the TACO benchmark, with absolute gains of 0.08 and 0.12 in ADD-S and VSD. Meanwhile, its reinforcement learning policy surpasses previous methods by 19% in success rate on OakInk-v2. Furthermore, DexMan can generate skills from both real and synthetic videos, without the need for manual data collection and costly motion capture, and enabling the creation of large-scale, diverse datasets for training generalist dexterous manipulation.
comment: Video results are available at: https://embodiedai-ntu.github.io/dexman/index.html
Don't Run with Scissors: Pruning Breaks VLA Models but They Can Be Recovered
Vision-Language-Action (VLA) models have advanced robotic capabilities but remain challenging to deploy on resource-limited hardware. Pruning has enabled efficient compression of large language models (LLMs), yet it is largely understudied in robotics. Surprisingly, we observe that pruning VLA models leads to drastic degradation and increased safety violations. We introduce GLUESTICK, a post-pruning recovery method that restores much of the original model's functionality while retaining sparsity benefits. Our method performs a one-time interpolation between the dense and pruned models in weight-space to compute a corrective term. This correction is used during inference by each pruned layer to recover lost capabilities with minimal overhead. GLUESTICK requires no additional training, is agnostic to the pruning algorithm, and introduces a single hyperparameter that controls the tradeoff between efficiency and accuracy. Across diverse VLA architectures and tasks in manipulation and navigation, GLUESTICK achieves competitive memory efficiency while substantially recovering success rates and reducing safety violations. Additional material can be found at: https://gluestick-vla.github.io/.
Gaze on the Prize: Shaping Visual Attention with Return-Guided Contrastive Learning
Visual Reinforcement Learning (RL) agents must learn to act based on high-dimensional image data where only a small fraction of the pixels is task-relevant. This forces agents to waste exploration and computational resources on irrelevant features, leading to sample-inefficient and unstable learning. To address this, inspired by human visual foveation, we introduce Gaze on the Prize. This framework augments visual RL with a learnable foveal attention mechanism (Gaze), guided by a self-supervised signal derived from the agent's experience pursuing higher returns (the Prize). Our key insight is that return differences reveal what matters most: If two similar representations produce different outcomes, their distinguishing features are likely task-relevant, and the gaze should focus on them accordingly. This is realized through return-guided contrastive learning that trains the attention to distinguish between the features relevant to success and failure. We group similar visual representations into positives and negatives based on their return differences and use the resulting labels to construct contrastive triplets. These triplets provide the training signal that teaches the attention mechanism to produce distinguishable representations for states associated with different outcomes. Our method achieves up to 2.4x improvement in sample efficiency and can solve tasks that the baseline fails to learn, demonstrated across a suite of manipulation tasks from the ManiSkill3 benchmark, all without modifying the underlying algorithm or hyperparameters.
comment: Project page: https://andrewcwlee.github.io/gaze-on-the-prize
Validation of collision-free spheres of Stewart-Gough platforms for constant orientations using the Application Programming Interface of a CAD software
This paper presents a method of validation of the size of the largest collision-free sphere (CFS) of a 6-6 Stewart-Gough platform manipulator (SGPM) for a given orientation of its moving platform (MP) using the Application Programming Interface (API) of a CAD software. The position of the MP is updated via the API in an automated manner over a set of samples within a shell enclosing the surface of the CFS. For each pose of the manipulator, each pair of legs is investigated for mutual collisions. The CFS is considered safe or validated iff none of the points falling inside the CFS lead to a collision between any pair of legs. This approach can not only validate the safety of a precomputed CFS, but also estimate the same for any spatial parallel manipulator.
Reliability of Single-Level Equality-Constrained Inverse Optimal Control
Inverse optimal control (IOC) allows the retrieval of optimal cost function weights, or behavioral parameters, from human motion. The literature on IOC uses methods that are either based on a slow bilevel process or a fast but noise-sensitive minimization of optimality condition violation. Assuming equality-constrained optimal control models of human motion, this article presents a faster but robust approach to solving IOC using a single-level reformulation of the bilevel method and yields equivalent results. Through numerical experiments in simulation, we analyze the robustness to noise of the proposed single-level reformulation to the bilevel IOC formulation with a human-like planar reaching task that is used across recent studies. The approach shows resilience to very large levels of noise and reduces the computation time of the IOC on this task by a factor of 15 when compared to a classical bilevel implementation.
comment: 8 pages, 3 figures
Airy: Reading Robot Intent through Height and Sky
As industrial robots move into shared human spaces, their opaque decision making threatens safety, trust, and public oversight. This artwork, Airy, asks whether complex multi agent AI can become intuitively understandable by staging a competition between two reinforcement trained robot arms that snap a bedsheet skyward. Building on three design principles, competition as a clear metric (who lifts higher), embodied familiarity (audiences recognize fabric snapping), and sensor to sense mapping (robot cooperation or rivalry shown through forest and weather projections), the installation gives viewers a visceral way to read machine intent. Observations from five international exhibitions indicate that audiences consistently read the robots' strategies, conflict, and cooperation in real time, with emotional reactions that mirror the system's internal state. The project shows how sensory metaphors can turn a black box into a public interface.
Co-design is powerful and not free
Robotic performance emerges from the coupling of body and controller, yet it remains unclear when morphology-control co-design is necessary. We present a unified framework that embeds morphology and control parameters within a single neural network, enabling end-to-end joint optimization. Through case studies in static-obstacle-constrained reaching, we evaluate trajectory error, success rate, and collision probability. The results show that co-design provides clear benefits when morphology is poorly matched to the task, such as near obstacles or workspace boundaries, where structural adaptation simplifies control. Conversely, when the baseline morphology already affords sufficient capability, control-only optimization often matches or exceeds co-design. By clarifying when control is enough and when it is not, this work advances the understanding of embodied intelligence and offers practical guidance for embodiment-aware robot design.
A Multimodal Depth-Aware Method For Embodied Reference Understanding
Embodied Reference Understanding requires identifying a target object in a visual scene based on both language instructions and pointing cues. While prior works have shown progress in open-vocabulary object detection, they often fail in ambiguous scenarios where multiple candidate objects exist in the scene. To address these challenges, we propose a novel ERU framework that jointly leverages LLM-based data augmentation, depth-map modality, and a depth-aware decision module. This design enables robust integration of linguistic and embodied cues, improving disambiguation in complex or cluttered environments. Experimental results on two datasets demonstrate that our approach significantly outperforms existing baselines, achieving more accurate and reliable referent detection.
Evaluation of a Robust Control System in Real-World Cable-Driven Parallel Robots
This study evaluates the performance of classical and modern control methods for real-world Cable-Driven Parallel Robots (CDPRs), focusing on underconstrained systems with limited time discretization. A comparative analysis is conducted between classical PID controllers and modern reinforcement learning algorithms, including Deep Deterministic Policy Gradient (DDPG), Proximal Policy Optimization (PPO), and Trust Region Policy Optimization (TRPO). The results demonstrate that TRPO outperforms other methods, achieving the lowest root mean square (RMS) errors across various trajectories and exhibiting robustness to larger time intervals between control updates. TRPO's ability to balance exploration and exploitation enables stable control in noisy, real-world environments, reducing reliance on high-frequency sensor feedback and computational demands. These findings highlight TRPO's potential as a robust solution for complex robotic control tasks, with implications for dynamic environments and future applications in sensor fusion or hybrid control strategies.
NavSpace: How Navigation Agents Follow Spatial Intelligence Instructions
Instruction-following navigation is a key step toward embodied intelligence. Prior benchmarks mainly focus on semantic understanding but overlook systematically evaluating navigation agents' spatial perception and reasoning capabilities. In this work, we introduce the NavSpace benchmark, which contains six task categories and 1,228 trajectory-instruction pairs designed to probe the spatial intelligence of navigation agents. On this benchmark, we comprehensively evaluate 22 navigation agents, including state-of-the-art navigation models and multimodal large language models. The evaluation results lift the veil on spatial intelligence in embodied navigation. Furthermore, we propose SNav, a new spatially intelligent navigation model. SNav outperforms existing navigation agents on NavSpace and real robot tests, establishing a strong baseline for future work.
Accurate and Noise-Tolerant Extraction of Routine Logs in Robotic Process Automation (Extended Version)
Robotic Process Mining focuses on the identification of the routine types performed by human resources through a User Interface. The ultimate goal is to discover routine-type models to enable robotic process automation. The discovery of routine-type models requires the provision of a routine log. Unfortunately, the vast majority of existing works do not directly focus on enabling the model discovery, limiting themselves to extracting the set of actions that are part of the routines. They were also not evaluated in scenarios characterized by inconsistent routine execution, hereafter referred to as noise, which reflects natural variability and occasional errors in human performance. This paper presents a clustering-based technique that aims to extract routine logs. Experiments were conducted on nine UI logs from the literature with different levels of injected noise. Our technique was compared with existing techniques, most of which are not meant to discover routine logs but were adapted for the purpose. The results were evaluated through standard state-of-the-art metrics, showing that we can extract more accurate routine logs than what the state of the art could, especially in the presence of noise.
comment: 16 pages, 5 figures
Beyond hospital reach: Autonomous lightweight ultrasound robot for liver sonography
Liver disease is a major global health burden. While ultrasound is the first-line diagnostic tool, liver sonography requires locating multiple non-continuous planes from positions where target structures are often not visible, for biometric assessment and lesion detection, requiring significant expertise. However, expert sonographers are severely scarce in resource-limited regions. Here, we develop an autonomous lightweight ultrasound robot comprising an AI agent that integrates multi-modal perception with memory attention for localization of unseen target structures, and a 588-gram 6-degrees-of-freedom cable-driven robot. By mounting on the abdomen, the system enhances robustness against motion. Our robot can autonomously acquire expert-level standard liver ultrasound planes and detect pathology in patients, including two from Xining, a 2261-meter-altitude city with limited medical resources. Our system performs effectively on rapid-motion individuals and in wilderness environments. This work represents the first demonstration of autonomous sonography across multiple challenging scenarios, potentially transforming access to expert-level diagnostics in underserved regions.
Towards Reliable LLM-based Robot Planning via Combined Uncertainty Estimation
Large language models (LLMs) demonstrate advanced reasoning abilities, enabling robots to understand natural language instructions and generate high-level plans with appropriate grounding. However, LLM hallucinations present a significant challenge, often leading to overconfident yet potentially misaligned or unsafe plans. While researchers have explored uncertainty estimation to improve the reliability of LLM-based planning, existing studies have not sufficiently differentiated between epistemic and intrinsic uncertainty, limiting the effectiveness of uncertainty estimation. In this paper, we present Combined Uncertainty estimation for Reliable Embodied planning (CURE), which decomposes the uncertainty into epistemic and intrinsic uncertainty, each estimated separately. Furthermore, epistemic uncertainty is subdivided into task clarity and task familiarity for more accurate evaluation. The overall uncertainty assessments are obtained using random network distillation and multi-layer perceptron regression heads driven by LLM features. We validated our approach in two distinct experimental settings: kitchen manipulation and tabletop rearrangement experiments. The results show that, compared to existing methods, our approach yields uncertainty estimates that are more closely aligned with the actual execution outcomes.
FastUMI-100K: Advancing Data-driven Robotic Manipulation with a Large-scale UMI-style Dataset
Data-driven robotic manipulation learning depends on large-scale, high-quality expert demonstration datasets. However, existing datasets, which primarily rely on human teleoperated robot collection, are limited in terms of scalability, trajectory smoothness, and applicability across different robotic embodiments in real-world environments. In this paper, we present FastUMI-100K, a large-scale UMI-style multimodal demonstration dataset, designed to overcome these limitations and meet the growing complexity of real-world manipulation tasks. Collected by FastUMI, a novel robotic system featuring a modular, hardware-decoupled mechanical design and an integrated lightweight tracking system, FastUMI-100K offers a more scalable, flexible, and adaptable solution to fulfill the diverse requirements of real-world robot demonstration data. Specifically, FastUMI-100K contains over 100K+ demonstration trajectories collected across representative household environments, covering 54 tasks and hundreds of object types. Our dataset integrates multimodal streams, including end-effector states, multi-view wrist-mounted fisheye images and textual annotations. Each trajectory has a length ranging from 120 to 500 frames. Experimental results demonstrate that FastUMI-100K enables high policy success rates across various baseline algorithms, confirming its robustness, adaptability, and real-world applicability for solving complex, dynamic manipulation challenges. The source code and dataset will be released in this link https://github.com/MrKeee/FastUMI-100K.
Orientation Learning and Adaptation towards Simultaneous Incorporation of Multiple Local Constraints
Orientation learning plays a pivotal role in many tasks. However, the rotation group SO(3) is a Riemannian manifold. As a result, the distortion caused by non-Euclidean geometric nature introduces difficulties to the incorporation of local constraints, especially for the simultaneous incorporation of multiple local constraints. To address this issue, we propose the Angle-Axis Space-based orientation representation method to solve several orientation learning problems, including orientation adaptation and minimization of angular acceleration. Specifically, we propose a weighted average mechanism in SO(3) based on the angle-axis representation method. Our main idea is to generate multiple trajectories by considering different local constraints at different basepoints. Then these multiple trajectories are fused to generate a smooth trajectory by our proposed weighted average mechanism, achieving the goal to incorporate multiple local constraints simultaneously. Compared with existing solution, ours can address the distortion issue and make the off-theshelf Euclidean learning algorithm be re-applicable in non-Euclidean space. Simulation and Experimental evaluations validate that our solution can not only adapt orientations towards arbitrary desired via-points and cope with angular acceleration constraints, but also incorporate multiple local constraints simultaneously to achieve extra benefits, e.g., achieving smaller acceleration costs.
Executable Analytic Concepts as the Missing Link Between VLM Insight and Precise Manipulation
Enabling robots to perform precise and generalized manipulation in unstructured environments remains a fundamental challenge in embodied AI. While Vision-Language Models (VLMs) have demonstrated remarkable capabilities in semantic reasoning and task planning, a significant gap persists between their high-level understanding and the precise physical execution required for real-world manipulation. To bridge this "semantic-to-physical" gap, we introduce GRACE, a novel framework that grounds VLM-based reasoning through executable analytic concepts (EAC)-mathematically defined blueprints that encode object affordances, geometric constraints, and semantics of manipulation. Our approach integrates a structured policy scaffolding pipeline that turn natural language instructions and visual information into an instantiated EAC, from which we derive grasp poses, force directions and plan physically feasible motion trajectory for robot execution. GRACE thus provides a unified and interpretable interface between high-level instruction understanding and low-level robot control, effectively enabling precise and generalizable manipulation through semantic-physical grounding. Extensive experiments demonstrate that GRACE achieves strong zero-shot generalization across a variety of articulated objects in both simulated and real-world environments, without requiring task-specific training.
Towards Proprioception-Aware Embodied Planning for Dual-Arm Humanoid Robots
In recent years, Multimodal Large Language Models (MLLMs) have demonstrated the ability to serve as high-level planners, enabling robots to follow complex human instructions. However, their effectiveness, especially in long-horizon tasks involving dual-arm humanoid robots, remains limited. This limitation arises from two main challenges: (i) the absence of simulation platforms that systematically support task evaluation and data collection for humanoid robots, and (ii) the insufficient embodiment awareness of current MLLMs, which hinders reasoning about dual-arm selection logic and body positions during planning. To address these issues, we present DualTHOR, a new dual-arm humanoid simulator, with continuous transition and a contingency mechanism. Building on this platform, we propose Proprio-MLLM, a model that enhances embodiment awareness by incorporating proprioceptive information with motion-based position embedding and a cross-spatial encoder. Experiments show that, while existing MLLMs struggle in this environment, Proprio-MLLM achieves an average improvement of 19.75% in planning performance. Our work provides both an essential simulation platform and an effective model to advance embodied intelligence in humanoid robotics. The code is available at https://anonymous.4open.science/r/DualTHOR-5F3B.
Team Xiaomi EV-AD VLA: Learning to Navigate Socially Through Proactive Risk Perception -- Technical Report for IROS 2025 RoboSense Challenge Social Navigation Track
In this report, we describe the technical details of our submission to the IROS 2025 RoboSense Challenge Social Navigation Track. This track focuses on developing RGBD-based perception and navigation systems that enable autonomous agents to navigate safely, efficiently, and socially compliantly in dynamic human-populated indoor environments. The challenge requires agents to operate from an egocentric perspective using only onboard sensors including RGB-D observations and odometry, without access to global maps or privileged information, while maintaining social norm compliance such as safe distances and collision avoidance. Building upon the Falcon model, we introduce a Proactive Risk Perception Module to enhance social navigation performance. Our approach augments Falcon with collision risk understanding that learns to predict distance-based collision risk scores for surrounding humans, which enables the agent to develop more robust spatial awareness and proactive collision avoidance behaviors. The evaluation on the Social-HM3D benchmark demonstrates that our method improves the agent's ability to maintain personal space compliance while navigating toward goals in crowded indoor scenes with dynamic human agents, achieving 2nd place among 16 participating teams in the challenge.
USIM and U0: A Vision-Language-Action Dataset and Model for General Underwater Robots
Underwater environments present unique challenges for robotic operation, including complex hydrodynamics, limited visibility, and constrained communication. Although data-driven approaches have advanced embodied intelligence in terrestrial robots and enabled task-specific autonomous underwater robots, developing underwater intelligence capable of autonomously performing multiple tasks remains highly challenging, as large-scale, high-quality underwater datasets are still scarce. To address these limitations, we introduce USIM, a simulation-based multi-task Vision-Language-Action (VLA) dataset for underwater robots. USIM comprises over 561K frames from 1,852 trajectories, totaling approximately 15.6 hours of BlueROV2 interactions across 20 tasks in 9 diverse scenarios, ranging from visual navigation to mobile manipulation. Building upon this dataset, we propose U0, a VLA model for general underwater robots, which integrates binocular vision and other sensor modalities through multimodal fusion, and further incorporates a convolution-attention-based perception focus enhancement module (CAP) to improve spatial understanding and mobile manipulation. Across tasks such as inspection, obstacle avoidance, scanning, and dynamic tracking, the framework achieves a success rate of 80%, while in challenging mobile manipulation tasks, it reduces the distance to the target by 21.2% compared with baseline methods, demonstrating its effectiveness. USIM and U0 show that VLA models can be effectively applied to underwater robotic applications, providing a foundation for scalable dataset construction, improved task autonomy, and the practical realization of intelligent general underwater robots.
DM1: MeanFlow with Dispersive Regularization for 1-Step Robotic Manipulation
The ability to learn multi-modal action distributions is indispensable for robotic manipulation policies to perform precise and robust control. Flow-based generative models have recently emerged as a promising solution to learning distributions of actions, offering one-step action generation and thus achieving much higher sampling efficiency compared to diffusion-based methods. However, existing flow-based policies suffer from representation collapse, the inability to distinguish similar visual representations, leading to failures in precise manipulation tasks. We propose DM1 (MeanFlow with Dispersive Regularization for One-Step Robotic Manipulation), a novel flow matching framework that integrates dispersive regularization into MeanFlow to prevent collapse while maintaining one-step efficiency. DM1 employs multiple dispersive regularization variants across different intermediate embedding layers, encouraging diverse representations across training batches without introducing additional network modules or specialized training procedures. Experiments on RoboMimic benchmarks show that DM1 achieves 20-40 times faster inference (0.07s vs. 2-3.5s) and improves success rates by 10-20 percentage points, with the Lift task reaching 99% success over 85% of the baseline. Real-robot deployment on a Franka Panda further validates that DM1 transfers effectively from simulation to the physical world. To the best of our knowledge, this is the first work to leverage representation regularization to enable flow-based policies to achieve strong performance in robotic manipulation, establishing a simple yet powerful approach for efficient and robust manipulation.
comment: Website with code: https://guowei-zou.github.io/dm1/
GM3: A General Physical Model for Micro-Mobility Vehicles
Modeling the dynamics of micro-mobility vehicles (MMV) is becoming increasingly important for training autonomous vehicle systems and building urban traffic simulations. However, mainstream tools rely on variants of the Kinematic Bicycle Model (KBM) or mode-specific physics that miss tire slip, load transfer, and rider/vehicle lean. To our knowledge, no unified, physics-based model captures these dynamics across the full range of common MMVs and wheel layouts. We propose the "Generalized Micro-mobility Model" (GM3), a tire-level formulation based on the tire brush representation that supports arbitrary wheel configurations, including single/double track and multi-wheel platforms. We introduce an interactive model-agnostic simulation framework that decouples vehicle/layout specification from dynamics to compare the GM3 with the KBM and other models, consisting of fixed step RK4 integration, human-in-the-loop and scripted control, real-time trajectory traces and logging for analysis. We also empirically validate the GM3 on the Stanford Drone Dataset's deathCircle (roundabout) scene for biker, skater, and cart classes.
IntentionVLA: Generalizable and Efficient Embodied Intention Reasoning for Human-Robot Interaction
Vision-Language-Action (VLA) models leverage pretrained vision-language models (VLMs) to couple perception with robotic control, offering a promising path toward general-purpose embodied intelligence. However, current SOTA VLAs are primarily pretrained on multimodal tasks with limited relevance to embodied scenarios, and then finetuned to map explicit instructions to actions. Consequently, due to the lack of reasoning-intensive pretraining and reasoning-guided manipulation, these models are unable to perform implicit human intention reasoning required for complex, real-world interactions. To overcome these limitations, we propose \textbf{IntentionVLA}, a VLA framework with a curriculum training paradigm and an efficient inference mechanism. Our proposed method first leverages carefully designed reasoning data that combine intention inference, spatial grounding, and compact embodied reasoning, endowing the model with both reasoning and perception capabilities. In the following finetuning stage, IntentionVLA employs the compact reasoning outputs as contextual guidance for action generation, enabling fast inference under indirect instructions. Experimental results show that IntentionVLA substantially outperforms $\pi_0$, achieving 18\% higher success rates with direct instructions and 28\% higher than ECoT under intention instructions. On out-of-distribution intention tasks, IntentionVLA achieves over twice the success rate of all baselines, and further enables zero-shot human-robot interaction with 40\% success rate. These results highlight IntentionVLA as a promising paradigm for next-generation human-robot interaction (HRI) systems.
Trajectory Conditioned Cross-embodiment Skill Transfer
Learning manipulation skills from human demonstration videos presents a promising yet challenging problem, primarily due to the significant embodiment gap between human body and robot manipulators. Existing methods rely on paired datasets or hand-crafted rewards, which limit scalability and generalization. We propose TrajSkill, a framework for Trajectory Conditioned Cross-embodiment Skill Transfer, enabling robots to acquire manipulation skills directly from human demonstration videos. Our key insight is to represent human motions as sparse optical flow trajectories, which serve as embodiment-agnostic motion cues by removing morphological variations while preserving essential dynamics. Conditioned on these trajectories together with visual and textual inputs, TrajSkill jointly synthesizes temporally consistent robot manipulation videos and translates them into executable actions, thereby achieving cross-embodiment skill transfer. Extensive experiments are conducted, and the results on simulation data (MetaWorld) show that TrajSkill reduces FVD by 39.6\% and KVD by 36.6\% compared with the state-of-the-art, and improves cross-embodiment success rate by up to 16.7\%. Real-robot experiments in kitchen manipulation tasks further validate the effectiveness of our approach, demonstrating practical human-to-robot skill transfer across embodiments.
Injecting Hallucinations in Autonomous Vehicles: A Component-Agnostic Safety Evaluation Framework
Perception failures in autonomous vehicles (AV) remain a major safety concern because they are the basis for many accidents. To study how these failures affect safety, researchers typically inject artificial faults into hardware or software components and observe the outcomes. However, existing fault injection studies often target a single sensor or machine perception (MP) module, resulting in siloed frameworks that are difficult to generalize or integrate into unified simulation environments. This work addresses that limitation by reframing perception failures as hallucinations, false perceptions that distort an AV situational awareness and may trigger unsafe control actions. Since hallucinations describe only observable effects, this abstraction enables analysis independent of specific sensors or algorithms, focusing instead on how their faults manifest along the MP pipeline. Building on this concept, we propose a configurable, component-agnostic hallucination injection framework that induces six plausible hallucination types in an iterative open-source simulator. More than 18,350 simulations were executed in which hallucinations were injected while AVs crossed an unsignalized transverse street with traffic. The results statistically validate the framework and quantify the impact of each hallucination type on collisions and near misses. Certain hallucinations, such as perceptual latency and drift, significantly increase the risk of collision in the scenario tested, validating the proposed paradigm can stress the AV system safety. The framework offers a scalable, statistically validated, component agnostic, and fully interoperable toolset that simplifies and accelerates AV safety validations, even those with novel MP architectures and components. It can potentially reduce the time-to-market of AV and lay the foundation for future research on fault tolerance, and resilient AV design.
comment: 22 pages, 15 figures, 21 tables
DEAS: DEtached value learning with Action Sequence for Scalable Offline RL
Offline reinforcement learning (RL) presents an attractive paradigm for training intelligent agents without expensive online interactions. However, current approaches still struggle with complex, long-horizon sequential decision making. In this work, we introduce DEtached value learning with Action Sequence (DEAS), a simple yet effective offline RL framework that leverages action sequences for value learning. These temporally extended actions provide richer information than single-step actions and can be interpreted through the options framework via semi-Markov decision process Q-learning, enabling reduction of the effective planning horizon by considering longer sequences at once. However, directly adopting such sequences in actor-critic algorithms introduces excessive value overestimation, which we address through detached value learning that steers value estimates toward in-distribution actions that achieve high return in the offline dataset. We demonstrate that DEAS consistently outperforms baselines on complex, long-horizon tasks from OGBench and can be applied to enhance the performance of large-scale Vision-Language-Action models that predict action sequences, significantly boosting performance in both RoboCasa Kitchen simulation tasks and real-world manipulation tasks.
comment: Project website: https://changyeon.site/deas
Probabilistically-Safe Bipedal Navigation over Uncertain Terrain via Conformal Prediction and Contraction Analysis
We address the challenge of enabling bipedal robots to traverse rough terrain by developing probabilistically safe planning and control strategies that ensure dynamic feasibility and centroidal robustness under terrain uncertainty. Specifically, we propose a high-level Model Predictive Control (MPC) navigation framework for a bipedal robot with a specified confidence level of safety that (i) enables safe traversal toward a desired goal location across a terrain map with uncertain elevations, and (ii) formally incorporates uncertainty bounds into the centroidal dynamics of locomotion control. To model the rough terrain, we employ Gaussian Process (GP) regression to estimate elevation maps and leverage Conformal Prediction (CP) to construct calibrated confidence intervals that capture the true terrain elevation. Building on this, we formulate contraction-based reachable tubes that explicitly account for terrain uncertainty, ensuring state convergence and tube invariance. In addition, we introduce a contraction-based flywheel torque control law for the reduced-order Linear Inverted Pendulum Model (LIPM), which stabilizes the angular momentum about the center-of-mass (CoM). This formulation provides both probabilistic safety and goal reachability guarantees. For a given confidence level, we establish the forward invariance of the proposed torque control law by demonstrating exponential stabilization of the actual CoM phase-space trajectory and the desired trajectory prescribed by the high-level planner. Finally, we evaluate the effectiveness of our planning framework through physics-based simulations of the Digit bipedal robot in MuJoCo.
comment: 9 pages, 4 figures
EB-MBD: Emerging-Barrier Model-Based Diffusion for Safe Trajectory Optimization in Highly Constrained Environments
We propose enforcing constraints on Model-Based Diffusion by introducing emerging barrier functions inspired by interior point methods. We show that constraints on Model-Based Diffusion can lead to catastrophic performance degradation, even on simple 2D systems due to sample inefficiency in the Monte Carlo approximation of the score function. We introduce Emerging-Barrier Model-Based Diffusion (EB-MBD) which uses progressively introduced barrier constraints to avoid these problems, significantly improving solution quality, without the need for computationally expensive operations such as projections. We analyze the sampling liveliness of samples each iteration to inform barrier parameter scheduling choice. We demonstrate results for 2D collision avoidance and a 3D underwater manipulator system and show that our method achieves lower cost solutions than Model-Based Diffusion, and requires orders of magnitude less computation time than projection based methods.
Differentiable Particle Optimization for Fast Sequential Manipulation
Sequential robot manipulation tasks require finding collision-free trajectories that satisfy geometric constraints across multiple object interactions in potentially high-dimensional configuration spaces. Solving these problems in real-time and at large scales has remained out of reach due to computational requirements. Recently, GPU-based acceleration has shown promising results, but prior methods achieve limited performance due to CPU-GPU data transfer overhead and complex logic that prevents full hardware utilization. To this end, we present SPaSM (Sampling Particle optimization for Sequential Manipulation), a fully GPU-parallelized framework that compiles constraint evaluation, sampling, and gradient-based optimization into optimized CUDA kernels for end-to-end trajectory optimization without CPU coordination. The method consists of a two-stage particle optimization strategy: first solving placement constraints through massively parallel sampling, then lifting solutions to full trajectory optimization in joint space. Unlike hierarchical approaches, SPaSM jointly optimizes object placements and robot trajectories to handle scenarios where motion feasibility constrains placement options. Experimental evaluation on challenging benchmarks demonstrates solution times in the realm of $\textbf{milliseconds}$ with a 100% success rate; a $4000\times$ speedup compared to existing approaches.
comment: 8 pages, 7 figures, 3 tables. Under review
CDE: Concept-Driven Exploration for Reinforcement Learning
Intelligent exploration remains a critical challenge in reinforcement learning (RL), especially in visual control tasks. Unlike low-dimensional state-based RL, visual RL must extract task-relevant structure from raw pixels, making exploration inefficient. We propose Concept-Driven Exploration (CDE), which leverages a pre-trained vision-language model (VLM) to generate object-centric visual concepts from textual task descriptions as weak, potentially noisy supervisory signals. Rather than directly conditioning on these noisy signals, CDE trains a policy to reconstruct the concepts via an auxiliary objective, using reconstruction accuracy as an intrinsic reward to guide exploration toward task-relevant objects. Because the policy internalizes these concepts, VLM queries are only needed during training, reducing dependence on external models during deployment. Across five challenging simulated visual manipulation tasks, CDE achieves efficient, targeted exploration and remains robust to noisy VLM predictions. Finally, we demonstrate real-world transfer by deploying CDE on a Franka Research 3 arm, attaining an 80\% success rate in a real-world manipulation task.
comment: Preprint
Adaptive Science Operations in Deep Space Missions Using Offline Belief State Planning SP
Deep space missions face extreme communication delays and environmental uncertainty that prevent real-time ground operations. To support autonomous science operations in communication-constrained environments, we present a partially observable Markov decision process (POMDP) framework that adaptively sequences spacecraft science instruments. We integrate a Bayesian network into the POMDP observation space to manage the high-dimensional and uncertain measurements typical of astrobiology missions. This network compactly encodes dependencies among measurements and improves the interpretability and computational tractability of science data. Instrument operation policies are computed offline, allowing resource-aware plans to be generated and thoroughly validated prior to launch. We use the Enceladus Orbilander's proposed Life Detection Suite (LDS) as a case study, demonstrating how Bayesian network structure and reward shaping influence system performance. We compare our method against the mission's baseline Concept of Operations (ConOps), evaluating both misclassification rates and performance in off-nominal sample accumulation scenarios. Our approach reduces sample identification errors by nearly 40%
comment: 7 pages, 4 tables, 5 figures, accepted in IEEE ISPARO 2026
Adaptive Motion Planning via Contact-Based Intent Inference for Human-Robot Collaboration
Human-robot collaboration (HRC) requires robots to adapt their motions to human intent to ensure safe and efficient cooperation in shared spaces. Although large language models (LLMs) provide high-level reasoning for inferring human intent, their application to reliable motion planning in HRC remains challenging. Physical human-robot interaction (pHRI) is intuitive but often relies on continuous kinesthetic guidance, which imposes burdens on operators. To address these challenges, a contact-informed adaptive motion-planning framework is introduced to infer human intent directly from physical contact and employ the inferred intent for online motion correction in HRC. First, an optimization-based force estimation method is proposed to infer human-intended contact forces and locations from joint torque measurements and a robot dynamics model, thereby reducing cost and installation complexity while enabling whole-body sensitivity. Then, a torque-based contact detection mechanism with link-level localization is introduced to reduce the optimization search space and to enable real-time estimation. Subsequently, a contact-informed adaptive motion planner is developed to infer human intent from contacts and to replan robot motion online, while maintaining smoothness and adapting to human corrections. Finally, experiments on a 7-DOF manipulator are conducted to demonstrate the accuracy of the proposed force estimation method and the effectiveness of the contact-informed adaptive motion planner under perception uncertainty in HRC.
Humanoid Everyday: A Comprehensive Robotic Dataset for Open-World Humanoid Manipulation
From loco-motion to dextrous manipulation, humanoid robots have made remarkable strides in demonstrating complex full-body capabilities. However, the majority of current robot learning datasets and benchmarks mainly focus on stationary robot arms, and the few existing humanoid datasets are either confined to fixed environments or limited in task diversity, often lacking human-humanoid interaction and lower-body locomotion. Moreover, there are a few standardized evaluation platforms for benchmarking learning-based policies on humanoid data. In this work, we present Humanoid Everyday, a large-scale and diverse humanoid manipulation dataset characterized by extensive task variety involving dextrous object manipulation, human-humanoid interaction, locomotion-integrated actions, and more. Leveraging a highly efficient human-supervised teleoperation pipeline, Humanoid Everyday aggregates high-quality multimodal sensory data, including RGB, depth, LiDAR, and tactile inputs, together with natural language annotations, comprising 10.3k trajectories and over 3 million frames of data across 260 tasks across 7 broad categories. In addition, we conduct an analysis of representative policy learning methods on our dataset, providing insights into their strengths and limitations across different task categories. For standardized evaluation, we introduce a cloud-based evaluation platform that allows researchers to seamlessly deploy their policies in our controlled setting and receive performance feedback. By releasing Humanoid Everyday along with our policy learning analysis and a standardized cloud-based evaluation platform, we intend to advance research in general-purpose humanoid manipulation and lay the groundwork for more capable and embodied robotic agents in real-world scenarios. Our dataset, data collection code, and cloud evaluation website are made publicly available on our project website.
Geometry-aware Policy Imitation
We propose a Geometry-aware Policy Imitation (GPI) approach that rethinks imitation learning by treating demonstrations as geometric curves rather than collections of state-action samples. From these curves, GPI derives distance fields that give rise to two complementary control primitives: a progression flow that advances along expert trajectories and an attraction flow that corrects deviations. Their combination defines a controllable, non-parametric vector field that directly guides robot behavior. This formulation decouples metric learning from policy synthesis, enabling modular adaptation across low-dimensional robot states and high-dimensional perceptual inputs. GPI naturally supports multimodality by preserving distinct demonstrations as separate models and allows efficient composition of new demonstrations through simple additions to the distance field. We evaluate GPI in simulation and on real robots across diverse tasks. Experiments show that GPI achieves higher success rates than diffusion-based policies while running 20 times faster, requiring less memory, and remaining robust to perturbations. These results establish GPI as an efficient, interpretable, and scalable alternative to generative approaches for robotic imitation learning. Project website: https://yimingli1998.github.io/projects/GPI/
comment: 21 pages, 13 figures. In submission
Detecting spills using thermal imaging, pretrained deep learning models, and a robotic platform
This paper presents a real-time spill detection system that utilizes pretrained deep learning models with RGB and thermal imaging to classify spill vs. no-spill scenarios across varied environments. Using a balanced binary dataset (4,000 images), our experiments demonstrate the advantages of thermal imaging in inference speed, accuracy, and model size. We achieve up to 100% accuracy using lightweight models like VGG19 and NasNetMobile, with thermal models performing faster and more robustly across different lighting conditions. Our system runs on consumer-grade hardware (RTX 4080) and achieves inference times as low as 44 ms with model sizes under 350 MB, highlighting its deployability in safety-critical contexts. Results from experiments with a real robot and test datasets indicate that a VGG19 model trained on thermal imaging performs best.
comment: 6 pages
Zero-Shot Policy Transfer in Reinforcement Learning using Buckingham's Pi Theorem
Reinforcement learning (RL) policies often fail to generalize to new robots, tasks, or environments with different physical parameters, a challenge that limits their real-world applicability. This paper presents a simple, zero-shot transfer method based on Buckingham's Pi Theorem to address this limitation. The method adapts a pre-trained policy to new system contexts by scaling its inputs (observations) and outputs (actions) through a dimensionless space, requiring no retraining. The approach is evaluated against a naive transfer baseline across three environments of increasing complexity: a simulated pendulum, a physical pendulum for sim-to-real validation, and the high-dimensional HalfCheetah. Results demonstrate that the scaled transfer exhibits no loss of performance on dynamically similar contexts. Furthermore, on non-similar contexts, the scaled policy consistently outperforms the naive transfer, significantly expanding the volume of contexts where the original policy remains effective. These findings demonstrate that dimensional analysis provides a powerful and practical tool to enhance the robustness and generalization of RL policies.
BEAR: Benchmarking and Enhancing Multimodal Language Models for Atomic Embodied Capabilities
Embodied capabilities refer to a suite of fundamental abilities for an agent to perceive, comprehend, and interact with the physical world. While multimodal large language models (MLLMs) show promise as embodied agents, a thorough and systematic evaluation of their embodied capabilities remains underexplored, as existing benchmarks primarily focus on specific domains such as planning or spatial understanding. To bridge this gap, we introduce BEAR, a comprehensive and fine-grained benchmark that evaluates MLLMs on atomic embodied capabilities. BEAR comprises 4,469 interleaved image-video-text entries across 14 domains in 6 categories, including tasks from low-level pointing, trajectory understanding, spatial reasoning, to high-level planning. Extensive evaluation results of 20 representative MLLMs reveal their persistent limitations across all domains of embodied capabilities. To tackle the shortfall, we propose BEAR-Agent, a multimodal conversable agent that integrates pretrained vision models to strengthen MLLM perception, 3D understanding, and planning capabilities. It substantially enhances MLLM performance across diverse embodied capabilities on BEAR, yielding a 9.12% absolute gain and a relative improvement of 17.5% on GPT-5. Furthermore, our experiments indicate that improving MLLM embodied capabilities can benefit embodied tasks in simulated environments. Project website: https://bear-official66.github.io/
Whole Body Model Predictive Control for Spin-Aware Quadrupedal Table Tennis ICRA 2026
Developing table tennis robots that mirror human speed, accuracy, and ability to predict and respond to the full range of ball spins remains a significant challenge for legged robots. To demonstrate these capabilities we present a system to play dynamic table tennis for quadrupedal robots that integrates high speed perception, trajectory prediction, and agile control. Our system uses external cameras for high-speed ball localization, physical models with learned residuals to infer spin and predict trajectories, and a novel model predictive control (MPC) formulation for agile full-body control. Notably, a continuous set of stroke strategies emerge automatically from different ball return objectives using this control paradigm. We demonstrate our system in the real world on a Spot quadruped, evaluate accuracy of each system component, and exhibit coordination through the system's ability to aim and return balls with varying spin types. As a further demonstration, the system is able to rally with human players.
comment: Submitted to appear in IEEE ICRA 2026
Point and Go: Intuitive Reference Frame Reallocation in Mode Switching for Assistive Robotics
Operating high degree of freedom robots can be difficult for users of wheelchair mounted robotic manipulators. Mode switching in Cartesian space has several drawbacks such as unintuitive control reference frames, separate translation and orientation control, and limited movement capabilities that hinder performance. We propose Point and Go mode switching, which reallocates the Cartesian mode switching reference frames into a more intuitive action space comprised of new translation and rotation modes. We use a novel sweeping motion to point the gripper, which defines the new translation axis along the robot base frame's horizontal plane. This creates an intuitive `point and go' translation mode that allows the user to easily perform complex, human-like movements without switching control modes. The system's rotation mode combines position control with a refined end-effector oriented frame that provides precise and consistent robot actions in various end-effector poses. We verified its effectiveness through initial experiments, followed by a three-task user study that compared our method to Cartesian mode switching and a state of the art learning method. Results show that Point and Go mode switching reduced completion times by 31\%, pauses by 41\%, and mode switches by 33\%, while receiving significantly favorable responses in user surveys.
comment: 7 Pages, 5 figures
Unified World Models: Memory-Augmented Planning and Foresight for Visual Navigation
Enabling embodied agents to effectively imagine future states is critical for robust and generalizable visual navigation. Current state-of-the-art approaches, however, adopt modular architectures that separate navigation planning from visual world modeling, leading to state-action misalignment and limited adaptability in novel or dynamic scenarios. To overcome this fundamental limitation, we propose UniWM, a unified, memory-augmented world model integrating egocentric visual foresight and planning within a single multimodal autoregressive backbone. Unlike modular frameworks, UniWM explicitly grounds action decisions in visually imagined outcomes, ensuring tight alignment between prediction and control. A hierarchical memory mechanism further integrates detailed short-term perceptual cues with longer-term trajectory context, enabling stable, coherent reasoning over extended horizons. Extensive experiments across four challenging benchmarks (Go Stanford, ReCon, SCAND, HuRoN) demonstrate that UniWM substantially improves navigation success rates by up to 30%, significantly reduces trajectory errors compared to strong baselines, and exhibits impressive zero-shot generalization on the unseen TartanDrive dataset. These results highlight UniWM as a principled step toward unified, imagination-driven embodied navigation.
comment: 18 pages, 11 figures, code: https://github.com/F1y1113/UniWM
ConPoSe: LLM-Guided Contact Point Selection for Scalable Cooperative Object Pushing
Object transportation in cluttered environments is a fundamental task in various domains, including domestic service and warehouse logistics. In cooperative object transport, multiple robots must coordinate to move objects that are too large for a single robot. One transport strategy is pushing, which only requires simple robots. However, careful selection of robot-object contact points is necessary to push the object along a preplanned path. Although this selection can be solved analytically, the solution space grows combinatorially with the number of robots and object size, limiting scalability. Inspired by how humans rely on common-sense reasoning for cooperative transport, we propose combining the reasoning capabilities of Large Language Models with local search to select suitable contact points. Our LLM-guided local search method for contact point selection, ConPoSe, successfully selects contact points for a variety of shapes, including cuboids, cylinders, and T-shapes. We demonstrate that ConPoSe scales better with the number of robots and object size than the analytical approach, and also outperforms pure LLM-based selection.
FreeTacMan: Robot-free Visuo-Tactile Data Collection System for Contact-rich Manipulation
Enabling robots with contact-rich manipulation remains a pivotal challenge in robot learning, which is substantially hindered by the data collection gap, including its inefficiency and limited sensor setup. While prior work has explored handheld paradigms, their rod-based mechanical structures remain rigid and unintuitive, providing limited tactile feedback and posing challenges for human operators. Motivated by the dexterity and force feedback of human motion, we propose FreeTacMan, a human-centric and robot-free data collection system for accurate and efficient robot manipulation. Concretely, we design a wearable gripper with dual visuo-tactile sensors for data collection, which can be worn by human fingers for intuitive control. A high-precision optical tracking system is introduced to capture end-effector poses while synchronizing visual and tactile feedback simultaneously. We leverage FreeTacMan to collect a large-scale multimodal dataset, comprising over 3000k paired visual-tactile images with end-effector poses, 10k demonstration trajectories across 50 diverse contact-rich manipulation tasks. FreeTacMan achieves multiple improvements in data collection performance compared to prior works, and enables effective policy learning for contact-rich manipulation tasks with self-collected dataset. The full suite of hardware specifications and the dataset will be released to facilitate reproducibility and support research in visuo-tactile manipulation.
Uncertainty Comes for Free: Human-in-the-Loop Policies with Diffusion Models
Human-in-the-loop (HitL) robot deployment has gained significant attention in both academia and industry as a semi-autonomous paradigm that enables human operators to intervene and adjust robot behaviors at deployment time, improving success rates. However, continuous human monitoring and intervention can be highly labor-intensive and impractical when deploying a large number of robots. To address this limitation, we propose a method that allows diffusion policies to actively seek human assistance only when necessary, reducing reliance on constant human oversight. To achieve this, we leverage the generative process of diffusion policies to compute an uncertainty-based metric based on which the autonomous agent can decide to request operator assistance at deployment time, without requiring any operator interaction during training. Additionally, we show that the same method can be used for efficient data collection for fine-tuning diffusion policies in order to improve their autonomous performance. Experimental results from simulated and real-world environments demonstrate that our approach enhances policy performance during deployment for a variety of scenarios.
ERR@HRI 2.0 Challenge: Multimodal Detection of Errors and Failures in Human-Robot Conversations
The integration of large language models (LLMs) into conversational robots has made human-robot conversations more dynamic. Yet, LLM-powered conversational robots remain prone to errors, e.g., misunderstanding user intent, prematurely interrupting users, or failing to respond altogether. Detecting and addressing these failures is critical for preventing conversational breakdowns, avoiding task disruptions, and sustaining user trust. To tackle this problem, the ERR@HRI 2.0 Challenge provides a multimodal dataset of LLM-powered conversational robot failures during human-robot conversations and encourages researchers to benchmark machine learning models designed to detect robot failures. The dataset includes 16 hours of dyadic human-robot interactions, incorporating facial, speech, and head movement features. Each interaction is annotated with the presence or absence of robot errors from the system perspective, and perceived user intention to correct for a mismatch between robot behavior and user expectation. Participants are invited to form teams and develop machine learning models that detect these failures using multimodal data. Submissions will be evaluated using various performance metrics, including detection accuracy and false positive rate. This challenge represents another key step toward improving failure detection in human-robot interaction through social signal analysis.
iA*: Imperative Learning-based A* Search for Path Planning
Path planning, which aims to find a collision-free path between two locations, is critical for numerous applications ranging from mobile robots to self-driving vehicles. Traditional search-based methods like A* search guarantee path optimality but are often computationally expensive when handling large-scale maps. While learning-based methods alleviate this issue by incorporating learned constraints into their search procedures, they often face challenges like overfitting and reliance on extensive labeled datasets. To address these limitations, we propose Imperative A* (iA*), a novel self-supervised path planning framework leveraging bilevel optimization (BLO) and imperative learning (IL). The iA* framework integrates a neural network that predicts node costs with a differentiable A* search mechanism, enabling efficient self-supervised training via bilevel optimization. This integration significantly enhances the balance between search efficiency and path optimality while improving generalization to previously unseen maps. Extensive experiments demonstrate that iA* outperforms both classical and supervised learning-based methods, achieving an average reduction of 65.7\% in search area and 54.4\% in runtime, underscoring its effectiveness in robot path planning tasks.
FG-PE: Factor-graph Approach for Multi-robot Pursuit-Evasion
With the increasing use of robots in daily life, there is a growing need to provide robust collaboration protocols for robots to tackle more complicated and dynamic problems effectively. This paper presents a novel, factor graph-based approach to address the pursuit-evasion problem, enabling accurate estimation, planning, and tracking of an evader by multiple pursuers working together. It is assumed that there are multiple pursuers and only one evader in this scenario. The proposed method significantly improves the accuracy of evader estimation and tracking, allowing pursuers to capture the evader in the shortest possible time and distance compared to existing techniques. In addition to these primary objectives, the proposed approach effectively minimizes uncertainty while remaining robust, even when communication issues lead to some messages being dropped or lost. Through a series of comprehensive experiments, this paper demonstrates that the proposed algorithm consistently outperforms traditional pursuit-evasion methods across several key performance metrics, such as the time required to capture the evader and the average distance traveled by the pursuers. Additionally, the proposed method is tested in real-world hardware experiments, further validating its effectiveness and applicability.
Ultra-Efficient On-Device Object Detection on AI-Integrated Smart Glasses with TinyissimoYOLO ECCV 2024
Smart glasses are rapidly gaining advanced functions thanks to cutting-edge computing technologies, especially accelerated hardware architectures, and tiny Artificial Intelligence (AI) algorithms. However, integrating AI into smart glasses featuring a small form factor and limited battery capacity remains challenging for a satisfactory user experience. To this end, this paper proposes the design of a smart glasses platform for always-on on-device object detection with an all-day battery lifetime. The proposed platform is based on GAP9, a novel multi-core RISC-V processor from Greenwaves Technologies. Additionally, a family of sub-million parameter TinyissimoYOLO networks are proposed. They are benchmarked on established datasets, capable of differentiating up to 80 classes on MS-COCO. Evaluations on the smart glasses prototype demonstrate TinyissimoYOLO's inference latency of only 17ms and consuming 1.59mJ energy per inference. An end-to-end latency of 56ms is achieved which is equivalent to 18 frames per seconds (FPS) with a total power consumption of 62.9mW. This ensures continuous system runtime of up to 9.3 hours on a 154mAh battery. These results outperform MCUNet (TinyNAS+TinyEngine), which runs a simpler task (image classification) at just 7.3 FPS, while the 18 FPS achieved in this paper even include image-capturing, network inference, and detection post-processing. The algorithm's code is released open with this paper and can be found here: https://github.com/ETH-PBL/TinyissimoYOLO
comment: This paper has been accepted for publication at ECCV 2024 Workshops, Milan, 2024
TIGeR: Tool-Integrated Geometric Reasoning in Vision-Language Models for Robotics
Vision-Language Models (VLMs) have shown remarkable capabilities in spatial reasoning, yet they remain fundamentally limited to qualitative precision and lack the computational precision required for real-world robotics. Current approaches fail to leverage metric cues from depth sensors and camera calibration, instead reducing geometric problems to pattern recognition tasks that cannot deliver the centimeter-level accuracy essential for robotic manipulation. We present TIGeR (Tool-Integrated Geometric Reasoning), a novel framework that transforms VLMs from perceptual estimators to geometric computers by enabling them to generate and execute precise geometric computations through external tools. Rather than attempting to internalize complex geometric operations within neural networks, TIGeR empowers models to recognize geometric reasoning requirements, synthesize appropriate computational code, and invoke specialized libraries for exact calculations. To support this paradigm, we introduce TIGeR-300K, a comprehensive tool-invocation-oriented dataset covering point transformations, pose estimation, and spatial compatibility verification, complete with tool invocation sequences and intermediate computations. Through a two-stage training pipeline combining supervised fine-tuning (SFT) and reinforcement fine-tuning (RFT) with our proposed hierarchical reward design, TIGeR achieves SOTA performance on geometric reasoning benchmarks while demonstrating centimeter-level precision in real-world robotic manipulation tasks.
comment: 9 pages, 6 figures
Product-oriented Product-Process-Resource Asset Network and its Representation in AutomationML for Asset Administration Shell
Current products, especially in the automotive sector, pose complex technical systems having a multi-disciplinary mechatronic nature. Industrial standards supporting system engineering and production typically (i) address the production phase only, but do not cover the complete product life cycle, and (ii) focus on production processes and resources rather than the products themselves. The presented approach is motivated by incorporating the impacts of the end-of-life phase of the product life cycle into the engineering phase. This paper proposes a modeling approach coming up from the Product-Process-Resource (PPR) modeling paradigm. It combines requirements on (i) respecting the product structure as a basis for the model, and (ii) incorporates repairing, remanufacturing, or upcycling within cyber-physical production systems. The proposed model called PoPAN should accompany the product during the entire life cycle as a digital shadow encapsulated within the Asset Administration Shell of a product. To facilitate the adoption of the proposed paradigm, the paper also proposes serialization of the model in the AutomationML data format. The model is demonstrated on a use-case for disassembling electric vehicle batteries to support their remanufacturing for stationary battery applications.
comment: \copyright 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
Hierarchical Reinforcement Learning with Low-Level MPC for Multi-Agent Control
Achieving safe and coordinated behavior in dynamic, constraint-rich environments remains a major challenge for learning-based control. Pure end-to-end learning often suffers from poor sample efficiency and limited reliability, while model-based methods depend on predefined references and struggle to generalize. We propose a hierarchical framework that combines tactical decision-making via reinforcement learning (RL) with low-level execution through Model Predictive Control (MPC). For the case of multi-agent systems this means that high-level policies select abstract targets from structured regions of interest (ROIs), while MPC ensures dynamically feasible and safe motion. Tested on a predator-prey benchmark, our approach outperforms end-to-end and shielding-based RL baselines in terms of reward, safety, and consistency, underscoring the benefits of combining structured learning with model-based control.
OmniNav: A Unified Framework for Prospective Exploration and Visual-Language Navigation
Embodied navigation presents a core challenge for intelligent robots, requiring the comprehension of visual environments, natural language instructions, and autonomous exploration. Existing models often fall short in offering a unified solution across diverse navigation paradigms, resulting in low success rates and limited generalization. We introduce OmniNav, a unified framework addressing instruct-goal, object-goal, point-goal navigation, and frontier-based exploration within a single architecture. Our approach features a lightweight, low-latency policy that accurately predicts continuous-space waypoints (coordinates and orientations). This policy surpasses action-chunk methods in precision and supports real-world deployment at control frequencies up to 5 Hz. Architecturally, OmniNav employs a fast-slow system design: a fast module generates waypoints using short-horizon visual context and subtasks, while a slow module performs deliberative planning with long-horizon observations and candidate frontiers to select subsequent subgoals and subtasks. This collaboration enhances path efficiency and maintains trajectory coherence, particularly in exploration and memory-intensive scenarios. Crucially, we identify that the primary bottleneck isn't merely navigation policy learning, but a robust understanding of general instructions and objects. To boost generalization, OmniNav integrates large-scale, general-purpose training datasets, including those for image captioning and visual recognition, into a joint multi-task regimen. This significantly improves success rates and robustness. Extensive experiments confirm OmniNav's state-of-the-art performance across various navigation benchmarks, with real-world deployment further validating its efficacy. OmniNav provides practical insights for embodied navigation, charting a scalable path towards versatile, highly generalizable robotic intelligence.
Beyond Behavior Cloning: Robustness through Interactive Imitation and Contrastive Learning
Behavior cloning (BC) traditionally relies on demonstration data, assuming the demonstrated actions are optimal. This can lead to overfitting under noisy data, particularly when expressive models are used (e.g., the energy-based model in Implicit BC). To address this, we extend behavior cloning into an iterative process of optimal action estimation within the Interactive Imitation Learning framework. Specifically, we introduce Contrastive policy Learning from Interactive Corrections (CLIC). CLIC leverages human corrections to estimate a set of desired actions and optimizes the policy to select actions from this set. Extensive simulation and real-robot experiments validate CLIC's advantages over existing state-of-the-art methods, including stable training of energy-based models, robustness to feedback noise, and adaptability to diverse feedback types beyond demonstrations. Our implementation is publicly available at https://github.com/clic-webpage/CLIC.
ManipGPT: Is Affordance Segmentation by Large Vision Models Enough for Articulated Object Manipulation?
Visual actionable affordance has emerged as a transformative approach in robotics, focusing on perceiving interaction areas prior to manipulation. Traditional methods rely on pixel sampling to identify successful interaction samples or processing pointclouds for affordance mapping. However, these approaches are computationally intensive and struggle to adapt to diverse and dynamic environments. This paper introduces ManipGPT, a framework designed to predict optimal interaction areas for articulated objects using a large pre-trained vision transformer (ViT). We create a dataset of 9.9k simulated and real images to bridge the visual sim-to-real gap and enhance real-world applicability. By fine-tuning the vision transformer on this small dataset, we significantly improve part-level affordance segmentation, adapting the model's in-context segmentation capabilities to robot manipulation scenarios. This enables effective manipulation across simulated and real-world environments by generating part-level affordance masks, paired with an impedance adaptation policy, sufficiently eliminating the need for complex datasets or perception systems.
comment: 8 pages, 6 figures
A Long-Duration Autonomy Approach to Connected and Automated Vehicles
In this article, we present a long-duration autonomy approach for the control of connected and automated vehicles (CAVs) operating in a transportation network. In particular, we focus on the performance of CAVs at traffic bottlenecks, including roundabouts, merging roadways, and intersections. We take a principled approach based on optimal control, and derive a reactive controller with guarantees on safety, performance, and energy efficiency. We guarantee safety through high order control barrier functions (HOCBFs), which we ``lift'' to first order CBFs using time-optimal motion primitives. This yields a set of first-order CBFs that are compatible with the control bounds. We demonstrate the performance of our approach in simulation and compare it to an optimal control-based approach.
comment: 8 pages, 3 figures
Fast Online Adaptive Neural MPC via Meta-Learning
Data-driven model predictive control (MPC) has demonstrated significant potential for improving robot control performance in the presence of model uncertainties. However, existing approaches often require extensive offline data collection and computationally intensive training, limiting their ability to adapt online. To address these challenges, this paper presents a fast online adaptive MPC framework that leverages neural networks integrated with Model-Agnostic Meta-Learning (MAML). Our approach focuses on few-shot adaptation of residual dynamics - capturing the discrepancy between nominal and true system behavior - using minimal online data and gradient steps. By embedding these meta-learned residual models into a computationally efficient L4CasADi-based MPC pipeline, the proposed method enables rapid model correction, enhances predictive accuracy, and improves real-time control performance. We validate the framework through simulation studies on a Van der Pol oscillator, a Cart-Pole system, and a 2D quadrotor. Results show significant gains in adaptation speed and prediction accuracy over both nominal MPC and nominal MPC augmented with a freshly initialized neural network, underscoring the effectiveness of our approach for real-time adaptive robot control.
Characterizing and Optimizing Real-Time Optimal Control for Embedded SoCs
Resource-limited robots face significant challenges in executing computationally intensive tasks, such as locomotion and manipulation, particularly for real-time optimal control algorithms like Model Predictive Control (MPC). This paper provides a comprehensive design space exploration to identify optimal hardware computation architectures for these demanding model-based control algorithms. We profile and optimize representative architectural designs, including general-purpose scalar CPUs, vector processors, and specialized accelerators. By characterizing kernel-level benchmarks and end-to-end robotic scenarios, including a hardware-in-the-loop evaluation on a fabricated RISC-V multi-core vector SoC, we present a quantitative comparison of performance, area, and utilization across distinct architectural design points. Our findings demonstrate that targeted architectural modifications, coupled with deep software and system optimizations, enable up to 3.71x speedups for MPC, resulting in up to 27% system-level power reductions while completing robotic tasks. Finally, we propose a code generation flow designed to simplify the complex engineering effort required for mapping robotic workloads onto specialized architectures.
Hybrid Feedback Control for Global Navigation with Locally Optimal Obstacle Avoidance in n-Dimensional Spaces
We present a hybrid feedback control framework for autonomous robot navigation in n-dimensional Euclidean spaces cluttered with spherical obstacles. The proposed approach ensures safe and global navigation towards a target location by dynamically switching between two operational modes: motion-to-destination and locally optimal obstacle-avoidance. It produces continuous velocity inputs, ensures collision-free trajectories and generates locally optimal obstacle avoidance maneuvers. Unlike existing methods, the proposed framework is compatible with range sensors, enabling navigation in both a priori known and unknown environments. Extensive simulations in 2D and 3D settings, complemented by experimental validation on a TurtleBot 4 platform, confirm the efficacy and robustness of the approach. Our results demonstrate shorter paths and smoother trajectories compared to state-of-the-art methods, while maintaining computational efficiency and real-world feasibility.
Safe Autonomous Environmental Contact for Soft Robots using Control Barrier Functions
Robots built from soft materials will inherently apply lower environmental forces than their rigid counterparts, and therefore may be more suitable in sensitive settings with unintended contact. However, these robots' applied forces result from both their design and their control system in closed-loop, and therefore, ensuring bounds on these forces requires controller synthesis for safety as well. This article introduces the first feedback controller for a soft manipulator that formally meets a safety specification with respect to environmental contact. In our proof-of-concept setting, the robot's environment has known geometry and is deformable with a known elastic modulus. Our approach maps a bound on applied forces to a safe set of positions of the robot's tip via predicted deformations of the environment. Then, a quadratic program with Control Barrier Functions in its constraints is used to supervise a nominal feedback signal, verifiably maintaining the robot's tip within this safe set. Hardware experiments on a multi-segment soft pneumatic robot demonstrate that the proposed framework successfully maintains a positive safety margin. This framework represents a fundamental shift in perspective on control and safety for soft robots, implementing a formally verifiable logic specification on their pose and contact forces.
comment: 8 pages, 9 figures
Through the Perspective of LiDAR: A Feature-Enriched and Uncertainty-Aware Annotation Pipeline for Terrestrial Point Cloud Segmentation
Accurate semantic segmentation of terrestrial laser scanning (TLS) point clouds is limited by costly manual annotation. We propose a semi-automated, uncertainty-aware pipeline that integrates spherical projection, feature enrichment, ensemble learning, and targeted annotation to reduce labeling effort, while sustaining high accuracy. Our approach projects 3D points to a 2D spherical grid, enriches pixels with multi-source features, and trains an ensemble of segmentation networks to produce pseudo-labels and uncertainty maps, the latter guiding annotation of ambiguous regions. The 2D outputs are back-projected to 3D, yielding densely annotated point clouds supported by a three-tier visualization suite (2D feature maps, 3D colorized point clouds, and compact virtual spheres) for rapid triage and reviewer guidance. Using this pipeline, we build Mangrove3D, a semantic segmentation TLS dataset for mangrove forests. We further evaluate data efficiency and feature importance to address two key questions: (1) how much annotated data are needed and (2) which features matter most. Results show that performance saturates after ~12 annotated scans, geometric features contribute the most, and compact nine-channel stacks capture nearly all discriminative power, with the mean Intersection over Union (mIoU) plateauing at around 0.76. Finally, we confirm the generalization of our feature-enrichment strategy through cross-dataset tests on ForestSemantic and Semantic3D. Our contributions include: (i) a robust, uncertainty-aware TLS annotation pipeline with visualization tools; (ii) the Mangrove3D dataset; and (iii) empirical guidance on data efficiency and feature importance, thus enabling scalable, high-quality segmentation of TLS point clouds for ecological monitoring and beyond. The dataset and processing scripts are publicly available at https://fz-rit.github.io/through-the-lidars-eye/.
comment: 40 pages (28 main text), 20 figures, 4 supplementary materials; links to 3D point animations are included in the last table
Real-time Human Finger Pointing Recognition and Estimation for Robot Directives Using a Single Web-Camera
Gestures play a pivotal role in human communication, often serving as a preferred or complementary medium to verbal expression due to their superior spatial reference capabilities. A finger-pointing gesture conveys vital information regarding some point of interest in the environment. In Human-Robot Interaction (HRI), users can easily direct robots to target locations, facilitating tasks in diverse domains such as search and rescue or factory assistance. State-of-the-art approaches for visual pointing estimation often rely on depth cameras, are limited to indoor environments, and provide discrete predictions between limited targets. In this paper, we explore the development of models that enable robots to understand pointing directives from humans using a single web camera, even in diverse indoor and outdoor environments. A novel perception framework is proposed which includes a designated data-based model termed PointingNet. PointingNet recognizes the occurrence of pointing through classification followed by approximating the position and direction of the index finger with an advanced regression model. The model relies on a novel segmentation model for masking any lifted arm. While state-of-the-art human pose estimation models provide poor pointing angle estimation error of 28deg, PointingNet exhibits a mean error of less than 2deg. With the pointing information, the target location is computed, followed by robot motion planning and execution. The framework is evaluated on two robotic systems, demonstrating accurate target reaching.
IG-MCTS: Human-in-the-Loop Cooperative Navigation under Incomplete Information
Human-robot cooperative navigation is challenging under incomplete information. We introduce CoNav-Maze, a simulated environment where a robot navigates with local perception while a human operator provides guidance based on an inaccurate map. The robot can share its onboard camera views to help the operator refine their understanding of the environment. To enable efficient cooperation, we propose Information Gain Monte Carlo Tree Search (IG-MCTS), an online planning algorithm that jointly optimizes autonomous movement and informative communication. IG-MCTS leverages a learned Neural Human Perception Model (NHPM) -- trained on a crowdsourced mapping dataset -- to predict how the human's internal map evolves as new observations are shared. User studies show that IG-MCTS significantly reduces communication demands and yields eye-tracking metrics indicative of lower cognitive load, while maintaining task performance comparable to teleoperation and instruction-following baselines. Finally, we illustrate generalization beyond discrete mazes through a continuous-space waterway navigation setting, in which NHPM benefits from deeper encoder-decoder architectures and IG-MCTS leverages a dynamically constructed Voronoi-partitioned traversability graph.
HA-VLN 2.0: An Open Benchmark and Leaderboard for Human-Aware Navigation in Discrete and Continuous Environments with Dynamic Multi-Human Interactions
Vision-and-Language Navigation (VLN) has been studied mainly in either discrete or continuous settings, with little attention to dynamic, crowded environments. We present HA-VLN 2.0, a unified benchmark introducing explicit social-awareness constraints. Our contributions are: (i) a standardized task and metrics capturing both goal accuracy and personal-space adherence; (ii) HAPS 2.0 dataset and simulators modeling multi-human interactions, outdoor contexts, and finer language-motion alignment; (iii) benchmarks on 16,844 socially grounded instructions, revealing sharp performance drops of leading agents under human dynamics and partial observability; and (iv) real-world robot experiments validating sim-to-real transfer, with an open leaderboard enabling transparent comparison. Results show that explicit social modeling improves navigation robustness and reduces collisions, underscoring the necessity of human-centric approaches. By releasing datasets, simulators, baselines, and protocols, HA-VLN 2.0 provides a strong foundation for safe, socially responsible navigation research.
comment: 33 pages, 20 figures, website: https://ha-vln-project.vercel.app/
Artists' Views on Robotics Involvement in Painting Productions
As robotic technologies evolve, their potential in artistic creation becomes an increasingly relevant topic of inquiry. This study explores how professional abstract artists perceive and experience co-creative interactions with an autonomous painting robotic arm. Eight artists engaged in six painting sessions -- three with a human partner, followed by three with the robot -- and subsequently participated in semi-structured interviews analyzed through reflexive thematic analysis. Human-human interactions were described as intuitive, dialogic, and emotionally engaging, whereas human-robot sessions felt more playful and reflective, offering greater autonomy and prompting for novel strategies to overcome the system's limitations. This work offers one of the first empirical investigations into artists' lived experiences with a robot, highlighting the value of long-term engagement and a multidisciplinary approach to human-robot co-creation.
comment: 10 pages, 9 figures, submitted to RAM special issue: Arts and Robotics
Robotics
WristWorld: Generating Wrist-Views via 4D World Models for Robotic Manipulation
Wrist-view observations are crucial for VLA models as they capture fine-grained hand-object interactions that directly enhance manipulation performance. Yet large-scale datasets rarely include such recordings, resulting in a substantial gap between abundant anchor views and scarce wrist views. Existing world models cannot bridge this gap, as they require a wrist-view first frame and thus fail to generate wrist-view videos from anchor views alone. Amid this gap, recent visual geometry models such as VGGT emerge with geometric and cross-view priors that make it possible to address extreme viewpoint shifts. Inspired by these insights, we propose WristWorld, the first 4D world model that generates wrist-view videos solely from anchor views. WristWorld operates in two stages: (i) Reconstruction, which extends VGGT and incorporates our Spatial Projection Consistency (SPC) Loss to estimate geometrically consistent wrist-view poses and 4D point clouds; (ii) Generation, which employs our video generation model to synthesize temporally coherent wrist-view videos from the reconstructed perspective. Experiments on Droid, Calvin, and Franka Panda demonstrate state-of-the-art video generation with superior spatial consistency, while also improving VLA performance, raising the average task completion length on Calvin by 3.81% and closing 42.4% of the anchor-wrist view gap.
HyPlan: Hybrid Learning-Assisted Planning Under Uncertainty for Safe Autonomous Driving
We present a novel hybrid learning-assisted planning method, named HyPlan, for solving the collision-free navigation problem for self-driving cars in partially observable traffic environments. HyPlan combines methods for multi-agent behavior prediction, deep reinforcement learning with proximal policy optimization and approximated online POMDP planning with heuristic confidence-based vertical pruning to reduce its execution time without compromising safety of driving. Our experimental performance analysis on the CARLA-CTS2 benchmark of critical traffic scenarios with pedestrians revealed that HyPlan may navigate safer than selected relevant baselines and perform significantly faster than considered alternative online POMDP planners.
COMPAct: Computational Optimization and Automated Modular design of Planetary Actuators
The optimal design of robotic actuators is a critical area of research, yet limited attention has been given to optimizing gearbox parameters and automating actuator CAD. This paper introduces COMPAct: Computational Optimization and Automated Modular Design of Planetary Actuators, a framework that systematically identifies optimal gearbox parameters for a given motor across four gearbox types, single-stage planetary gearbox (SSPG), compound planetary gearbox (CPG), Wolfrom planetary gearbox (WPG), and double-stage planetary gearbox (DSPG). The framework minimizes mass and actuator width while maximizing efficiency, and further automates actuator CAD generation to enable direct 3D printing without manual redesign. Using this framework, optimal gearbox designs are explored over a wide range of gear ratios, providing insights into the suitability of different gearbox types across various gear ratio ranges. In addition, the framework is used to generate CAD models of all four gearbox types with varying gear ratios and motors. Two actuator types are fabricated and experimentally evaluated through power efficiency, no-load backlash, and transmission stiffness tests. Experimental results indicate that the SSPG actuator achieves a mechanical efficiency of 60-80 %, a no-load backlash of 0.59 deg, and a transmission stiffness of 242.7 Nm/rad, while the CPG actuator demonstrates 60 % efficiency, 2.6 deg backlash, and a stiffness of 201.6 Nm/rad. Code available at: https://anonymous.4open.science/r/COMPAct-SubNum-3408 Video: https://youtu.be/99zOKgxsDho
comment: 8 pages, 9 Figures, 2 tables, first two authors contributed equally
TIGeR: Tool-Integrated Geometric Reasoning in Vision-Language Models for Robotics
Vision-Language Models (VLMs) have shown remarkable capabilities in spatial reasoning, yet they remain fundamentally limited to qualitative precision and lack the computational precision required for real-world robotics. Current approaches fail to leverage metric cues from depth sensors and camera calibration, instead reducing geometric problems to pattern recognition tasks that cannot deliver the centimeter-level accuracy essential for robotic manipulation. We present TIGeR (Tool-Integrated Geometric Reasoning), a novel framework that transforms VLMs from perceptual estimators to geometric computers by enabling them to generate and execute precise geometric computations through external tools. Rather than attempting to internalize complex geometric operations within neural networks, TIGeR empowers models to recognize geometric reasoning requirements, synthesize appropriate computational code, and invoke specialized libraries for exact calculations. To support this paradigm, we introduce TIGeR-300K, a comprehensive tool-invocation-oriented dataset covering point transformations, pose estimation, trajectory generation, and spatial compatibility verification, complete with tool invocation sequences and intermediate computations. Through a two-stage training pipeline combining supervised fine-tuning (SFT) and reinforcement fine-tuning (RFT) with our proposed hierarchical reward design, TIGeR achieves SOTA performance on geometric reasoning benchmarks while demonstrating centimeter-level precision in real-world robotic manipulation tasks.
comment: 9 pages, 6 figures
A Narwhal-Inspired Sensing-to-Control Framework for Small Fixed-Wing Aircraft
Fixed-wing unmanned aerial vehicles (UAVs) offer endurance and efficiency but lack low-speed agility due to highly coupled dynamics. We present an end-to-end sensing-to-control pipeline that combines bio-inspired hardware, physics-informed dynamics learning, and convex control allocation. Measuring airflow on a small airframe is difficult because near-body aerodynamics, propeller slipstream, control-surface actuation, and ambient gusts distort pressure signals. Inspired by the narwhal's protruding tusk, we mount in-house multi-hole probes far upstream and complement them with sparse, carefully placed wing pressure sensors for local flow measurement. A data-driven calibration maps probe pressures to airspeed and flow angles. We then learn a control-affine dynamics model using the estimated airspeed/angles and sparse sensors. A soft left/right symmetry regularizer improves identifiability under partial observability and limits confounding between wing pressures and flaperon inputs. Desired wrenches (forces and moments) are realized by a regularized least-squares allocator that yields smooth, trimmed actuation. Wind-tunnel studies across a wide operating range show that adding wing pressures reduces force-estimation error by 25-30%, the proposed model degrades less under distribution shift (about 12% versus 44% for an unstructured baseline), and force tracking improves with smoother inputs, including a 27% reduction in normal-force RMSE versus a plain affine model and 34% versus an unstructured baseline.
DPL: Depth-only Perceptive Humanoid Locomotion via Realistic Depth Synthesis and Cross-Attention Terrain Reconstruction
Recent advancements in legged robot perceptive locomotion have shown promising progress. However, terrain-aware humanoid locomotion remains largely constrained to two paradigms: depth image-based end-to-end learning and elevation map-based methods. The former suffers from limited training efficiency and a significant sim-to-real gap in depth perception, while the latter depends heavily on multiple vision sensors and localization systems, resulting in latency and reduced robustness. To overcome these challenges, we propose a novel framework that tightly integrates three key components: (1) Terrain-Aware Locomotion Policy with a Blind Backbone, which leverages pre-trained elevation map-based perception to guide reinforcement learning with minimal visual input; (2) Multi-Modality Cross-Attention Transformer, which reconstructs structured terrain representations from noisy depth images; (3) Realistic Depth Images Synthetic Method, which employs self-occlusion-aware ray casting and noise-aware modeling to synthesize realistic depth observations, achieving over 30\% reduction in terrain reconstruction error. This combination enables efficient policy training with limited data and hardware resources, while preserving critical terrain features essential for generalization. We validate our framework on a full-sized humanoid robot, demonstrating agile and adaptive locomotion across diverse and challenging terrains.
ELMUR: External Layer Memory with Update/Rewrite for Long-Horizon RL
Real-world robotic agents must act under partial observability and long horizons, where key cues may appear long before they affect decision making. However, most modern approaches rely solely on instantaneous information, without incorporating insights from the past. Standard recurrent or transformer models struggle with retaining and leveraging long-term dependencies: context windows truncate history, while naive memory extensions fail under scale and sparsity. We propose ELMUR (External Layer Memory with Update/Rewrite), a transformer architecture with structured external memory. Each layer maintains memory embeddings, interacts with them via bidirectional cross-attention, and updates them through an Least Recently Used (LRU) memory module using replacement or convex blending. ELMUR extends effective horizons up to 100,000 times beyond the attention window and achieves a 100% success rate on a synthetic T-Maze task with corridors up to one million steps. In POPGym, it outperforms baselines on more than half of the tasks. On MIKASA-Robo sparse-reward manipulation tasks with visual observations, it nearly doubles the performance of strong baselines. These results demonstrate that structured, layer-local external memory offers a simple and scalable approach to decision making under partial observability.
comment: 22 pages, 7 figures
TrackVLA++: Unleashing Reasoning and Memory Capabilities in VLA Models for Embodied Visual Tracking
Embodied Visual Tracking (EVT) is a fundamental ability that underpins practical applications, such as companion robots, guidance robots and service assistants, where continuously following moving targets is essential. Recent advances have enabled language-guided tracking in complex and unstructured scenes. However, existing approaches lack explicit spatial reasoning and effective temporal memory, causing failures under severe occlusions or in the presence of similar-looking distractors. To address these challenges, we present TrackVLA++, a novel Vision-Language-Action (VLA) model that enhances embodied visual tracking with two key modules, a spatial reasoning mechanism and a Target Identification Memory (TIM). The reasoning module introduces a Chain-of-Thought paradigm, termed Polar-CoT, which infers the target's relative position and encodes it as a compact polar-coordinate token for action prediction. Guided by these spatial priors, the TIM employs a gated update strategy to preserve long-horizon target memory, ensuring spatiotemporal consistency and mitigating target loss during extended occlusions. Extensive experiments show that TrackVLA++ achieves state-of-the-art performance on public benchmarks across both egocentric and multi-camera settings. On the challenging EVT-Bench DT split, TrackVLA++ surpasses the previous leading approach by 5.1 and 12, respectively. Furthermore, TrackVLA++ exhibits strong zero-shot generalization, enabling robust real-world tracking in dynamic and occluded scenarios.
comment: Project page: https://pku-epic.github.io/TrackVLA-plus-plus-Web/
A Digital Twin Framework for Metamorphic Testing of Autonomous Driving Systems Using Generative Model
Ensuring the safety of self-driving cars remains a major challenge due to the complexity and unpredictability of real-world driving environments. Traditional testing methods face significant limitations, such as the oracle problem, which makes it difficult to determine whether a system's behavior is correct, and the inability to cover the full range of scenarios an autonomous vehicle may encounter. In this paper, we introduce a digital twin-driven metamorphic testing framework that addresses these challenges by creating a virtual replica of the self-driving system and its operating environment. By combining digital twin technology with AI-based image generative models such as Stable Diffusion, our approach enables the systematic generation of realistic and diverse driving scenes. This includes variations in weather, road topology, and environmental features, all while maintaining the core semantics of the original scenario. The digital twin provides a synchronized simulation environment where changes can be tested in a controlled and repeatable manner. Within this environment, we define three metamorphic relations inspired by real-world traffic rules and vehicle behavior. We validate our framework in the Udacity self-driving simulator and demonstrate that it significantly enhances test coverage and effectiveness. Our method achieves the highest true positive rate (0.719), F1 score (0.689), and precision (0.662) compared to baseline approaches. This paper highlights the value of integrating digital twins with AI-powered scenario generation to create a scalable, automated, and high-fidelity testing solution for autonomous vehicle safety.
Sampling Strategies for Robust Universal Quadrupedal Locomotion Policies
This work focuses on sampling strategies of configuration variations for generating robust universal locomotion policies for quadrupedal robots. We investigate the effects of sampling physical robot parameters and joint proportional-derivative gains to enable training a single reinforcement learning policy that generalizes to multiple parameter configurations. Three fundamental joint gain sampling strategies are compared: parameter sampling with (1) linear and polynomial function mappings of mass-to-gains, (2) performance-based adaptive filtering, and (3) uniform random sampling. We improve the robustness of the policy by biasing the configurations using nominal priors and reference models. All training was conducted on RaiSim, tested in simulation on a range of diverse quadrupeds, and zero-shot deployed onto hardware using the ANYmal quadruped robot. Compared to multiple baseline implementations, our results demonstrate the need for significant joint controller gains randomization for robust closing of the sim-to-real gap.
Generative World Modelling for Humanoids: 1X World Model Challenge Technical Report
World models are a powerful paradigm in AI and robotics, enabling agents to reason about the future by predicting visual observations or compact latent states. The 1X World Model Challenge introduces an open-source benchmark of real-world humanoid interaction, with two complementary tracks: sampling, focused on forecasting future image frames, and compression, focused on predicting future discrete latent codes. For the sampling track, we adapt the video generation foundation model Wan-2.2 TI2V-5B to video-state-conditioned future frame prediction. We condition the video generation on robot states using AdaLN-Zero, and further post-train the model using LoRA. For the compression track, we train a Spatio-Temporal Transformer model from scratch. Our models achieve 23.0 dB PSNR in the sampling task and a Top-500 CE of 6.6386 in the compression task, securing 1st place in both challenges.
comment: 6 pages, 3 figures, 1X world model challenge technical report
Vision-Language-Action Models for Robotics: A Review Towards Real-World Applications
Amid growing efforts to leverage advances in large language models (LLMs) and vision-language models (VLMs) for robotics, Vision-Language-Action (VLA) models have recently gained significant attention. By unifying vision, language, and action data at scale, which have traditionally been studied separately, VLA models aim to learn policies that generalise across diverse tasks, objects, embodiments, and environments. This generalisation capability is expected to enable robots to solve novel downstream tasks with minimal or no additional task-specific data, facilitating more flexible and scalable real-world deployment. Unlike previous surveys that focus narrowly on action representations or high-level model architectures, this work offers a comprehensive, full-stack review, integrating both software and hardware components of VLA systems. In particular, this paper provides a systematic review of VLAs, covering their strategy and architectural transition, architectures and building blocks, modality-specific processing techniques, and learning paradigms. In addition, to support the deployment of VLAs in real-world robotic applications, we also review commonly used robot platforms, data collection strategies, publicly available datasets, data augmentation methods, and evaluation benchmarks. Throughout this comprehensive survey, this paper aims to offer practical guidance for the robotics community in applying VLAs to real-world robotic systems. All references categorized by training approach, evaluation method, modality, and dataset are available in the table on our project website: https://vla-survey.github.io .
comment: Accepted to IEEE Access, website: https://vla-survey.github.io
Bring the Apple, Not the Sofa: Impact of Irrelevant Context in Embodied AI Commands on VLA Models
Vision Language Action (VLA) models are widely used in Embodied AI, enabling robots to interpret and execute language instructions. However, their robustness to natural language variability in real-world scenarios has not been thoroughly investigated. In this work, we present a novel systematic study of the robustness of state-of-the-art VLA models under linguistic perturbations. Specifically, we evaluate model performance under two types of instruction noise: (1) human-generated paraphrasing and (2) the addition of irrelevant context. We further categorize irrelevant contexts into two groups according to their length and their semantic and lexical proximity to robot commands. In this study, we observe consistent performance degradation as context size expands. We also demonstrate that the model can exhibit relative robustness to random context, with a performance drop within 10%, while semantically and lexically similar context of the same length can trigger a quality decline of around 50%. Human paraphrases of instructions lead to a drop of nearly 20%. To mitigate this, we propose an LLM-based filtering framework that extracts core commands from noisy inputs. Incorporating our filtering step allows models to recover up to 98.5% of their original performance under noisy conditions.
Artists' Views on Robotics Involvement in Painting Productions
As robotic technologies evolve, their potential in artistic creation becomes an increasingly relevant topic of inquiry. This study explores how professional abstract artists perceive and experience co-creative interactions with an autonomous painting robotic arm. Eight artists engaged in six painting sessions -- three with a human partner, followed by three with the robot -- and subsequently participated in semi-structured interviews analyzed through reflexive thematic analysis. Human-human interactions were described as intuitive, dialogic, and emotionally engaging, whereas human-robot sessions felt more playful and reflective, offering greater autonomy and prompting for novel strategies to overcome the system's limitations. This work offers one of the first empirical investigations into artists' lived experiences with a robot, highlighting the value of long-term engagement and a multidisciplinary approach to human-robot co-creation.
comment: 10 pages, 9 figures, submitted to RAM special issue: Arts and Robotics
Introspection in Learned Semantic Scene Graph Localisation IROS 2025
This work investigates how semantics influence localisation performance and robustness in a learned self-supervised, contrastive semantic localisation framework. After training a localisation network on both original and perturbed maps, we conduct a thorough post-hoc introspection analysis to probe whether the model filters environmental noise and prioritises distinctive landmarks over routine clutter. We validate various interpretability methods and present a comparative reliability analysis. Integrated gradients and Attention Weights consistently emerge as the most reliable probes of learned behaviour. A semantic class ablation further reveals an implicit weighting in which frequent objects are often down-weighted. Overall, the results indicate that the model learns noise-robust, semantically salient relations about place definition, thereby enabling explainable registration under challenging visual and structural variations.
comment: IEEE IROS 2025 Workshop FAST
Diffusing Trajectory Optimization Problems for Recovery During Multi-Finger Manipulation
Multi-fingered hands are emerging as powerful platforms for performing fine manipulation tasks, including tool use. However, environmental perturbations or execution errors can impede task performance, motivating the use of recovery behaviors that enable normal task execution to resume. In this work, we take advantage of recent advances in diffusion models to construct a framework that autonomously identifies when recovery is necessary and optimizes contact-rich trajectories to recover. We use a diffusion model trained on the task to estimate when states are not conducive to task execution, framed as an out-of-distribution detection problem. We then use diffusion sampling to project these states in-distribution and use trajectory optimization to plan contact-rich recovery trajectories. We also propose a novel diffusion-based approach that distills this process to efficiently diffuse the full parameterization, including constraints, goal state, and initialization, of the recovery trajectory optimization problem, saving time during online execution. We compare our method to a reinforcement learning baseline and other methods that do not explicitly plan contact interactions, including on a hardware screwdriver-turning task where we show that recovering using our method improves task performance by 96% and that ours is the only method evaluated that can attempt recovery without causing catastrophic task failure. Videos can be found at https://dtourrecovery.github.io/.
Temporal-Prior-Guided View Planning for Periodic 3D Plant Reconstruction IROS 2025
Periodic 3D reconstruction is essential for crop monitoring, but costly when each cycle restarts from scratch, wasting resources and ignoring information from previous captures. We propose temporal-prior-guided view planning for periodic plant reconstruction, in which a previously reconstructed model of the same plant is non-rigidly aligned to a new partial observation to form an approximation of the current geometry. To accommodate plant growth, we inflate this approximation and solve a set covering optimization problem to compute a minimal set of views. We integrated this method into a complete pipeline that acquires one additional next-best view before registration for robustness and then plans a globally shortest path to connect the planned set of views and outputs the best view sequence. Experiments on maize and tomato under hemisphere and sphere view spaces show that our system maintains or improves surface coverage while requiring fewer views and comparable movement cost compared to state-of-the-art baselines.
comment: Accepted to the Active Perception Workshop at IROS 2025
Tailoring materials into kirigami robots
Kirigami, the traditional paper-cutting craft, holds immense potential for revolutionizing robotics by providing multifunctional, lightweight, and adaptable solutions. Kirigami structures, characterized by their bending-dominated deformation, offer resilience to tensile forces and facilitate shape morphing under small actuation forces. Kirigami components such as actuators, sensors, batteries, controllers, and body structures can be tailored to specific robotic applications by optimizing cut patterns. Actuators based on kirigami principles exhibit complex motions programmable through various energy sources, while kirigami sensors bridge the gap between electrical conductivity and compliance. Kirigami-integrated batteries enable energy storage directly within robot structures, enhancing flexibility and compactness. Kirigami-controlled mechanisms mimic mechanical computations, enabling advanced functionalities such as shape morphing and memory functions. Applications of kirigami-enabled robots include grasping, locomotion, and wearables, showcasing their adaptability to diverse environments and tasks. Despite promising opportunities, challenges remain in the design of cut patterns for a given function and streamlining fabrication techniques.
DecompGAIL: Learning Realistic Traffic Behaviors with Decomposed Multi-Agent Generative Adversarial Imitation Learning
Realistic traffic simulation is critical for the development of autonomous driving systems and urban mobility planning, yet existing imitation learning approaches often fail to model realistic traffic behaviors. Behavior cloning suffers from covariate shift, while Generative Adversarial Imitation Learning (GAIL) is notoriously unstable in multi-agent settings. We identify a key source of this instability: irrelevant interaction misguidance, where a discriminator penalizes an ego vehicle's realistic behavior due to unrealistic interactions among its neighbors. To address this, we propose Decomposed Multi-agent GAIL (DecompGAIL), which explicitly decomposes realism into ego-map and ego-neighbor components, filtering out misleading neighbor: neighbor and neighbor: map interactions. We further introduce a social PPO objective that augments ego rewards with distance-weighted neighborhood rewards, encouraging overall realism across agents. Integrated into a lightweight SMART-based backbone, DecompGAIL achieves state-of-the-art performance on the WOMD Sim Agents 2025 benchmark.
HARP-NeXt: High-Speed and Accurate Range-Point Fusion Network for 3D LiDAR Semantic Segmentation IROS 2025
LiDAR semantic segmentation is crucial for autonomous vehicles and mobile robots, requiring high accuracy and real-time processing, especially on resource-constrained embedded systems. Previous state-of-the-art methods often face a trade-off between accuracy and speed. Point-based and sparse convolution-based methods are accurate but slow due to the complexity of neighbor searching and 3D convolutions. Projection-based methods are faster but lose critical geometric information during the 2D projection. Additionally, many recent methods rely on test-time augmentation (TTA) to improve performance, which further slows the inference. Moreover, the pre-processing phase across all methods increases execution time and is demanding on embedded platforms. Therefore, we introduce HARP-NeXt, a high-speed and accurate LiDAR semantic segmentation network. We first propose a novel pre-processing methodology that significantly reduces computational overhead. Then, we design the Conv-SE-NeXt feature extraction block to efficiently capture representations without deep layer stacking per network stage. We also employ a multi-scale range-point fusion backbone that leverages information at multiple abstraction levels to preserve essential geometric details, thereby enhancing accuracy. Experiments on the nuScenes and SemanticKITTI benchmarks show that HARP-NeXt achieves a superior speed-accuracy trade-off compared to all state-of-the-art methods, and, without relying on ensemble models or TTA, is comparable to the top-ranked PTv3, while running 24$\times$ faster. The code is available at https://github.com/SamirAbouHaidar/HARP-NeXt
comment: Accepted at IROS 2025 (IEEE/RSJ International Conference on Intelligent Robots and Systems)
Distributed 3D Source Seeking via SO(3) Geometric Control of Robot Swarms
This paper presents a geometric control framework on the Lie group SO(3) for 3D source-seeking by robots with first-order attitude dynamics and constant translational speed. By working directly on SO(3), the approach avoids Euler-angle singularities and quaternion ambiguities, providing a unique, intrinsic representation of orientation. We design a proportional feed-forward controller that ensures exponential alignment of each agent to an estimated ascending direction toward a 3D scalar field source. The controller adapts to bounded unknown variations and preserves well-posed swarm formations. Numerical simulations demonstrate the effectiveness of the method, with all code provided open source for reproducibility.
comment: 7 pages, 3 figures. Submitted for presentation at the IFAC World Congress 2026
UniFField: A Generalizable Unified Neural Feature Field for Visual, Semantic, and Spatial Uncertainties in Any Scene
Comprehensive visual, geometric, and semantic understanding of a 3D scene is crucial for successful execution of robotic tasks, especially in unstructured and complex environments. Additionally, to make robust decisions, it is necessary for the robot to evaluate the reliability of perceived information. While recent advances in 3D neural feature fields have enabled robots to leverage features from pretrained foundation models for tasks such as language-guided manipulation and navigation, existing methods suffer from two critical limitations: (i) they are typically scene-specific, and (ii) they lack the ability to model uncertainty in their predictions. We present UniFField, a unified uncertainty-aware neural feature field that combines visual, semantic, and geometric features in a single generalizable representation while also predicting uncertainty in each modality. Our approach, which can be applied zero shot to any new environment, incrementally integrates RGB-D images into our voxel-based feature representation as the robot explores the scene, simultaneously updating uncertainty estimation. We evaluate our uncertainty estimations to accurately describe the model prediction errors in scene reconstruction and semantic feature prediction. Furthermore, we successfully leverage our feature predictions and their respective uncertainty for an active object search task using a mobile manipulator robot, demonstrating the capability for robust decision-making.
comment: Project website: https://sites.google.com/view/uniffield
SanDRA: Safe Large-Language-Model-Based Decision Making for Automated Vehicles Using Reachability Analysis
Large language models have been widely applied to knowledge-driven decision-making for automated vehicles due to their strong generalization and reasoning capabilities. However, the safety of the resulting decisions cannot be ensured due to possible hallucinations and the lack of integrated vehicle dynamics. To address this issue, we propose SanDRA, the first safe large-language-model-based decision making framework for automated vehicles using reachability analysis. Our approach starts with a comprehensive description of the driving scenario to prompt large language models to generate and rank feasible driving actions. These actions are translated into temporal logic formulas that incorporate formalized traffic rules, and are subsequently integrated into reachability analysis to eliminate unsafe actions. We validate our approach in both open-loop and closed-loop driving environments using off-the-shelf and finetuned large language models, showing that it can provide provably safe and, where possible, legally compliant driving actions, even under high-density traffic conditions. To ensure transparency and facilitate future research, all code and experimental setups are publicly available at github.com/CommonRoad/SanDRA.
comment: @2025 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
RLinf-VLA: A Unified and Efficient Framework for VLA+RL Training
Recent progress in vision and language foundation models has significantly advanced multimodal understanding, reasoning, and generation, inspiring a surge of interest in extending such capabilities to embodied settings through vision-language-action (VLA) models. Yet, most VLA models are still trained with supervised fine-tuning (SFT), which struggles to generalize under distribution shifts due to error accumulation. Reinforcement learning (RL) offers a promising alternative by directly optimizing task performance through interaction, but existing attempts remain fragmented and lack a unified platform for fair and systematic comparison across model architectures and algorithmic designs. To address this gap, we introduce RLinf-VLA, a unified and efficient framework for scalable RL training of VLA models. The system adopts a highly flexible resource allocation design that addresses the challenge of integrating rendering, training, and inference in RL+VLA training. In particular, for GPU-parallelized simulators, RLinf-VLA implements a novel hybrid fine-grained pipeline allocation mode, achieving a 1.61x-1.88x speedup in training. Through a unified interface, RLinf-VLA seamlessly supports diverse VLA architectures (e.g., OpenVLA, OpenVLA-OFT), multiple RL algorithms (e.g., PPO, GRPO), and various simulators (e.g., ManiSkill, LIBERO). In simulation, a unified model achieves 98.11\% across 130 LIBERO tasks and 97.66\% across 25 ManiSkill tasks. Beyond empirical performance, our study distills a set of best practices for applying RL to VLA training and sheds light on emerging patterns in this integration. Furthermore, we present preliminary deployment on a real-world Franka robot, where RL-trained policies exhibit stronger generalization than those trained with SFT. We envision RLinf-VLA as a foundation to accelerate and standardize research on embodied intelligence.
comment: This is the technical report of the RLinf Team, focusing on the algorithm side. For the system-level design, please refer to arXiv:2509.15965. The open-sourced code link: https://github.com/RLinf/RLinf
Assist-As-Needed: Adaptive Multimodal Robotic Assistance for Medication Management in Dementia Care
People living with dementia (PLWDs) face progressively declining abilities in medication management-from simple forgetfulness to complete task breakdown-yet most assistive technologies fail to adapt to these changing needs. This one-size-fits-all approach undermines autonomy, accelerates dependence, and increases caregiver burden. Occupational therapy principles emphasize matching assistance levels to individual capabilities: minimal reminders for those who merely forget, spatial guidance for those who misplace items, and comprehensive multimodal support for those requiring step-by-step instruction. However, existing robotic systems lack this adaptive, graduated response framework essential for maintaining PLWD independence. We present an adaptive multimodal robotic framework using the Pepper robot that dynamically adjusts assistance based on real-time assessment of user needs. Our system implements a hierarchical intervention model progressing from (1) simple verbal reminders, to (2) verbal + gestural cues, to (3) full multimodal guidance combining physical navigation to medication locations with step-by-step verbal and gestural instructions. Powered by LLM-driven interaction strategies and multimodal sensing, the system continuously evaluates task states to provide just-enough assistance-preserving autonomy while ensuring medication adherence. We conducted a preliminary study with healthy adults and dementia care stakeholders in a controlled lab setting, evaluating the system's usability, comprehensibility, and appropriateness of adaptive feedback mechanisms. This work contributes: (1) a theoretically grounded adaptive assistance framework translating occupational therapy principles into HRI design, (2) a multimodal robotic implementation that preserves PLWD dignity through graduated support, and (3) empirical insights into stakeholder perceptions of adaptive robotic care.
Through the Perspective of LiDAR: A Feature-Enriched and Uncertainty-Aware Annotation Pipeline for Terrestrial Point Cloud Segmentation
Accurate semantic segmentation of terrestrial laser scanning (TLS) point clouds is limited by costly manual annotation. We propose a semi-automated, uncertainty-aware pipeline that integrates spherical projection, feature enrichment, ensemble learning, and targeted annotation to reduce labeling effort, while sustaining high accuracy. Our approach projects 3D points to a 2D spherical grid, enriches pixels with multi-source features, and trains an ensemble of segmentation networks to produce pseudo-labels and uncertainty maps, the latter guiding annotation of ambiguous regions. The 2D outputs are back-projected to 3D, yielding densely annotated point clouds supported by a three-tier visualization suite (2D feature maps, 3D colorized point clouds, and compact virtual spheres) for rapid triage and reviewer guidance. Using this pipeline, we build Mangrove3D, a semantic segmentation TLS dataset for mangrove forests. We further evaluate data efficiency and feature importance to address two key questions: (1) how much annotated data are needed and (2) which features matter most. Results show that performance saturates after ~12 annotated scans, geometric features contribute the most, and compact nine-channel stacks capture nearly all discriminative power, with the mean Intersection over Union (mIoU) plateauing at around 0.76. Finally, we confirm the generalization of our feature-enrichment strategy through cross-dataset tests on ForestSemantic and Semantic3D. Our contributions include: (i) a robust, uncertainty-aware TLS annotation pipeline with visualization tools; (ii) the Mangrove3D dataset; and (iii) empirical guidance on data efficiency and feature importance, thus enabling scalable, high-quality segmentation of TLS point clouds for ecological monitoring and beyond. The dataset and processing scripts are publicly available at https://fz-rit.github.io/through-the-lidars-eye/.
Safe Obstacle-Free Guidance of Space Manipulators in Debris Removal Missions via Deep Reinforcement Learning
The objective of this study is to develop a model-free workspace trajectory planner for space manipulators using a Twin Delayed Deep Deterministic Policy Gradient (TD3) agent to enable safe and reliable debris capture. A local control strategy with singularity avoidance and manipulability enhancement is employed to ensure stable execution. The manipulator must simultaneously track a capture point on a non-cooperative target, avoid self-collisions, and prevent unintended contact with the target. To address these challenges, we propose a curriculum-based multi-critic network where one critic emphasizes accurate tracking and the other enforces collision avoidance. A prioritized experience replay buffer is also used to accelerate convergence and improve policy robustness. The framework is evaluated on a simulated seven-degree-of-freedom KUKA LBR iiwa mounted on a free-floating base in Matlab/Simulink, demonstrating safe and adaptive trajectory generation for debris removal missions.
RAISE: A self-driving laboratory for interfacial property formulation discovery
Surface wettability is a critical design parameter for biomedical devices, coatings, and textiles. Contact angle measurements quantify liquid-surface interactions, which depend strongly on liquid formulation. Herein, we present the Robotic Autonomous Imaging Surface Evaluator (RAISE), a closed-loop, self-driving laboratory that is capable of linking liquid formulation optimization with surface wettability assessment. RAISE comprises a full experimental orchestrator with the ability of mixing liquid ingredients to create varying formulation cocktails, transferring droplets of prepared formulations to a high-throughput stage, and using a pick-and-place camera tool for automated droplet image capture. The system also includes an automated image processing pipeline to measure contact angles. This closed loop experiment orchestrator is integrated with a Bayesian Optimization (BO) client, which enables iterative exploration of new formulations based on previous contact angle measurements to meet user-defined objectives. The system operates in a high-throughput manner and can achieve a measurement rate of approximately 1 contact angle measurement per minute. Here we demonstrate RAISE can be used to explore surfactant wettability and how surfactant combinations create tunable formulations that compensate for purity-related variations. Furthermore, multi-objective BO demonstrates how precise and optimal formulations can be reached based on application-specific goals. The optimization is guided by a desirability score, which prioritizes formulations that are within target contact angle ranges, minimize surfactant usage and reduce cost. This work demonstrates the capabilities of RAISE to autonomously link liquid formulations to contact angle measurements in a closed-loop system, using multi-objective BO to efficiently identify optimal formulations aligned with researcher-defined criteria.
comment: Mohammad Nazeri, Sheldon Mei, and Jeffrey Watchorn contributed equally to this work. *Corresponding author: Frank Gu (f.gu@utoronto.ca)
GATO: GPU-Accelerated and Batched Trajectory Optimization for Scalable Edge Model Predictive Control
While Model Predictive Control (MPC) delivers strong performance across robotics applications, solving the underlying (batches of) nonlinear trajectory optimization (TO) problems online remains computationally demanding. Existing GPU-accelerated approaches typically (i) parallelize a single solve to meet real-time deadlines, (ii) scale to very large batches at slower-than-real-time rates, or (iii) achieve speed by restricting model generality (e.g., point-mass dynamics or a single linearization). This leaves a large gap in solver performance for many state-of-the-art MPC applications that require real-time batches of tens to low-hundreds of solves. As such, we present GATO, an open source, GPU-accelerated, batched TO solver co-designed across algorithm, software, and computational hardware to deliver real-time throughput for these moderate batch size regimes. Our approach leverages a combination of block-, warp-, and thread-level parallelism within and across solves for ultra-high performance. We demonstrate the effectiveness of our approach through a combination of: simulated benchmarks showing speedups of 18-21x over CPU baselines and 1.4-16x over GPU baselines as batch size increases; case studies highlighting improved disturbance rejection and convergence behavior; and finally a validation on hardware using an industrial manipulator. We open source GATO to support reproducibility and adoption.
Inspection Planning Primitives with Implicit Models
The aging and increasing complexity of infrastructures make efficient inspection planning more critical in ensuring safety. Thanks to sampling-based motion planning, many inspection planners are fast. However, they often require huge memory. This is particularly true when the structure under inspection is large and complex, consisting of many struts and pillars of various geometry and sizes. Such structures can be represented efficiently using implicit models, such as neural Signed Distance Functions (SDFs). However, most primitive computations used in sampling-based inspection planner have been designed to work efficiently with explicit environment models, which in turn requires the planner to use explicit environment models or performs frequent transformations between implicit and explicit environment models during planning. This paper proposes a set of primitive computations, called Inspection Planning Primitives with Implicit Models (IPIM), that enable sampling-based inspection planners to entirely use neural SDFs representation during planning. Evaluation on three scenarios, including inspection of a complex real-world structure with over 92M triangular mesh faces, indicates that even a rudimentary sampling-based planner with IPIM can generate inspection trajectories of similar quality to those generated by the state-of-the-art planner, while using up to 70x less memory than the state-of-the-art inspection planner.
IGUANA: Immersive Guidance, Navigation, and Control for Consumer UAV
As the markets for unmanned aerial vehicles (UAVs) and mixed reality (MR) headsets continue to grow, recent research has increasingly explored their integration, which enables more intuitive, immersive, and situationally aware control systems. We present IGUANA, an MR-based immersive guidance, navigation, and control system for consumer UAVs. IGUANA introduces three key elements beyond conventional control interfaces: (1) a 3D terrain map interface with draggable waypoint markers and live camera preview for high-level control, (2) a novel spatial control metaphor that uses a virtual ball as a physical analogy for low-level control, and (3) a spatial overlay that helps track the UAV when it is not visible with the naked eye or visual line of sight is interrupted. We conducted a user study to evaluate our design, both quantitatively and qualitatively, and found that (1) the 3D map interface is intuitive and easy to use, relieving users from manual control and suggesting improved accuracy and consistency with lower perceived workload relative to conventional dual-stick controller, (2) the virtual ball interface is intuitive but limited by the lack of physical feedback, and (3) the spatial overlay is very useful in enhancing the users' situational awareness.
comment: This is the author's version of the work. The definitive Version of Record was published in 31st ACM Symposium on Virtual Reality Software and Technology (VRST '25)
AVO: Amortized Value Optimization for Contact Mode Switching in Multi-Finger Manipulation
Dexterous manipulation tasks often require switching between different contact modes, such as rolling, sliding, sticking, or non-contact contact modes. When formulating dexterous manipulation tasks as a trajectory optimization problem, a common approach is to decompose these tasks into sub-tasks for each contact mode, which are each solved independently. Optimizing each sub-task independently can limit performance, as optimizing contact points, contact forces, or other variables without information about future sub-tasks can place the system in a state from which it is challenging to make progress on subsequent sub-tasks. Further, optimizing these sub-tasks is very computationally expensive. To address these challenges, we propose Amortized Value Optimization (AVO), which introduces a learned value function that predicts the total future task performance. By incorporating this value function into the cost of the trajectory optimization at each planning step, the value function gradients guide the optimizer toward states that minimize the cost in future sub-tasks. This effectively bridges separately optimized sub-tasks, and accelerates the optimization by reducing the amount of online computation needed. We validate AVO on a screwdriver grasping and turning task in both simulation and real world experiments, and show improved performance even with 50% less computational budget compared to trajectory optimization without the value function.
HJCD-IK: GPU-Accelerated Inverse Kinematics through Batched Hybrid Jacobian Coordinate Descent
Inverse Kinematics (IK) is a core problem in robotics, in which joint configurations are found to achieve a desired end-effector pose. Although analytical solvers are fast and efficient, they are limited to systems with low degrees-of-freedom and specific topological structures. Numerical optimization-based approaches are more general, but suffer from high computational costs and frequent convergence to spurious local minima. Recent efforts have explored the use of GPUs to combine sampling and optimization to enhance both the accuracy and speed of IK solvers. We build on this recent literature and introduce HJCD-IK, a GPU-accelerated, sampling-based hybrid solver that combines an orientation-aware greedy coordinate descent initialization scheme with a Jacobian-based polishing routine. This design enables our solver to improve both convergence speed and overall accuracy as compared to the state-of-the-art, consistently finding solutions along the accuracy-latency Pareto frontier and often achieving order-of-magnitude gains. In addition, our method produces a broad distribution of high-quality samples, yielding the lowest maximum mean discrepancy. We release our code open-source for the benefit of the community.
VeMo: A Lightweight Data-Driven Approach to Model Vehicle Dynamics
Developing a dynamic model for a high-performance vehicle is a complex problem that requires extensive structural information about the system under analysis. This information is often unavailable to those who did not design the vehicle and represents a typical issue in autonomous driving applications, which are frequently developed on top of existing vehicles; therefore, vehicle models are developed under conditions of information scarcity. This paper proposes a lightweight encoder-decoder model based on Gate Recurrent Unit layers to correlate the vehicle's future state with its past states, measured onboard, and control actions the driver performs. The results demonstrate that the model achieves a maximum mean relative error below 2.6% in extreme dynamic conditions. It also shows good robustness when subject to noisy input data across the interested frequency components. Furthermore, being entirely data-driven and free from physical constraints, the model exhibits physical consistency in the output signals, such as longitudinal and lateral accelerations, yaw rate, and the vehicle's longitudinal velocity.
A Rotation-Invariant Embedded Platform for (Neural) Cellular Automata
This paper presents a rotation-invariant embedded platform for simulating (neural) cellular automata (NCA) in modular robotic systems. Inspired by previous work on physical NCA, we introduce key innovations that overcome limitations in prior hardware designs. Our platform features a symmetric, modular structure, enabling seamless connections between cells regardless of orientation. Additionally, each cell is battery-powered, allowing it to operate independently and retain its state even when disconnected from the collective. To demonstrate the platform's applicability, we present a novel rotation-invariant NCA model for isotropic shape classification. The proposed system provides a robust foundation for exploring the physical realization of NCA, with potential applications in distributed robotic systems and self-organizing structures. Our implementation, including hardware, software code, a simulator, and a video, is openly shared at: https://github.com/dwoiwode/embedded_nca
comment: Accepted for ALIFE 2025
FLEET: Formal Language-Grounded Scheduling for Heterogeneous Robot Teams
Coordinating heterogeneous robot teams from free-form natural-language instructions is hard. Language-only planners struggle with long-horizon coordination and hallucination, while purely formal methods require closed-world models. We present FLEET, a hybrid decentralized framework that turns language into optimized multi-robot schedules. An LLM front-end produces (i) a task graph with durations and precedence and (ii) a capability-aware robot--task fitness matrix; a formal back-end solves a makespan-minimization problem while the underlying robots execute their free-form subtasks with agentic closed-loop control. Across multiple free-form language-guided autonomy coordination benchmarks, FLEET improves success over state of the art generative planners on two-agent teams across heterogeneous tasks. Ablations show that mixed integer linear programming (MILP) primarily improves temporal structure, while LLM-derived fitness is decisive for capability-coupled tasks; together they deliver the highest overall performance. We demonstrate the translation to real world challenges with hardware trials using a pair of quadruped robots with disjoint capabilities.
ResMimic: From General Motion Tracking to Humanoid Whole-body Loco-Manipulation via Residual Learning
Humanoid whole-body loco-manipulation promises transformative capabilities for daily service and warehouse tasks. While recent advances in general motion tracking (GMT) have enabled humanoids to reproduce diverse human motions, these policies lack the precision and object awareness required for loco-manipulation. To this end, we introduce ResMimic, a two-stage residual learning framework for precise and expressive humanoid control from human motion data. First, a GMT policy, trained on large-scale human-only motion, serves as a task-agnostic base for generating human-like whole-body movements. An efficient but precise residual policy is then learned to refine the GMT outputs to improve locomotion and incorporate object interaction. To further facilitate efficient training, we design (i) a point-cloud-based object tracking reward for smoother optimization, (ii) a contact reward that encourages accurate humanoid body-object interactions, and (iii) a curriculum-based virtual object controller to stabilize early training. We evaluate ResMimic in both simulation and on a real Unitree G1 humanoid. Results show substantial gains in task success, training efficiency, and robustness over strong baselines. Videos are available at https://resmimic.github.io/ .
comment: 9 pages, 8 figures
Online Hybrid-Belief POMDP with Coupled Semantic-Geometric Models
Robots operating in complex and unknown environments frequently require geometric-semantic representations of the environment to safely perform their tasks. While inferring the environment, they must account for many possible scenarios when planning future actions. Since objects' class types are discrete and the robot's self-pose and the objects' poses are continuous, the environment can be represented by a hybrid discrete-continuous belief which is updated according to models and incoming data. Prior probabilities and observation models representing the environment can be learned from data using deep learning algorithms. Such models often couple environmental semantic and geometric properties. As a result, semantic variables are interconnected, causing semantic state space dimensionality to increase exponentially. In this paper, we consider planning under uncertainty using partially observable Markov decision processes (POMDPs) with hybrid semantic-geometric beliefs. The models and priors consider the coupling between semantic and geometric variables. Within POMDP, we introduce the concept of semantically aware safety. Obtaining representative samples of the theoretical hybrid belief, required for estimating the value function, is very challenging. As a key contribution, we develop a novel form of the hybrid belief and leverage it to sample representative samples. We show that under certain conditions, the value function and probability of safety can be calculated efficiently with an explicit expectation over all possible semantic mappings. Our simulations show that our estimates of the objective function and probability of safety achieve similar levels of accuracy compared to estimators that run exhaustively on the entire semantic state-space using samples from the theoretical hybrid belief. Nevertheless, the complexity of our estimators is polynomial rather than exponential.
comment: 20 pages, 9 figures
BIM-Constrained Optimization for Accurate Localization and Deviation Correction in Construction Monitoring
Augmented reality (AR) applications for construction monitoring rely on real-time environmental tracking to visualize architectural elements. However, construction sites present significant challenges for traditional tracking methods due to featureless surfaces, dynamic changes, and drift accumulation, leading to misalignment between digital models and the physical world. This paper proposes a BIM-aware drift correction method to address these challenges. Instead of relying solely on SLAM-based localization, we align ``as-built" detected planes from the real-world environment with ``as-planned" architectural planes in BIM. Our method performs robust plane matching and computes a transformation (TF) between SLAM (S) and BIM (B) origin frames using optimization techniques, minimizing drift over time. By incorporating BIM as prior structural knowledge, we can achieve improved long-term localization and enhanced AR visualization accuracy in noisy construction environments. The method is evaluated through real-world experiments, showing significant reductions in drift-induced errors and optimized alignment consistency. On average, our system achieves a reduction of 52.24% in angular deviations and a reduction of 60.8% in the distance error of the matched walls compared to the initial manual alignment by the user.
M^3RS: Multi-robot, Multi-objective, and Multi-mode Routing and Scheduling
Task execution quality significantly impacts multi-robot missions, yet existing task allocation frameworks rarely consider quality of service as a decision variable, despite its importance in applications like robotic disinfection and cleaning. We introduce the multi-robot, multi-objective, and multi-mode routing and scheduling (M3RS) problem, designed for time-constrained missions. In M3RS, each task offers multiple execution modes with varying resource needs, durations, and quality levels, allowing trade-offs across mission objectives. M3RS is modeled as a mixed-integer linear programming (MIP) problem and optimizes task sequencing and execution modes for each agent. We apply M3RS to multi-robot disinfection in healthcare and public spaces, optimizing disinfection quality and task completion rates. Through synthetic case studies, M3RS demonstrates 3-46$\%$ performance improvements over the standard task allocation method across various metrics. Further, to improve compute time, we propose a clustering-based column generation algorithm that achieves solutions comparable to or better than the baseline MIP solver while reducing computation time by 60$\%$. We also conduct case studies with simulated and real robots. Experimental videos are available on the project page: \href{https://sites.google.com/view/g-robot/m3rs/}{https://sites.google.com/view/g-robot/m3rs/}.
comment: Under review
BIM Informed Visual SLAM for Construction Monitoring
Simultaneous Localization and Mapping (SLAM) is a key tool for monitoring construction sites, where aligning the evolving as-built state with the as-planned design enables early error detection and reduces costly rework. LiDAR-based SLAM achieves high geometric precision, but its sensors are typically large and power-demanding, limiting their use on portable platforms. Visual SLAM offers a practical alternative with lightweight cameras already embedded in most mobile devices. however, visually mapping construction environments remains challenging: repetitive layouts, occlusions, and incomplete or low-texture structures often cause drift in the trajectory map. To mitigate this, we propose an RGB-D SLAM system that incorporates the Building Information Model (BIM) as structural prior knowledge. Instead of relying solely on visual cues, our system continuously establishes correspondences between detected wall and their BIM counterparts, which are then introduced as constraints in the back-end optimization. The proposed method operates in real time and has been validated on real construction sites, reducing trajectory error by an average of 23.71% and map RMSE by 7.14% compared to visual SLAM baselines. These results demonstrate that BIM constraints enable reliable alignment of the digital plan with the as-built scene, even under partially constructed conditions.
comment: 8 pages, 5 tables, 4 figures
Interleave-VLA: Enhancing Robot Manipulation with Interleaved Image-Text Instructions
The rise of foundation models paves the way for generalist robot policies in the physical world. Existing methods relying on text-only instructions often struggle to generalize to unseen scenarios. We argue that interleaved image-text inputs offer richer and less biased context and enable robots to better handle unseen tasks with more versatile human-robot interaction. Building on this insight, Interleave-VLA, the first robot learning paradigm capable of comprehending interleaved image-text instructions and directly generating continuous action sequences in the physical world, is introduced. It offers a natural, flexible, and model-agnostic paradigm that extends state-of-the-art vision-language-action (VLA) models with minimal modifications while achieving strong zero-shot generalization. Interleave-VLA also includes an automatic pipeline that converts text instructions from Open X-Embodiment into interleaved image-text instructions, resulting in a large-scale real-world interleaved embodied dataset with 210k episodes. Comprehensive evaluation in simulation and the real world shows that Interleave-VLA offers two major benefits: (1) improves out-of-domain generalization to unseen objects by 2x compared to text input baselines, (2) supports flexible task interfaces and diverse instructions in a zero-shot manner, such as hand-drawn sketches. We attribute Interleave-VLA's strong zero-shot capability to the use of instruction images, which effectively mitigate hallucinations, and the inclusion of heterogeneous multimodal datasets, enriched with Internet-sourced images, offering potential for scalability. More information is available at https://interleave-vla.github.io/Interleave-VLA-Anonymous/
Diffusion Trajectory-guided Policy for Long-horizon Robot Manipulation
Recently, Vision-Language-Action models (VLA) have advanced robot imitation learning, but high data collection costs and limited demonstrations hinder generalization and current imitation learning methods struggle in out-of-distribution scenarios, especially for long-horizon tasks. A key challenge is how to mitigate compounding errors in imitation learning, which lead to cascading failures over extended trajectories. To address these challenges, we propose the Diffusion Trajectory-guided Policy (DTP) framework, which generates 2D trajectories through a diffusion model to guide policy learning for long-horizon tasks. By leveraging task-relevant trajectories, DTP provides trajectory-level guidance to reduce error accumulation. Our two-stage approach first trains a generative vision-language model to create diffusion-based trajectories, then refines the imitation policy using them. Experiments on the CALVIN benchmark show that DTP outperforms state-of-the-art baselines by 25% in success rate, starting from scratch without external pretraining. Moreover, DTP significantly improves real-world robot performance.
comment: 8 pages, 5 figures, accepted to IEEE Robotics and Automation Letters (RAL)
Touch Speaks, Sound Feels: A Multimodal Approach to Affective and Social Touch from Robots to Humans
Affective tactile interaction constitutes a fundamental component of human communication. In natural human-human encounters, touch is seldom experienced in isolation; rather, it is inherently multisensory. Individuals not only perceive the physical sensation of touch but also register the accompanying auditory cues generated through contact. The integration of haptic and auditory information forms a rich and nuanced channel for emotional expression. While extensive research has examined how robots convey emotions through facial expressions and speech, their capacity to communicate social gestures and emotions via touch remains largely underexplored. To address this gap, we developed a multimodal interaction system incorporating a 5*5 grid of 25 vibration motors synchronized with audio playback, enabling robots to deliver combined haptic-audio stimuli. In an experiment involving 32 Chinese participants, ten emotions and six social gestures were presented through vibration, sound, or their combination. Participants rated each stimulus on arousal and valence scales. The results revealed that (1) the combined haptic-audio modality significantly enhanced decoding accuracy compared to single modalities; (2) each individual channel-vibration or sound-effectively supported certain emotions recognition, with distinct advantages depending on the emotional expression; and (3) gestures alone were generally insufficient for conveying clearly distinguishable emotions. These findings underscore the importance of multisensory integration in affective human-robot interaction and highlight the complementary roles of haptic and auditory cues in enhancing emotional communication.
Learning to Recover: Dynamic Reward Shaping with Wheel-Leg Coordination for Fallen Robots
Adaptive recovery from fall incidents are essential skills for the practical deployment of wheeled-legged robots, which uniquely combine the agility of legs with the speed of wheels for rapid recovery. However, traditional methods relying on preplanned recovery motions, simplified dynamics or sparse rewards often fail to produce robust recovery policies. This paper presents a learning-based framework integrating Episode-based Dynamic Reward Shaping and curriculum learning, which dynamically balances exploration of diverse recovery maneuvers with precise posture refinement. An asymmetric actor-critic architecture accelerates training by leveraging privileged information in simulation, while noise-injected observations enhance robustness against uncertainties. We further demonstrate that synergistic wheel-leg coordination reduces joint torque consumption by 15.8% and 26.2% and improves stabilization through energy transfer mechanisms. Extensive evaluations on two distinct quadruped platforms achieve recovery success rates up to 99.1% and 97.8% without platform-specific tuning. The supplementary material is available at https://boyuandeng.github.io/L2R-WheelLegCoordination/
Interpretable Robot Control via Structured Behavior Trees and Large Language Models
As intelligent robots become more integrated into human environments, there is a growing need for intuitive and reliable Human-Robot Interaction (HRI) interfaces that are adaptable and more natural to interact with. Traditional robot control methods often require users to adapt to interfaces or memorize predefined commands, limiting usability in dynamic, unstructured environments. This paper presents a novel framework that bridges natural language understanding and robotic execution by combining Large Language Models (LLMs) with Behavior Trees. This integration enables robots to interpret natural language instructions given by users and translate them into executable actions by activating domain-specific plugins. The system supports scalable and modular integration, with a primary focus on perception-based functionalities, such as person tracking and hand gesture recognition. To evaluate the system, a series of real-world experiments was conducted across diverse environments. Experimental results demonstrate that the proposed approach is practical in real-world scenarios, with an average cognition-to-execution accuracy of approximately 94%, making a significant contribution to HRI systems and robots. The complete source code of the framework is publicly available at https://github.com/snt-arg/robot_suite.
comment: 15 pages, 5 figures, 3 tables
UltraHiT: A Hierarchical Transformer Architecture for Generalizable Internal Carotid Artery Robotic Ultrasonography
Carotid ultrasound is crucial for the assessment of cerebrovascular health, particularly the internal carotid artery (ICA). While previous research has explored automating carotid ultrasound, none has tackled the challenging ICA. This is primarily due to its deep location, tortuous course, and significant individual variations, which greatly increase scanning complexity. To address this, we propose a Hierarchical Transformer-based decision architecture, namely UltraHiT, that integrates high-level variation assessment with low-level action decision. Our motivation stems from conceptualizing individual vascular structures as morphological variations derived from a standard vascular model. The high-level module identifies variation and switches between two low-level modules: an adaptive corrector for variations, or a standard executor for normal cases. Specifically, both the high-level module and the adaptive corrector are implemented as causal transformers that generate predictions based on the historical scanning sequence. To ensure generalizability, we collected the first large-scale ICA scanning dataset comprising 164 trajectories and 72K samples from 28 subjects of both genders. Based on the above innovations, our approach achieves a 95% success rate in locating the ICA on unseen individuals, outperforming baselines and demonstrating its effectiveness. Our code will be released after acceptance.
Automating RT Planning at Scale: High Quality Data For AI Training
Radiotherapy (RT) planning is complex, subjective, and time-intensive. Advances with artificial intelligence (AI) promise to improve its precision and efficiency, but progress is often limited by the scarcity of large, standardized datasets. To address this, we introduce the Automated Iterative RT Planning (AIRTP) system, a scalable solution for generating high-quality treatment plans. This scalable solution is designed to generate substantial volumes of consistently high-quality treatment plans, overcoming a key obstacle in the advancement of AI-driven RT planning. Our AIRTP pipeline adheres to clinical guidelines and automates essential steps, including organ-at-risk (OAR) contouring, helper structure creation, beam setup, optimization, and plan quality improvement, using AI integrated with RT planning software like Varian Eclipse. Furthermore, a novel approach for determining optimization parameters to reproduce 3D dose distributions, i.e. a method to convert dose predictions to deliverable treatment plans constrained by machine limitations is proposed. A comparative analysis of plan quality reveals that our automated pipeline produces treatment plans of quality comparable to those generated manually, which traditionally require several hours of labor per plan. Committed to public research, the first data release of our AIRTP pipeline includes nine cohorts covering head-and-neck and lung cancer sites to support an AAPM 2025 challenge. To our best knowledge, this dataset features more than 10 times number of plans compared to the largest existing well-curated public dataset. Repo: https://github.com/RiqiangGao/GDP-HMM_AAPMChallenge.
comment: radiotherapy planning, data for AI training
Context Matters! Relaxing Goals with LLMs for Feasible 3D Scene Planning
Embodied agents need to plan and act reliably in real and complex 3D environments. Classical planning (e.g., PDDL) offers structure and guarantees, but in practice it fails under noisy perception and incorrect predicate grounding. On the other hand, Large Language Models (LLMs)-based planners leverage commonsense reasoning, yet frequently propose actions that are unfeasible or unsafe. Following recent works that combine the two approaches, we introduce ContextMatters, a framework that fuses LLMs and classical planning to perform hierarchical goal relaxation: the LLM helps ground symbols to the scene and, when the target is unreachable, it proposes functionally equivalent goals that progressively relax constraints, adapting the goal to the context of the agent's environment. Operating on 3D Scene Graphs, this mechanism turns many nominally unfeasible tasks into tractable plans and enables context-aware partial achievement when full completion is not achievable. Our experimental results show a +52.45% Success Rate improvement over state-of-the-art LLMs+PDDL baseline, demonstrating the effectiveness of our approach. Moreover, we validate the execution of ContextMatter in a real world scenario by deploying it on a TIAGo robot. Code, dataset, and supplementary materials are available to the community at https://lab-rococo-sapienza.github.io/context-matters/.
Development of a magnetorheological hand exoskeleton featuring a high force-to-power ratio for enhanced grip endurance
Hand exoskeletons have significant potential in labor-intensive fields by mitigating hand grip fatigue, enhancing hand strength, and preventing injuries. However, most of the traditional hand exoskeletons are driven by motors, whose output force is limited in the constrained installation conditions. Besides, they also come with the disadvantages of high power consumption, complex and bulky assistive systems, and high instability. In this work, we develop a novel hand exoskeleton integrated with innovative magnetorheological (MR) clutches that offers a high force-to-power ratio to improve grip endurance. The clutch features an enhanced structure design, a micro roller enhancing structure, which can significantly boost output forces. The experimental data demonstrate that, when it is supplied with 2 V, the clutch can deliver a peak holding force of 381.15 N-55 times that when no voltage is provided (7 N). In this scenario, it only consumes 1.38 W, yielding a force-to-power ratio of 256.75N/W, which is 2.35 times higher than the best-reported actuator used for hand exoskeletons. This capability enables the designed MRHE to provide approximately 419.79 N support force for gripping. The designed MR hand exoskeleton is highly integrated, comprising an exoskeleton frame, MR clutches, a control unit, and a battery. Evaluations through static grip endurance tests and dynamic carrying and lifting tests confirm that the MR hand exoskeleton can effectively reduce muscle fatigue, extend grip endurance, and minimize injuries. These findings highlight its strong potential for practical applications in repetitive tasks such as carrying and lifting in industrial settings.
NAR-*ICP: Neural Execution of Classical ICP-based Pointcloud Registration Algorithms
This study explores the intersection of neural networks and classical robotics algorithms through the Neural Algorithmic Reasoning (NAR) blueprint, enabling the training of neural networks to reason like classical robotics algorithms by learning to execute them. Algorithms are integral to robotics and safety-critical applications due to their predictable and consistent performance through logical and mathematical principles. In contrast, while neural networks are highly adaptable, handling complex, high-dimensional data and generalising across tasks, they often lack interpretability and transparency in their internal computations. To bridge the two, we propose a novel Graph Neural Network (GNN)-based framework, NAR-*ICP, that learns the intermediate computations of classical ICP-based registration algorithms, extending the CLRS Benchmark. We evaluate our approach across real-world and synthetic datasets, demonstrating its flexibility in handling complex inputs, and its potential to be used within larger learning pipelines. Our method achieves superior performance compared to the baselines, even surpassing the algorithms it was trained on, further demonstrating its ability to generalise beyond the capabilities of traditional algorithms.
comment: 19 pages, 16 tables, 7 figures
P2 Explore: Efficient Exploration in Unknown Cluttered Environment with Floor Plan Prediction IROS 2025
Robot exploration aims at the reconstruction of unknown environments, and it is important to achieve it with shorter paths. Traditional methods focus on optimizing the visiting order of frontiers based on current observations, which may lead to local-minimal results. Recently, by predicting the structure of the unseen environment, the exploration efficiency can be further improved. However, in a cluttered environment, due to the randomness of obstacles, the ability to predict is weak. Moreover, this inaccuracy will lead to limited improvement in exploration. Therefore, we propose FPUNet which can be efficient in predicting the layout of noisy indoor environments. Then, we extract the segmentation of rooms and construct their topological connectivity based on the predicted map. The visiting order of these predicted rooms is optimized which can provide high-level guidance for exploration. The FPUNet is compared with other network architectures which demonstrates it is the SOTA method for this task. Extensive experiments in simulations show that our method can shorten the path length by 2.18% to 34.60% compared to the baselines.
comment: 7 pages, Accepted by IROS 2025, Open-sourced at https://github.com/song-kun/P2Explore
Estimating the Joint Probability of Scenario Parameters with Gaussian Mixture Copula Models
This paper presents the first application of Gaussian Mixture Copula Models to the statistical modeling of driving scenarios for the safety validation of automated driving systems. Knowledge of the joint probability distribution of scenario parameters is essential for scenario-based safety assessment, where risk quantification depends on the likelihood of concrete parameter combinations. Gaussian Mixture Copula Models bring together the multimodal expressivity of Gaussian Mixture Models and the flexibility of copulas, enabling separate modeling of marginal distributions and dependencies. We benchmark Gaussian Mixture Copula Models against previously proposed approaches - Gaussian Mixture Models and Gaussian Copula Models - using real-world driving data drawn from scenarios defined in United Nations Regulation No. 157. Our evaluation across approximately 18 million scenario instances demonstrates that Gaussian Mixture Copula Models consistently surpass Gaussian Copula Models and perform better than, or at least comparably to, Gaussian Mixture Models, as measured by both log-likelihood and Sinkhorn distance. These results are promising for the adoption of Gaussian Mixture Copula Models as a statistical foundation for future scenario-based validation frameworks.
comment: 8 pages, 4 figures; This work has been submitted to the IEEE for possible publication
Gaze Estimation for Human-Robot Interaction: Analysis Using the NICO Platform
This paper evaluates the current gaze estimation methods within an HRI context of a shared workspace scenario. We introduce a new, annotated dataset collected with the NICO robotic platform. We evaluate four state-of-the-art gaze estimation models. The evaluation shows that the angular errors are close to those reported on general-purpose benchmarks. However, when expressed in terms of distance in the shared workspace the best median error is 16.48 cm quantifying the practical limitations of current methods. We conclude by discussing these limitations and offering recommendations on how to best integrate gaze estimation as a modality in HRI systems.
comment: Code available at http://github.com/kocurvik/nico_gaze
Control of Humanoid Robots with Parallel Mechanisms using Differential Actuation Models
Several recently released humanoid robots, inspired by the mechanical design of Cassie, employ actuator configurations in which the motors are displaced from the joints to reduce leg inertia. While studies accounting for the full kinematic complexity have demonstrated the benefits of these designs, the associated loop-closure constraints greatly increase computational cost and limit their use in control and learning. As a result, the non-linear transmission is often approximated by a constant reduction ratio, preventing exploitation of the mechanism's full capabilities. This paper introduces a compact analytical formulation for the two standard knee and ankle mechanisms that captures the exact non-linear transmission while remaining computationally efficient. The model is fully differentiable up to second order with a minimal formulation, enabling low-cost evaluation of dynamic derivatives for trajectory optimization and of the apparent transmission impedance for reinforcement learning. We integrate this formulation into trajectory optimization and locomotion policy learning, and compare it against simplified constant-ratio approaches. Hardware experiments demonstrate improved accuracy and robustness, showing that the proposed method provides a practical means to incorporate parallel actuation into modern control algorithms.
Robot Learning from Any Images
We introduce RoLA, a framework that transforms any in-the-wild image into an interactive, physics-enabled robotic environment. Unlike previous methods, RoLA operates directly on a single image without requiring additional hardware or digital assets. Our framework democratizes robotic data generation by producing massive visuomotor robotic demonstrations within minutes from a wide range of image sources, including camera captures, robotic datasets, and Internet images. At its core, our approach combines a novel method for single-view physical scene recovery with an efficient visual blending strategy for photorealistic data collection. We demonstrate RoLA's versatility across applications like scalable robotic data generation and augmentation, robot learning from Internet images, and single-image real-to-sim-to-real systems for manipulators and humanoids. Video results are available at https://sihengz02.github.io/RoLA .
comment: CoRL 2025 camera ready
Mitigating Cross-Modal Distraction and Ensuring Geometric Feasibility via Affordance-Guided and Self-Consistent MLLMs for Task Planning in Instruction-Following Manipulation
We investigate the use of Multimodal Large Language Models (MLLMs) with in-context learning for closed-loop task planning in instruction-following manipulation. We identify four essential requirements for successful task planning: quantity estimation, reachability analysis, relative positioning, and collision avoidance. However, existing benchmarks fail to support holistic evaluation across all these aspects. To address this gap, we introduce \textbf{QuARC} (Quantity, Analysis, Relative positioning, Collision), a new benchmark based on a food preparation scenario that integrates all four challenges. Using QuARC, we reveal two major limitations of current MLLMs: cross-modal distraction and geometric infeasibility. To tackle these, we adapt Chain-of-Thought with Self-Consistency to mitigate reasoning loss from cross-modal distractions and incorporate an affordance predictor to guide planning based on geometric feasibility. Our comprehensive evaluation analyzes performance across multiple baselines and explains sources of improvement. Our method achieves a 76.7\% success rate on the benchmark, significantly outperforming the ViLa baseline (36.7\%), without requiring additional finetuning. Code and dataset are available at https://hcis-lab.github.io/Affordance-Guided-Self-Consistent-MLLM.
Generating and Optimizing Topologically Distinct Guesses for Mobile Manipulator Path Planning with Path Constraints
Optimal path planning is prone to convergence to local, rather than global, optima. This is often the case for mobile manipulators due to nonconvexities induced by obstacles, robot kinematics and constraints. This paper focuses on planning under end effector path constraints and attempts to circumvent the issue of converging to a local optimum. We propose a pipeline that first discovers multiple homotopically distinct paths, and then optimizes them to obtain multiple distinct local optima. The best out of these distinct local optima is likely to be close to the global optimum. We demonstrate the effectiveness of our pipeline in the optimal path planning of mobile manipulators in the presence of path and obstacle constraints.
EffiTune: Diagnosing and Mitigating Training Inefficiency for Parameter Tuner in Robot Navigation System IROS 2025
Robot navigation systems are critical for various real-world applications such as delivery services, hospital logistics, and warehouse management. Although classical navigation methods provide interpretability, they rely heavily on expert manual tuning, limiting their adaptability. Conversely, purely learning-based methods offer adaptability but often lead to instability and erratic robot behaviors. Recently introduced parameter tuners aim to balance these approaches by integrating data-driven adaptability into classical navigation frameworks. However, the parameter tuning process currently suffers from training inefficiencies and redundant sampling, with critical regions in environment often underrepresented in training data. In this paper, we propose EffiTune, a novel framework designed to diagnose and mitigate training inefficiency for parameter tuners in robot navigation systems. EffiTune first performs robot-behavior-guided diagnostics to pinpoint critical bottlenecks and underrepresented regions. It then employs a targeted up-sampling strategy to enrich the training dataset with critical samples, significantly reducing redundancy and enhancing training efficiency. Our comprehensive evaluation demonstrates that EffiTune achieves more than a 13.5% improvement in navigation performance, enhanced robustness in out-of-distribution scenarios, and a 4x improvement in training efficiency within the same computational budget.
comment: Accepted to IROS 2025
EgoExo++: Integrating On-demand Exocentric Visuals with 2.5D Ground Surface Estimation for Interactive Teleoperation of Subsea ROVs
Underwater ROVs (Remotely Operated Vehicles) are indispensable for subsea exploration and task execution, yet typical teleoperation engines based on egocentric (first-person) video feeds restrict human operators' field-of-view and limit precise maneuvering in complex, unstructured underwater environments. To address this, we propose EgoExo, a geometry-driven solution integrated into a visual SLAM pipeline that synthesizes on-demand exocentric (third-person) views from egocentric camera feeds. Our proposed framework, EgoExo++, extends beyond 2D exocentric view synthesis (EgoExo) to augment a dense 2.5D ground surface estimation on-the-fly. It simultaneously renders the ROV model onto this reconstructed surface, enhancing semantic perception and depth comprehension. The computations involved are closed-form and rely solely on egocentric views and monocular SLAM estimates, which makes it portable across existing teleoperation engines and robust to varying waterbody characteristics. We validate the geometric accuracy of our approach through extensive experiments of 2-DOF indoor navigation and 6-DOF underwater cave exploration in challenging low-light conditions. Quantitative metrics confirm the reliability of the rendered Exo views, while a user study involving 15 operators demonstrates improved situational awareness, navigation safety, and task efficiency during teleoperation. Furthermore, we highlight the role of EgoExo++ augmented visuals in supporting shared autonomy, operator training, and embodied teleoperation. This new interactive approach to ROV teleoperation presents promising opportunities for future research in subsea telerobotics.
comment: EgoExo++ (Journal extension), V5, metadata updated, 12 pages
OmniRetarget: Interaction-Preserving Data Generation for Humanoid Whole-Body Loco-Manipulation and Scene Interaction
A dominant paradigm for teaching humanoid robots complex skills is to retarget human motions as kinematic references to train reinforcement learning (RL) policies. However, existing retargeting pipelines often struggle with the significant embodiment gap between humans and robots, producing physically implausible artifacts like foot-skating and penetration. More importantly, common retargeting methods neglect the rich human-object and human-environment interactions essential for expressive locomotion and loco-manipulation. To address this, we introduce OmniRetarget, an interaction-preserving data generation engine based on an interaction mesh that explicitly models and preserves the crucial spatial and contact relationships between an agent, the terrain, and manipulated objects. By minimizing the Laplacian deformation between the human and robot meshes while enforcing kinematic constraints, OmniRetarget generates kinematically feasible trajectories. Moreover, preserving task-relevant interactions enables efficient data augmentation, from a single demonstration to different robot embodiments, terrains, and object configurations. We comprehensively evaluate OmniRetarget by retargeting motions from OMOMO, LAFAN1, and our in-house MoCap datasets, generating over 8-hour trajectories that achieve better kinematic constraint satisfaction and contact preservation than widely used baselines. Such high-quality data enables proprioceptive RL policies to successfully execute long-horizon (up to 30 seconds) parkour and loco-manipulation skills on a Unitree G1 humanoid, trained with only 5 reward terms and simple domain randomization shared by all tasks, without any learning curriculum.
comment: Project website: https://omniretarget.github.io
Flight Demonstration and Model Validation of a Prototype Variable-Altitude Venus Aerobot
This paper details a significant milestone towards maturing a buoyant aerial robotic platform, or aerobot, for flight in the Venus clouds. We describe two flights of our subscale altitude-controlled aerobot, fabricated from the materials necessary to survive Venus conditions. During these flights over the Nevada Black Rock desert, the prototype flew at the identical atmospheric densities as 54 to 55 km cloud layer altitudes on Venus. We further describe a first-principle aerobot dynamics model which we validate against the Nevada flight data and subsequently employ to predict the performance of future aerobots on Venus. The aerobot discussed in this paper is under JPL and Aerostar development for an in-situ mission flying multiple circumnavigations of Venus, sampling the chemical and physical properties of the planet's atmosphere and also remotely sensing surface properties.
comment: Accepted version for AIAA Journal of Aircraft
Preferenced Oracle Guided Multi-mode Policies for Dynamic Bipedal Loco-Manipulation
Dynamic loco-manipulation calls for effective whole-body control and contact-rich interactions with the object and the environment. Existing learning-based control synthesis relies on training low-level skill policies and explicitly switching with a high-level policy or a hand-designed finite state machine, leading to quasi-static behaviors. In contrast, dynamic tasks such as soccer require the robot to run towards the ball, decelerate to an optimal approach to dribble, and eventually kick a goal - a continuum of smooth motion. To this end, we propose Preferenced Oracle Guided Multi-mode Policies (OGMP) to learn a single policy mastering all the required modes and preferred sequence of transitions to solve uni-object loco-manipulation tasks. We design hybrid automatons as oracles to generate references with continuous dynamics and discrete mode jumps to perform a guided policy optimization through bounded exploration. To enforce learning a desired sequence of mode transitions, we present a task-agnostic preference reward that enhances performance. The proposed approach demonstrates successful loco-manipulation for tasks like soccer and moving boxes omnidirectionally through whole-body control. In soccer, a single policy learns to optimally reach the ball, transition to contact-rich dribbling, and execute successful goal kicks and ball stops. Leveraging the oracle's abstraction, we solve each loco-manipulation task on robots with varying morphologies, including HECTOR V1, Berkeley Humanoid, Unitree G1, and H1, using the same reward definition and weights.
comment: 7 pages, 8 figures
BaTCAVe: Trustworthy Explanations for Robot Behaviors
Black box neural networks are an indispensable part of modern robots. Nevertheless, deploying such high-stakes systems in real-world scenarios poses significant challenges when the stakeholders, such as engineers and legislative bodies, lack insights into the neural networks' decision-making process. Presently, explainable AI is primarily tailored to natural language processing and computer vision, falling short in two critical aspects when applied in robots: grounding in decision-making tasks and the ability to assess trustworthiness of their explanations. In this paper, we introduce a trustworthy explainable robotics technique based on human-interpretable, high-level concepts that attribute to the decisions made by the neural network. Our proposed technique provides explanations with associated uncertainty scores for the explanation by matching neural network's activations with human-interpretable visualizations. To validate our approach, we conducted a series of experiments with various simulated and real-world robot decision-making models, demonstrating the effectiveness of the proposed approach as a post-hoc, human-friendly robot diagnostic tool.
comment: 19 pages, 26 figures
SiLVR: Scalable Lidar-Visual Radiance Field Reconstruction with Uncertainty Quantification
We present a neural radiance field (NeRF) based large-scale reconstruction system that fuses lidar and vision data to generate high-quality reconstructions that are geometrically accurate and capture photorealistic texture. Our system adopts the state-of-the-art NeRF representation to incorporate lidar. Adding lidar data adds strong geometric constraints on the depth and surface normals, which is particularly useful when modelling uniform texture surfaces which contain ambiguous visual reconstruction cues. A key contribution of this work is a novel method to quantify the epistemic uncertainty of the lidar-visual NeRF reconstruction by estimating the spatial variance of each point location in the radiance field given the sensor observations from the cameras and lidar. This provides a principled approach to evaluate the contribution of each sensor modality to the final reconstruction. In this way, reconstructions that are uncertain (due to e.g. uniform visual texture, limited observation viewpoints, or little lidar coverage) can be identified and removed. Our system is integrated with a real-time lidar SLAM system which is used to bootstrap a Structure-from-Motion (SfM) reconstruction procedure. It also helps to properly constrain the overall metric scale which is essential for the lidar depth loss. The refined SLAM trajectory can then be divided into submaps using Spectral Clustering to group sets of co-visible images together. This submapping approach is more suitable for visual reconstruction than distance-based partitioning. Our uncertainty estimation is particularly effective when merging submaps as their boundaries often contain artefacts due to limited observations. We demonstrate the reconstruction system using a multi-camera, lidar sensor suite in experiments involving both robot-mounted and handheld scanning. Our test datasets cover a total area of more than 20,000 square metres.
comment: Accepted by T-RO. Webpage: https://dynamic.robots.ox.ac.uk/projects/silvr/
Autonomy Architectures for Safe Planning in Unknown Environments Under Budget Constraints
Mission planning can often be formulated as a constrained control problem under multiple path constraints (i.e., safety constraints) and budget constraints (i.e., resource expenditure constraints). In a priori unknown environments, verifying that an offline solution will satisfy the constraints for all time can be difficult, if not impossible. We present ReRoot, a novel sampling-based framework that enforces safety and budget constraints for nonlinear systems in unknown environments. The main idea is that ReRoot grows multiple reverse RRT* trees online, starting from renewal sets, i.e., sets where the budget constraints are renewed. The dynamically feasible backup trajectories guarantee safety and reduce resource expenditure, which provides a principled backup policy when integrated into the gatekeeper safety verification architecture. We demonstrate our approach in simulation with a fixed-wing UAV in a GNSS-denied environment with a budget constraint on localization error that can be renewed at visual landmarks.
comment: Code: https://github.com/dcherenson/budget-constrained-planning
Estimating Dynamic Soft Continuum Robot States From Boundaries
State estimation is one of the fundamental problems in robotics. For soft continuum robots, this task is particularly challenging because their states (poses, strains, internal wrenches, and velocities) are inherently infinite-dimensional functions due to their continuous deformability. Traditional sensing techniques, however, can only provide discrete measurements. Recently, a dynamic state estimation method known as a \textit{boundary observer} was introduced, which uses Cosserat rod theory to recover all infinite-dimensional states by measuring only the tip velocity. In this work, we present a dual design that instead relies on measuring the internal wrench at the robot's base. Despite the duality, this new approach offers a key practical advantage: it requires only a force/torque (FT) sensor embedded at the base and eliminates the need for external motion capture systems. Both observer types are inspired by principles of energy dissipation and can be naturally combined to enhance performance. We conduct a Lyapunov-based analysis to study the convergence rate of these boundary observers and reveal a useful property: as the observer gains increase, the convergence rate initially improves and then degrades. This convex trend enables efficient tuning of the observer gains. We also identify special cases where linear and angular states are fully determined by each other, which further relaxes sensing requirements. Experimental studies using a tendon-driven continuum robot validate the convergence of all observer variants under fast dynamic motions, the existence of optimal gains, robustness against unknown external forces, and the algorithm's real-time computational performance.
Multiagent Systems
Multi-Objective Multi-Agent Path Finding with Lexicographic Cost Preferences
Many real-world scenarios require multiple agents to coordinate in shared environments, while balancing trade-offs between multiple, potentially competing objectives. Current multi-objective multi-agent path finding (MO-MAPF) algorithms typically produce conflict-free plans by computing Pareto frontiers. They do not explicitly optimize for user-defined preferences, even when the preferences are available, and scale poorly with the number of objectives. We propose a lexicographic framework for modeling MO-MAPF, along with an algorithm \textit{Lexicographic Conflict-Based Search} (LCBS) that directly computes a single solution aligned with a lexicographic preference over objectives. LCBS integrates a priority-aware low-level $A^*$ search with conflict-based search, avoiding Pareto frontier construction and enabling efficient planning guided by preference over objectives. We provide insights into optimality and scalability, and empirically demonstrate that LCBS computes optimal solutions while scaling to instances with up to ten objectives -- far beyond the limits of existing MO-MAPF methods. Evaluations on standard and randomized MAPF benchmarks show consistently higher success rates against state-of-the-art baselines, especially with increasing number of objectives.
comment: 8 pages, 7 figures
A Multi-Agent Framework for Stateful Inference-Time Search
Recent work explores agentic inference-time techniques to perform structured, multi-step reasoning. However, stateless inference often struggles on multi-step tasks due to the absence of persistent state. Moreover, task-specific fine-tuning or instruction-tuning often achieve surface-level code generation but remain brittle on tasks requiring deeper reasoning and long-horizon dependencies. To address these limitations, we propose stateful multi-agent evolutionary search, a training-free framework that departs from prior stateless approaches by combining (i) persistent inference-time state, (ii) adversarial mutation, and (iii) evolutionary preservation. We demonstrate its effectiveness in automated unit test generation through the generation of edge cases. We generate robust edge cases using an evolutionary search process, where specialized agents sequentially propose, mutate, and score candidates. A controller maintains persistent state across generations, while evolutionary preservation ensures diversity and exploration across all possible cases. This yields a generalist agent capable of discovering robust, high-coverage edge cases across unseen codebases. Experiments show our stateful multi-agent inference framework achieves substantial gains in coverage over stateless single-step baselines, evaluated on prevalent unit-testing benchmarks such as HumanEval and TestGenEvalMini and using three diverse LLM families - Llama, Gemma, and GPT. These results indicate that combining persistent inference-time state with evolutionary search materially improves unit-test generation.
FURINA: A Fully Customizable Role-Playing Benchmark via Scalable Multi-Agent Collaboration Pipeline
As large language models (LLMs) advance in role-playing (RP) tasks, existing benchmarks quickly become obsolete due to their narrow scope, outdated interaction paradigms, and limited adaptability across diverse application scenarios. To address this gap, we introduce FURINA-Builder, a novel multi-agent collaboration pipeline that automatically constructs fully customizable RP benchmarks at any scale. It enables evaluation of arbitrary characters across diverse scenarios and prompt formats, as the first benchmark builder in RP area for adaptable assessment. FURINA-Builder simulates dialogues between a test character and other characters drawn from a well-constructed character-scene pool, while an LLM judge selects fine-grained evaluation dimensions and adjusts the test character's responses into final test utterances. Using this pipeline, we build FURINA-Bench, a new comprehensive role-playing benchmark featuring both established and synthesized test characters, each assessed with dimension-specific evaluation criteria. Human evaluation and preliminary separability analysis justify our pipeline and benchmark design. We conduct extensive evaluations of cutting-edge LLMs and find that o3 and DeepSeek-R1 achieve the best performance on English and Chinese RP tasks, respectively. Across all models, established characters consistently outperform synthesized ones, with reasoning capabilities further amplifying this disparity. Interestingly, we observe that model scale does not monotonically reduce hallucinations. More critically, for reasoning LLMs, we uncover a novel trade-off: reasoning improves RP performance but simultaneously increases RP hallucinations. This trade-off extends to a broader Pareto frontier between RP performance and reliability for all LLMs. These findings demonstrate the effectiveness of FURINA-Builder and the challenge posed by FURINA-Bench.
Measuring and Mitigating Identity Bias in Multi-Agent Debate via Anonymization
Multi-agent debate (MAD) aims to improve large language model (LLM) reasoning by letting multiple agents exchange answers and then aggregate their opinions. Yet recent studies reveal that agents are not neutral: they are prone to identity-driven sycophancy and self-bias, uncritically adopting a peer's view or stubbornly adhering to their own prior output, undermining the reliability of debate. In this work, we present the first principled framework that joins sycophancy and self-bias to mitigate and quantify identity bias in MAD. First, we formalize the debate dynamics as an identity-weighted Bayesian update process. Second, we propose response anonymization: by removing identity markers from prompts, agents cannot distinguish "self" from "peer", which forces equal weights on agent identity, thereby reducing bias. Third, we define the Identity Bias Coefficient (IBC), a principled metric that measures how often an agent follows a peer versus itself. Empirical studies across multiple models, datasets and debate rounds confirm that identity bias is widespread, with sycophancy far more common than self-bias. Our findings highlight the need to "mask" identity to ensure that MAD systems reason based on content rather than source identity. Code is released in https://github.com/deeplearning-wisc/MAD-identity-bias.
Code Like Humans: A Multi-Agent Solution for Medical Coding EMNLP
In medical coding, experts map unstructured clinical notes to alphanumeric codes for diagnoses and procedures. We introduce Code Like Humans: a new agentic framework for medical coding with large language models. It implements official coding guidelines for human experts, and it is the first solution that can support the full ICD-10 coding system (+70K labels). It achieves the best performance to date on rare diagnosis codes (fine-tuned discriminative classifiers retain an advantage for high-frequency codes, to which they are limited). Towards future work, we also contribute an analysis of system performance and identify its `blind spots' (codes that are systematically undercoded).
comment: EMNLP Findings 2025
GPS-MTM: Capturing Pattern of Normalcy in GPS-Trajectories with self-supervised learning
Foundation models have driven remarkable progress in text, vision, and video understanding, and are now poised to unlock similar breakthroughs in trajectory modeling. We introduce the GPSMasked Trajectory Transformer (GPS-MTM), a foundation model for large-scale mobility data that captures patterns of normalcy in human movement. Unlike prior approaches that flatten trajectories into coordinate streams, GPS-MTM decomposes mobility into two complementary modalities: states (point-of-interest categories) and actions (agent transitions). Leveraging a bi-directional Transformer with a self-supervised masked modeling objective, the model reconstructs missing segments across modalities, enabling it to learn rich semantic correlations without manual labels. Across benchmark datasets, including Numosim-LA, Urban Anomalies, and Geolife, GPS-MTM consistently outperforms on downstream tasks such as trajectory infilling and next-stop prediction. Its advantages are most pronounced in dynamic tasks (inverse and forward dynamics), where contextual reasoning is critical. These results establish GPS-MTM as a robust foundation model for trajectory analytics, positioning mobility data as a first-class modality for large-scale representation learning. Code is released for further reference.
comment: 4 pages, 2 figures
AutoMind: Adaptive Knowledgeable Agent for Automated Data Science
Large Language Model (LLM) agents have shown great potential in addressing real-world data science problems. LLM-driven data science agents promise to automate the entire machine learning pipeline, yet their real-world effectiveness remains limited. Existing frameworks depend on rigid, pre-defined workflows and inflexible coding strategies; consequently, they excel only on relatively simple, classical problems and fail to capture the empirical expertise that human practitioners bring to complex, innovative tasks. In this work, we introduce AutoMind, an adaptive, knowledgeable LLM-agent framework that overcomes these deficiencies through three key advances: (1) a curated expert knowledge base that grounds the agent in domain expert knowledge, (2) an agentic knowledgeable tree search algorithm that strategically explores possible solutions, and (3) a self-adaptive coding strategy that dynamically tailors code generation to task complexity. Evaluations on two automated data science benchmarks demonstrate that AutoMind delivers superior performance versus state-of-the-art baselines. Additional analyses confirm favorable effectiveness, efficiency, and qualitative solution quality, highlighting AutoMind as an efficient and robust step toward fully automated data science. Code is at https://github.com/innovatingAI/AutoMind.
comment: Ongoing work
KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality
Large Language Models (LLMs), particularly slow-thinking models, often exhibit severe hallucination, outputting incorrect content due to an inability to accurately recognize knowledge boundaries during reasoning. While Reinforcement Learning (RL) can enhance complex reasoning abilities, its outcome-oriented reward mechanism often lacks factual supervision over the thinking process, further exacerbating the hallucination problem. To address the high hallucination in slow-thinking models, we propose Knowledge-enhanced RL, KnowRL. KnowRL guides models to perform fact-based slow thinking by integrating a factuality reward, based on knowledge verification, into the RL training process, helping them recognize their knowledge boundaries. KnowRL guides models to perform fact-based slow thinking by integrating a factuality reward, based on knowledge verification, into the RL training process, helping them recognize their knowledge boundaries. This targeted factual input during RL training enables the model to learn and internalize fact-based reasoning strategies. By directly rewarding adherence to facts within the reasoning steps, KnowRL fosters a more reliable thinking process. Experimental results on three hallucination evaluation datasets and two reasoning evaluation datasets demonstrate that KnowRL effectively mitigates hallucinations in slow-thinking models while maintaining their original strong reasoning capabilities. Our code is available at https://github.com/zjunlp/KnowRL.
comment: Work in progress
Memory-R1: Enhancing Large Language Model Agents to Manage and Utilize Memories via Reinforcement Learning
Large Language Models (LLMs) have demonstrated impressive capabilities across a wide range of NLP tasks, but they remain fundamentally stateless, constrained by limited context windows that hinder long-horizon reasoning. Recent efforts to address this limitation often augment LLMs with an external memory bank, yet most existing pipelines are static and heuristic-driven, lacking a learned mechanism for deciding what to store, update, or retrieve. We present Memory-R1, a reinforcement learning (RL) framework that equips LLMs with the ability to actively manage and utilize external memory through two specialized agents: a Memory Manager that learns structured operations, including ADD, UPDATE, DELETE, and NOOP; and an Answer Agent that pre-selects and reasons over relevant entries. Both agents are fine-tuned with outcome-driven RL (PPO and GRPO), enabling adaptive memory management with minimal supervision. With only 152 training QA pairs, Memory-R1 outperforms strong baselines and generalizes across diverse question types, three benchmarks (LoCoMo, MSC, LongMemEval), and multiple model scales (3B-14B).
Behavioral alignment in social networks
The orderly behaviors observed in large-scale groups, such as fish schooling and the organized movement of crowds, are both ubiquitous and essential for the survival and stability of these systems. Understanding how such complex collective behaviors emerge from simple local interactions and behavioral adjustments is a significant scientific challenge. Historically, research has predominantly focused on imitation and social learning, where individuals adopt the strategies of more successful peers to refine their behavior. However, in recent years, an alternative learning approach based on self-exploration and introspective learning has garnered increasing attention. In this paradigm, individuals assess their own circumstances and select strategies that best align with their specific conditions. Two examples are coordination and anti-coordination, where individuals align with and diverge from the local majority, respectively. In this study, we analyze networked systems of coordinating and anti-coordinating individuals, exploring the combined effects of system dynamics, network structure, and behavioral patterns. We address several practical questions, including the number of equilibria, their characteristics, the equilibrium time, and the resilience of the system. We find that the number of equilibrium states can be extremely large, even increasing exponentially with minor alterations to the network structure. Moreover, the network structure has a significant impact on the average equilibrium time. Despite the complexity of these findings, we find that variations can be captured by a single, simple network characteristic (the average path length), which we illustrate in both synthetic and empirical networks.
CodeCureAgent: Automatic Classification and Repair of Static Analysis Warnings
Static analysis tools are widely used to detect bugs, vulnerabilities, and code smells. Traditionally, developers must resolve these warnings manually. Because this process is tedious, developers sometimes ignore warnings, leading to an accumulation of warnings and a degradation of code quality. This paper presents CodeCureAgent, an approach that harnesses LLM-based agents to automatically analyze, classify, and repair static analysis warnings. Unlike previous work, our method does not follow a predetermined algorithm. Instead, we adopt an agentic framework that iteratively invokes tools to gather additional information from the codebase (e.g., via code search) and edit the codebase to resolve the warning. CodeCureAgent detects and suppresses false positives, while fixing true positives when identified. We equip CodeCureAgent with a three-step heuristic to approve patches: (1) build the project, (2) verify that the warning disappears without introducing new warnings, and (3) run the test suite. We evaluate CodeCureAgent on a dataset of 1,000 SonarQube warnings found in 106 Java projects and covering 291 distinct rules. Our approach produces plausible fixes for 96.8% of the warnings, outperforming state-of-the-art baseline approaches by 30.7% and 29.2% in plausible-fix rate, respectively. Manual inspection of 291 cases reveals a correct-fix rate of 86.3%, showing that CodeCureAgent can reliably repair static analysis warnings. The approach incurs LLM costs of about 2.9 cents (USD) and an end-to-end processing time of about four minutes per warning. We envision CodeCureAgent helping to clean existing codebases and being integrated into CI/CD pipelines to prevent the accumulation of static analysis warnings.
Consistent Opponent Modeling of Static Opponents in Imperfect-Information Games
The goal of agents in multi-agent environments is to maximize total reward against the opposing agents that are encountered. Following a game-theoretic solution concept, such as Nash equilibrium, may obtain a strong performance in some settings; however, such approaches fail to capitalize on historical and observed data from repeated interactions against our opponents. Opponent modeling algorithms integrate machine learning techniques to exploit suboptimal opponents utilizing available data; however, the effectiveness of such approaches in imperfect-information games to date is quite limited. We show that existing opponent modeling approaches fail to satisfy a simple desirable property even against static opponents drawn from a known prior distribution; namely, they do not guarantee that the model approaches the opponent's true strategy even in the limit as the number of game iterations approaches infinity. We develop a new algorithm that is able to achieve this property and runs efficiently by solving a convex minimization problem based on the sequence-form game representation using projected gradient descent. The algorithm is guaranteed to efficiently converge to the opponent's true strategy given observations from gameplay and possibly additional historical data if it is available.
The Price of Uncertainty for Social Consensus
How hard is it to achieve consensus in a social network under uncertainty? In this paper we model this problem as a social graph of agents where each vertex is initially colored red or blue. The goal of the agents is to achieve consensus, which is when the colors of all agents align. Agents attempt to do this locally through steps in which an agent changes their color to the color of the majority of their neighbors. In real life, agents may not know exactly how many of their neighbors are red or blue, which introduces uncertainty into this process. Modeling uncertainty as perturbations of relative magnitude $1+\varepsilon$ to these color neighbor counts, we show that even small values of $\varepsilon$ greatly hinder the ability to achieve consensus in a social network. We prove theoretically tight upper and lower bounds on the price of uncertainty, a metric defined by Balcan et al. to quantify the effect of uncertainty in network games.
comment: 17 pages
$\textit{Agents Under Siege}$: Breaking Pragmatic Multi-Agent LLM Systems with Optimized Prompt Attacks
Most discussions about Large Language Model (LLM) safety have focused on single-agent settings but multi-agent LLM systems now create novel adversarial risks because their behavior depends on communication between agents and decentralized reasoning. In this work, we innovatively focus on attacking pragmatic systems that have constrains such as limited token bandwidth, latency between message delivery, and defense mechanisms. We design a $\textit{permutation-invariant adversarial attack}$ that optimizes prompt distribution across latency and bandwidth-constraint network topologies to bypass distributed safety mechanisms within the system. Formulating the attack path as a problem of $\textit{maximum-flow minimum-cost}$, coupled with the novel $\textit{Permutation-Invariant Evasion Loss (PIEL)}$, we leverage graph-based optimization to maximize attack success rate while minimizing detection risk. Evaluating across models including $\texttt{Llama}$, $\texttt{Mistral}$, $\texttt{Gemma}$, $\texttt{DeepSeek}$ and other variants on various datasets like $\texttt{JailBreakBench}$ and $\texttt{AdversarialBench}$, our method outperforms conventional attacks by up to $7\times$, exposing critical vulnerabilities in multi-agent systems. Moreover, we demonstrate that existing defenses, including variants of $\texttt{Llama-Guard}$ and $\texttt{PromptGuard}$, fail to prohibit our attack, emphasizing the urgent need for multi-agent specific safety mechanisms.
Chisme: Fully Decentralized Differentiated Deep Learning for IoT Intelligence
As end-user device capability increases and demand for intelligent services at the Internet's edge rise, distributed learning has emerged as a key enabling technology. Existing approaches like federated learning (FL) and decentralized FL (DFL) enable distributed learning among clients, while gossip learning (GL) approaches have emerged to address the potential challenges in resource-constrained, connectivity-challenged infrastructure-less environments. However, most distributed learning approaches assume largely homogeneous data distributions and may not consider or exploit the heterogeneity of clients and their underlying data distributions. This paper introduces Chisme, a novel fully decentralized distributed learning algorithm designed to address the challenges of implementing robust intelligence in network edge contexts characterized by heterogeneous data distributions, episodic connectivity, and sparse network infrastructure. Chisme leverages cosine similarity-based data affinity heuristics calculated from received model exchanges to inform how much influence received models have when merging into the local model. By doing so, it facilitates stronger merging influence between clients with more similar model learning progressions, enabling clients to strategically balance between broader collaboration to build more general knowledge and more selective collaboration to build specific knowledge. We evaluate Chisme against contemporary approaches using image recognition and time-series prediction scenarios while considering different network connectivity conditions, representative of real-world distributed intelligent systems. Our experiments demonstrate that Chisme outperforms state-of-the-art edge intelligence approaches in almost every case -- clients using Chisme exhibit faster training convergence, lower final loss after training, and lower performance disparity between clients.
comment: This work has been submitted to the IEEE PerCom 2026 for potential publication
Systems and Control (CS)
A Genetic Algorithm Approach to Anti-Jamming UAV Swarm Behavior
In recent years, Unmanned Aerial Vehicles (UAVs) have brought a new true revolution to military tactics. While UAVs already constitute an advantage when operating alone, multi-UAV swarms expand the available possibilities, allowing the UAVs to collaborate and support each other as a team to carry out a given task. This entails the capability to exchange information related with situation awareness and action coordination by means of a suitable wireless communication technology. In such scenario, the adversary is expected to disrupt communications by jamming the communication channel. The latter becomes the Achilles heel of the swarm. While anti-jamming techniques constitute a well covered topic in the literature, the use of intelligent swarm behaviors to leverage those techniques is still an open research issue. This paper explores the use of Genetic Algorithms (GAs) to jointly optimize UAV swarm formation, beam-steering antennas and traffic routing in order to mitigate the effect of jamming in the main coordination channel, under the assumption that a more robust and low data rate channel is used for formation management signaling. Simulation results show the effectiveness of proposed approach. However, the significant computational cost paves the way for further research.
comment: 8 pages, conference paper
Stability Preserving Safe Control of a Bicopter
This paper presents a control law for stabilization and trajectory tracking of a multicopter subject to safety constraints. The proposed approach guarantees forward invariance of a prescribed safety set while ensuring smooth tracking performance. Unlike conventional control barrier function methods, the constrained control problem is transformed into an unconstrained one using state-dependent mappings together with carefully constructed Lyapunov functions. This approach enables explicit synthesis of the control law, instead of requiring a solution of constrained optimization at each step. The transformation also enables the controller to enforce safety without sacrificing stability or performance. Simulation results for a polytopic reference trajectory confined within a designated safe region demonstrate the effectiveness of the proposed method.
From Neural Sensing to Stimulation: An Interdisciplinary Roadmap for Neurotechnology
Neurotechnologies are transforming how we measure, interpret, and modulate brain-body interactions, integrating real-time sensing, computation, and stimulation to enable precise physiological control. They hold transformative potential across clinical and non-clinical domains, from treating disorders to enhancing cognition and performance. Realizing this potential requires navigating complex, interdisciplinary challenges spanning neuroscience, materials science, device engineering, signal processing, computational modelling, and regulatory and ethical frameworks. This Perspective presents a strategic roadmap for neurotechnology development, created by early-career researchers, highlighting their role at the intersection of disciplines and their capacity to bridge traditional silos. We identify five cross-cutting trade-offs that constrain progress across functionality, scalability, adaptability, and translatability, and illustrate how technical domains influence their resolution. Rather than a domain-specific review, we focus on shared challenges and strategic opportunities that transcend disciplines. We propose a unified framework for collaborative innovation and education, highlight ethical and regulatory priorities, and outline a timeline for overcoming key bottlenecks. By aligning technical development with translational and societal needs, this roadmap aims to accelerate equitable, effective, and future-ready adaptive neurotechnologies, guiding coordinated efforts across the global research and innovation community.
Identification and optimal control strategies for the transversal splitting of ultra--cold Bose gases
Splitting a Bose--Einstein condensate (BEC) is a key operation in fundamental physics experiments and emerging quantum technologies, where precise preparation of well--defined initial states requires fast yet coherent control of the condensate's nonlinear dynamics. This work formulates the BEC splitting process as an optimal feedforward control problem based on a physically interpretable, reduced--order model identified from limited experimental data. We introduce a systematic calibration strategy that combines optimal experiment selection and constrained nonlinear parameter estimation, enabling accurate system identification with minimal experimental overhead. Using this calibrated model, we compute energy--optimal trajectories via indirect optimal control to realize shortcuts to adiabaticity (STAs), achieving rapid transitions to the ground state of a double--well potential while suppressing excitations. Experiments confirm that the proposed control framework yields high--fidelity state transfers across multiple configurations, demonstrating its robustness and scalability for quantum control applications.
comment: To be published in IEEE Transactions on Control Systems Technology
Mitigating Increase-Decrease Gaming with Alternative Connection Agreements: A Defender-Attacker-Defender Game
Redispatch markets are widely used by system operators to manage network congestion. A well-known drawback, however, is that Flexibility Service Providers (FSPs) may strategically adjust their baselines in anticipation of redispatch actions, thereby aggravating congestion and raising system costs. To address this increase-decrease gaming, Distribution System Operators (DSOs) could use Alternative Connection Agreements (ACAs) to conditionally limit the available connection capacity of market participants in the day-ahead stage. In this paper, we present a novel Defender-Attacker-Defender game to investigate the potential of this approach in distribution networks under load and price uncertainty. We solve the resulting trilevel optimization model using a custom branch-and-bound algorithm, and we demonstrate that it efficiently solves the problem without exploring many nodes in the branch-and-bound search tree for most simulated scenarios. The case study demonstrates that applying ACAs can substantially lower redispatch costs (e.g. by 25%) for the DSO with only a limited impact on FSP profits. The effectiveness of the approach critically depends on how often the DSO can invoke ACAs and on the extent to which the DSO can anticipate strategic bidding behavior of the FSP.
Falsification-Driven Reinforcement Learning for Maritime Motion Planning
Compliance with maritime traffic rules is essential for the safe operation of autonomous vessels, yet training reinforcement learning (RL) agents to adhere to them is challenging. The behavior of RL agents is shaped by the training scenarios they encounter, but creating scenarios that capture the complexity of maritime navigation is non-trivial, and real-world data alone is insufficient. To address this, we propose a falsification-driven RL approach that generates adversarial training scenarios in which the vessel under test violates maritime traffic rules, which are expressed as signal temporal logic specifications. Our experiments on open-sea navigation with two vessels demonstrate that the proposed approach provides more relevant training scenarios and achieves more consistent rule compliance.
Decentralized CBF-based Safety Filters for Collision Avoidance of Cooperative Missile Systems with Input Constraints
This paper presents a decentralized safety filter for collision avoidance in multi-agent aerospace interception scenarios. The approach leverages robust control barrier functions (RCBFs) to guarantee forward invariance of safety sets under bounded inputs and high-relative-degree dynamics. Each effector executes its nominal cooperative guidance command, while a local quadratic program (QP) modifies the input only when necessary. Event-triggered activation based on range and zero-effort miss (ZEM) criteria ensures scalability by restricting active constraints to relevant neighbors. To resolve feasibility issues from simultaneous constraints, a slack-variable relaxation scheme is introduced that prioritizes critical agents in a Pareto-optimal manner. Simulation results in many-on-many interception scenarios demonstrate that the proposed framework maintains collision-free operation with minimal deviation from nominal guidance, providing a computationally efficient and scalable solution for safety-critical multi-agent aerospace systems.
comment: 7 pages, 5 figures
Resilient Multi-Dimensional Consensus and Distributed Optimization against Agent-Based and Denial-of-Service Attacks
In this paper, we consider the resilient multi-dimensional consensus and distributed optimization problems of multi-agent systems (MASs) in the presence of both agent-based and denial-of-service (DoS) attacks. The considered agent-based attacks can cover malicious, Byzantine, and stubborn agents. The links between agents in the network can be blocked by DoS attacks, which may lead the digraph to be time-varying and even disconnected. The objective is to ensure that the remaining benign agents achieve consensus. To this end, an "auxiliary point"-based resilient control algorithm is proposed for MASs. Under the proposed algorithm, each healthy agent constructs a "safe kernel" utilizing the states of its in-neighbors and updates its state toward a specific point within this kernel at each iteration. If an agent cannot receive its neighbors' states owing to DoS attacks, it will use the states received immediately before the DoS period. Moreover, a resilient multi-dimensional distributed optimization (RMDO) algorithm is also proposed. Theoretical proofs and numerical examples are presented to demonstrate the effectiveness of the proposed algorithms.
Delay Independent Safe Control with Neural Networks: Positive Lur'e Certificates for Risk Aware Autonomy
We present a risk-aware safety certification method for autonomous, learning enabled control systems. Focusing on two realistic risks, state/input delays and interval matrix uncertainty, we model the neural network (NN) controller with local sector bounds and exploit positivity structure to derive linear, delay-independent certificates that guarantee local exponential stability across admissible uncertainties. To benchmark performance, we adopt and implement a state-of-the-art IQC NN verification pipeline. On representative cases, our positivity-based tests run orders of magnitude faster than SDP-based IQC while certifying regimes the latter cannot-providing scalable safety guarantees that complement risk-aware control.
comment: Submitted to 2026 American Control Conference (ACC), New Orleans, LA
A Cascade of Systems and the Product of Their $θ$-Symmetric Scaled Relative Graphs
In this paper, we utilize a variant of the scaled relative graph (SRG), referred to as the $\theta$-symmetric SRG, to develop a graphical stability criterion for the feedback interconnection of a cascade of systems. A crucial submultiplicative property of $\theta$-symmetric SRG is established, enabling it to handle cyclic interconnections for which conventional graph separation methods are not applicable. By integrating both gain and refined phase information, the $\theta$-symmetric SRG provides a unified graphical characterization of the system, which better captures system properties and yields less conservative results. In the scalar case, the $\theta$-symmetric SRG can be reduced exactly to the scalar itself, whereas the standard SRG appears to be a conjugate pair. Consequently, the frequency-wise $\theta$-symmetric SRG is more suitable than the standard SRG as a multi-input multi-output extension of the classical Nyquist plot. Illustrative examples are included to demonstrate the effectiveness of the $\theta$-symmetric SRG.
comment: 9 pages, 4 figures
Safe Stabilization of the Stefan Problem with a High-Order Moving Boundary Dynamics by PDE Backstepping
This paper presents a safe stabilization of the Stefan PDE model with a moving boundary governed by a high-order dynamics. We consider a parabolic PDE with a time-varying domain governed by a second-order response with respect to the Neumann boundary value of the PDE state at the moving boundary. The objective is to design a boundary heat flux control to stabilize the moving boundary at a desired setpoint, with satisfying the required conditions of the model on PDE state and the moving boundary. We apply a PDE backstepping method for the control design with considering a constraint on the control law. The PDE and moving boundary constraints are shown to be satisfied by applying the maximum principle for parabolic PDEs. Then the closed-loop system is shown to be globally exponentially stable by performing Lyapunov analysis. The proposed control is implemented in numerical simulation, which illustrates the desired performance in safety and stability. An outline of the extension to third-order moving boundary dynamics is also presented. Code is released at https://github.com/shumon0423/HighOrderStefan_CDC2025.git.
comment: 6 pages, 4 figures, 64th IEEE Conference on Decision and Control (CDC) 2025
Model Predictive Path Integral Control for Roll-to-Roll Manufacturing
Roll-to-roll (R2R) manufacturing is a continuous processing technology essential for scalable production of thin-film materials and printed electronics, but precise control remains challenging due to subsystem interactions, nonlinearities, and process disturbances. This paper proposes a Model Predictive Path Integral (MPPI) control formulation for R2R systems, leveraging a GPU-based Monte-Carlo sampling approach to efficiently approximate optimal controls online. Crucially, MPPI easily handles non-differentiable cost functions, enabling the incorporation of complex performance criteria relevant to advanced manufacturing processes. A case study is presented that demonstrates that MPPI significantly improves tension regulation performance compared to conventional model predictive control (MPC), highlighting its suitability for real-time control in advanced manufacturing.
comment: 6 pages, 4 figures
GATO: GPU-Accelerated and Batched Trajectory Optimization for Scalable Edge Model Predictive Control
While Model Predictive Control (MPC) delivers strong performance across robotics applications, solving the underlying (batches of) nonlinear trajectory optimization (TO) problems online remains computationally demanding. Existing GPU-accelerated approaches typically (i) parallelize a single solve to meet real-time deadlines, (ii) scale to very large batches at slower-than-real-time rates, or (iii) achieve speed by restricting model generality (e.g., point-mass dynamics or a single linearization). This leaves a large gap in solver performance for many state-of-the-art MPC applications that require real-time batches of tens to low-hundreds of solves. As such, we present GATO, an open source, GPU-accelerated, batched TO solver co-designed across algorithm, software, and computational hardware to deliver real-time throughput for these moderate batch size regimes. Our approach leverages a combination of block-, warp-, and thread-level parallelism within and across solves for ultra-high performance. We demonstrate the effectiveness of our approach through a combination of: simulated benchmarks showing speedups of 18-21x over CPU baselines and 1.4-16x over GPU baselines as batch size increases; case studies highlighting improved disturbance rejection and convergence behavior; and finally a validation on hardware using an industrial manipulator. We open source GATO to support reproducibility and adoption.
Accuracy, Memory Efficiency and Generalization: A Comparative Study on Liquid Neural Networks and Recurrent Neural Networks
This review aims to conduct a comparative analysis of liquid neural networks (LNNs) and traditional recurrent neural networks (RNNs) and their variants, such as long short-term memory networks (LSTMs) and gated recurrent units (GRUs). The core dimensions of the analysis include model accuracy, memory efficiency, and generalization ability. By systematically reviewing existing research, this paper explores the basic principles, mathematical models, key characteristics, and inherent challenges of these neural network architectures in processing sequential data. Research findings reveal that LNN, as an emerging, biologically inspired, continuous-time dynamic neural network, demonstrates significant potential in handling noisy, non-stationary data, and achieving out-of-distribution (OOD) generalization. Additionally, some LNN variants outperform traditional RNN in terms of parameter efficiency and computational speed. However, RNN remains a cornerstone in sequence modeling due to its mature ecosystem and successful applications across various tasks. This review identifies the commonalities and differences between LNNs and RNNs, summarizes their respective shortcomings and challenges, and points out valuable directions for future research, particularly emphasizing the importance of improving the scalability of LNNs to promote their application in broader and more complex scenarios.
comment: 13 pages, 12 figures. Submitted to IEEE Transactions on Neural Networks and Learning Systems (TNNLS)
Adaptive Control Allocation for Underactuated Time-Scale Separated Non-Affine Systems
Many robotic systems are underactuated, meaning not all degrees of freedom can be directly controlled due to lack of actuators, input constraints, or state-dependent actuation. This property, compounded by modeling uncertainties and disturbances, complicates the control design process for trajectory tracking. In this work, we propose an adaptive control architecture for uncertain, nonlinear, underactuated systems with input constraints. Leveraging time-scale separation, we construct a reduced-order model where fast dynamics provide virtual inputs to the slower subsystem and use dynamic control allocation to select the optimal control inputs given the non-affine dynamics. To handle uncertainty, we introduce a state predictor-based adaptive law, and through singular perturbation theory and Lyapunov analysis, we prove stability and bounded tracking of reference trajectories. The proposed method is validated on a VTOL quadplane with nonlinear, state-dependent actuation, demonstrating its utility as a unified controller across various flight regimes, including cruise, landing transition, and hover.
comment: Code https://github.com/dcherenson/adaptive-control-underactuated
Optimal Batched Scheduling of Stochastic Processing Networks Using Atomic Action Decomposition
Stochastic processing networks (SPNs) have broad applications in healthcare, transportation, and communication networks. The control of SPN is to dynamically assign servers in batches under uncertainty to optimize long-run performance. This problem is challenging as the policy dimension grows exponentially with the number of servers, making standard reinforcement learning and policy optimization methods intractable at scale. We propose an atomic action decomposition framework that addresses this scalability challenge by breaking joint assignments into sequential single-server assignments. This yields policies with constant dimension, independent of the number of servers. We study two classes of atomic policies, the step-dependent and step-independent atomic policies, and prove that both achieve the same optimal long-run average reward as the original joint policies. These results establish that computing the optimal SPN control can be made scalable without loss of optimality using the atomic framework. Our results offer theoretical justification for the strong empirical success of the atomic framework in large-scale applications reported in previous articles.
Optimization via a Control-Centric Framework
Optimization plays a central role in intelligent systems and cyber-physical technologies, where the speed and reliability of convergence directly impact performance. In control theory, optimization-centric methods are standard: controllers are designed by repeatedly solving optimization problems, as in linear quadratic regulation, $H_\infty$ control, and model predictive control. In contrast, this paper develops a control-centric framework for optimization itself, where algorithms are constructed directly from Lyapunov stability principles rather than being proposed first and analyzed afterward. A key element is the stationarity vector, which encodes first-order optimality conditions and enables Lyapunov-based convergence analysis. By pairing a Lyapunov function with a selectable decay law, we obtain continuous-time dynamics with guaranteed exponential, finite-time, fixed-time, or prescribed-time convergence. Within this framework, we introduce three feedback realizations of increasing restrictiveness: the Hessian-gradient, Newton, and gradient dynamics. Each realization shapes the decay of the stationarity vector to achieve the desired rate. These constructions unify unconstrained optimization, extend naturally to constrained problems via Lyapunov-consistent primal-dual dynamics, and broaden the results for minimax and generalized Nash equilibrium seeking problems beyond exponential stability. The framework provides systematic design tools for optimization algorithms in control and game-theoretic problems.
comment: This work has been submitted to the IEEE for possible publication. 12 pages, 3 figures
Probabilistic Simulation of Aircraft Descent via a Physics-Informed Machine Learning Approach
This paper presents a method for generating probabilistic descent trajectories in simulations of real-world airspace. A dataset of 116,066 trajectories harvested from Mode S radar returns in UK airspace was used to train and test the model. Thirteen aircraft types with varying performance characteristics were investigated. It was found that the error in the mean prediction of time to reach the bottom of descent for the proposed method was less than that of the the Base of Aircraft Data (BADA) model by a factor of 10. Furthermore, the method was capable of generating a range of trajectories that were similar to the held out test dataset when analysed in distribution. The proposed method is hybrid, with aircraft drag and calibrated airspeed functions generated probabilistically to parameterise the BADA equations, ensuring the physical plausibility of generated trajectories.
Data-Driven Adaptive PID Control Based on Physics-Informed Neural Networks
This article proposes a data-driven PID controller design based on the principle of adaptive gain optimization, leveraging Physics-Informed Neural Networks (PINNs) generated for predictive modeling purposes. The proposed control design method utilizes gradients of the PID gain optimization, achieved through the automatic differentiation of PINNs, to apply model predictive control using a cost function based on tracking error and control inputs. By optimizing PINNs-based PID gains, the method achieves adaptive gain tuning that ensures stability while accounting for system nonlinearities. The proposed method features a systematic framework for integrating PINNs-based models of dynamical control systems into closed-loop control systems, enabling direct application to PID control design. A series of numerical experiments is conducted to demonstrate the effectiveness of the proposed method from the control perspectives based on both time and frequency domains.
comment: This work has been submitted to the IEEE Transactions on Control Systems Technology for possible publication
Robust Sensor Placement for Poisson Arrivals with False Alarm Aware Spatiotemporal Sensing
This paper studies sensor placement when detection performance varies stochastically due to environmental factors over space and time and false alarms are present, but a filter is used to attenuate the effect. We introduce a unified model that couples detection and false alarms through an availability function, which captures how false alarms reduce effective sensing and filtering responses to the disturbance. Building on this model, we give a sufficient condition under which filtering improves detection. In addition, we derive a coverage-based lower bound on the void probability. Furthermore, we prove robustness guarantees showing that performance remains stable when detection probabilities are learned from limited data. We validate the approach with numerical studies using AIS vessel-traffic data and synthetic maritime scenarios. Together, these results provide theory and practical guidance for deploying sensors in dynamic, uncertain environments.
comment: Submitted to IEEE ACC
Regular Pairings for Non-quadratic Lyapunov Functions and Contraction Analysis
Recent studies on stability and contractivity have highlighted the importance of semi-inner products, which we refer to as pairings, associated with general norms. A pairing is a binary operation that relates the derivative of a curve's norm to the radius-vector of the curve and its tangent. This relationship, known as the curve norm derivative formula, is crucial when using the norm as a Lyapunov function. Another important property of the pairing, used in stability and contraction criteria, is the so-called Lumer inequality, which relates the pairing to the induced logarithmic norm. We prove that the curve norm derivative formula and Lumer's inequality are, in fact, equivalent to each other and to several simpler properties. We then introduce and characterize regular pairings that satisfy all of these properties. Our results unify several independent theories of pairings (semi-inner products) developed in previous work on functional analysis and control theory. Additionally, we introduce the polyhedral max pairing and develop computational tools for polyhedral norms, advancing contraction theory in non-Euclidean spaces.
Reconfigurable Intelligent Surface-Assisted Cross-Layer Authentication for Secure and Efficient Vehicular Communications
Intelligent transportation systems increasingly depend on wireless communication for broadcasting traffic messages and facilitating real-time vehicular communication. In this context, message authentication is crucial for establishing secure and reliable communication. However, security solutions must consider the dynamic nature of vehicular communication links, which fluctuate between line-of-sight (LoS) and non-line-of-sight (NLoS) due to obstructions. This paper proposes a lightweight cross-layer authentication scheme that employs public-key infrastructure (PKI)-based authentication for initial legitimacy detection/handshaking while using key-based physical-layer re-authentication for message verification. This approach reduces signature generation and signaling overheads associated with each transmission, thereby enhancing network scalability. However, the receiver operating characteristic (ROC; Pd: detection vs. PFA: false alarm probabilities) of the latter decreases with lower signal-to-noise ratio (SNR). To address this, we investigate the use of reconfigurable intelligent surfaces (RISs) to strengthen the SNR directed toward the designated vehicle in shadowed areas (i.e., NLoS scenarios), thereby improving the ROC. Theoretical analysis and practical implementation are conducted using a 1-bit RIS consisting of 64 x 64 reflective meta-surfaces. Experimental results show a significant improvement in Pd, increasing from 0.82 to 0.96 at SNR = -6 dB for an orthogonal frequency-division multiplexing (OFDM) system with 128 subcarriers. We also conducted informal and formal security analyses using Burrows-Abadi-Needham (BAN) logic to prove the scheme's ability to resist passive and active attacks.
comment: 18 pages, 14 figures and 6 tables
Development of a magnetorheological hand exoskeleton featuring a high force-to-power ratio for enhanced grip endurance
Hand exoskeletons have significant potential in labor-intensive fields by mitigating hand grip fatigue, enhancing hand strength, and preventing injuries. However, most of the traditional hand exoskeletons are driven by motors, whose output force is limited in the constrained installation conditions. Besides, they also come with the disadvantages of high power consumption, complex and bulky assistive systems, and high instability. In this work, we develop a novel hand exoskeleton integrated with innovative magnetorheological (MR) clutches that offers a high force-to-power ratio to improve grip endurance. The clutch features an enhanced structure design, a micro roller enhancing structure, which can significantly boost output forces. The experimental data demonstrate that, when it is supplied with 2 V, the clutch can deliver a peak holding force of 381.15 N-55 times that when no voltage is provided (7 N). In this scenario, it only consumes 1.38 W, yielding a force-to-power ratio of 256.75N/W, which is 2.35 times higher than the best-reported actuator used for hand exoskeletons. This capability enables the designed MRHE to provide approximately 419.79 N support force for gripping. The designed MR hand exoskeleton is highly integrated, comprising an exoskeleton frame, MR clutches, a control unit, and a battery. Evaluations through static grip endurance tests and dynamic carrying and lifting tests confirm that the MR hand exoskeleton can effectively reduce muscle fatigue, extend grip endurance, and minimize injuries. These findings highlight its strong potential for practical applications in repetitive tasks such as carrying and lifting in industrial settings.
HBS -- Hardware Build System: A Tcl-based, minimal common abstraction approach for build system for hardware designs
Build systems become an indispensable part of the software implementation and deployment process. New programming languages are released with the build system integrated into the language tools, for example, Go, Rust, or Zig. However, in the hardware description domain, no official build systems have been released with the predominant Hardware Description Languages (HDL) such as VHDL or SystemVerilog. Moreover, hardware design projects are often multilanguage. The paper proposes a new build system for the hardware description domain. The system is called the Hardware Build System (HBS). The main goals of the system include simplicity, readability, a minimal number of dependencies, and ease of integration with the existing Electronic Design Automation (EDA) tools. The system proposes a novel, minimal common abstraction approach, whose particular implications are described in the article. All the core functionalities are implemented in Tcl. Only the EDA tool's independent features, such as dependency graph generation, are implemented in a Python wrapper.
Distributionally Robust System Level Synthesis With Output Feedback Affine Control Policy
This paper studies the finite-horizon robust optimal control of constrained linear systems subject to model mismatch and additive stochastic disturbances. Utilizing the system level synthesis (SLS) parameterization, we propose a novel SLS design using an output-feedback affine control policy and extend it to a distributionally robust setting to improve system resilience by minimizing the cost function while ensuring constraint satisfaction against the worst-case uncertainty distribution. The scopes of model mismatch and stochastic disturbances are quantified using the 1-norm and a Wasserstein metric-based ambiguity set, respectively. For the closed-loop dynamics, we analyze the distributional shift between the predicted output-input response -- computed using nominal parameters and empirical disturbance samples -- and the actual closed-loop distribution, highlighting its dependence on model mismatch and SLS parameterization. Assuming convex and Lipschitz continuous cost functions and constraints, we derive a tractable reformulation of the distributionally robust SLS (DR-SLS) problem by leveraging tools from robust control and distributionally robust optimization (DRO). Numerical experiments validate the performance and robustness of the proposed approach.
Control of Humanoid Robots with Parallel Mechanisms using Differential Actuation Models
Several recently released humanoid robots, inspired by the mechanical design of Cassie, employ actuator configurations in which the motors are displaced from the joints to reduce leg inertia. While studies accounting for the full kinematic complexity have demonstrated the benefits of these designs, the associated loop-closure constraints greatly increase computational cost and limit their use in control and learning. As a result, the non-linear transmission is often approximated by a constant reduction ratio, preventing exploitation of the mechanism's full capabilities. This paper introduces a compact analytical formulation for the two standard knee and ankle mechanisms that captures the exact non-linear transmission while remaining computationally efficient. The model is fully differentiable up to second order with a minimal formulation, enabling low-cost evaluation of dynamic derivatives for trajectory optimization and of the apparent transmission impedance for reinforcement learning. We integrate this formulation into trajectory optimization and locomotion policy learning, and compare it against simplified constant-ratio approaches. Hardware experiments demonstrate improved accuracy and robustness, showing that the proposed method provides a practical means to incorporate parallel actuation into modern control algorithms.
Sparse dynamic network reconstruction through L1-regularization of a Lyapunov equation
An important problem in many areas of science is that of recovering interaction networks from simultaneous time-series of many interacting dynamical processes. A common approach is to use the elements of the correlation matrix or its inverse as proxies of the interaction strengths, but the reconstructed networks are necessarily undirected. Transfer entropy methods have been proposed to reconstruct directed networks but the reconstructed network lacks information about interaction strengths. We propose a network reconstruction method that inherits the best of the two approaches by reconstructing a directed weighted network from noisy data under the assumption that the network is sparse and the dynamics are governed by a linear (or weakly-nonlinear) stochastic dynamical system. The two steps of our method are i) constructing an (infinite) family of candidate networks by solving the covariance matrix Lyapunov equation for the state matrix and ii) using L1-regularization to select a sparse solution. We further show how to use prior information on the (non)existence of a few directed edges to drastically improve the quality of the reconstruction.
Scalable analysis of stop-and-go waves: Representation, measurements and insights
Analyzing stop-and-go waves at the scale of miles and hours of data is an emerging challenge in traffic research. The past 5 years have seen an explosion in the availability of large-scale traffic data containing traffic waves and complex congestion patterns, making existing approaches unsuitable for repeatable and scalable analysis of traffic waves in these data. This paper makes a first step towards addressing this challenge by introducing an automatic and scalable stop-and-go wave identification method capable of capturing wave generation, propagation, dissipation, as well as bifurcation and merging, which have previously been observed only very rarely. Using a concise and simple critical-speed based definition of a stop-and-go wave, the proposed method identifies all wave boundaries that encompass spatio-temporal points where vehicle speed is below a chosen critical speed. The method is built upon a graph representation of the spatio-temporal points associated with stop-and-go waves, specifically wave front (start) points and wave tail (end) points, and approaches the solution as a graph component identification problem. It enables the measurement of wave properties at scale. The method is implemented in Python and demonstrated on a large-scale dataset, I-24 MOTION INCEPTION. Our results show insights on the complexity of traffic waves. Traffic waves can bifurcate and merge at a scale that has never been observed or described before. The clustering analysis of all the identified wave components reveals the different topological structures of traffic waves. We explored that the wave merge or bifurcation points can be explained by spatial features. The gallery of all the identified wave topologies is demonstrated at https://trafficwaves.github.io/.
OmniRetarget: Interaction-Preserving Data Generation for Humanoid Whole-Body Loco-Manipulation and Scene Interaction
A dominant paradigm for teaching humanoid robots complex skills is to retarget human motions as kinematic references to train reinforcement learning (RL) policies. However, existing retargeting pipelines often struggle with the significant embodiment gap between humans and robots, producing physically implausible artifacts like foot-skating and penetration. More importantly, common retargeting methods neglect the rich human-object and human-environment interactions essential for expressive locomotion and loco-manipulation. To address this, we introduce OmniRetarget, an interaction-preserving data generation engine based on an interaction mesh that explicitly models and preserves the crucial spatial and contact relationships between an agent, the terrain, and manipulated objects. By minimizing the Laplacian deformation between the human and robot meshes while enforcing kinematic constraints, OmniRetarget generates kinematically feasible trajectories. Moreover, preserving task-relevant interactions enables efficient data augmentation, from a single demonstration to different robot embodiments, terrains, and object configurations. We comprehensively evaluate OmniRetarget by retargeting motions from OMOMO, LAFAN1, and our in-house MoCap datasets, generating over 8-hour trajectories that achieve better kinematic constraint satisfaction and contact preservation than widely used baselines. Such high-quality data enables proprioceptive RL policies to successfully execute long-horizon (up to 30 seconds) parkour and loco-manipulation skills on a Unitree G1 humanoid, trained with only 5 reward terms and simple domain randomization shared by all tasks, without any learning curriculum.
comment: Project website: https://omniretarget.github.io
Conformal Robust Control of Linear Systems
End-to-end engineering design pipelines, in which designs are evaluated using concurrently defined optimal controllers, are becoming increasingly common in practice. To discover designs that perform well even under the misspecification of system dynamics, such end-to-end pipelines have now begun evaluating designs with a robust control objective in place of the nominal optimal control setup. Current approaches of specifying such robust control subproblems, however, rely on hand specification of perturbations anticipated to be present upon deployment or margin methods that ignore problem structure, resulting in a lack of theoretical guarantees and overly conservative empirical performance. We, instead, propose a novel methodology for LQR systems that leverages conformal prediction to specify such uncertainty regions in a data-driven fashion. Such regions have distribution-free coverage guarantees on the true system dynamics, in turn allowing for a probabilistic characterization of the regret of the resulting robust controller. We then demonstrate that such a controller can be efficiently produced via a novel policy gradient method that has convergence guarantees. We finally demonstrate the superior empirical performance of our method over alternate robust control specifications, such as $H_{\infty}$ and LQR with multiplicative noise, across a collection of engineering control systems.
Learn to Bid as a Price-Maker Wind Power Producer
Wind power producers (WPPs) participating in short-term power markets face significant imbalance costs due to their non-dispatchable and variable production. While some WPPs have a large enough market share to influence prices with their bidding decisions, existing optimal bidding methods rarely account for this aspect. Price-maker approaches typically model bidding as a bilevel optimization problem, but these methods require complex market models, estimating other participants' actions, and are computationally demanding. To address these challenges, we propose an online learning algorithm that leverages contextual information to optimize WPP bids in the price-maker setting. We formulate the strategic bidding problem as a contextual multi-armed bandit, ensuring provable regret minimization. The algorithm's performance is evaluated against various benchmark strategies using a numerical simulation of the German day-ahead and real-time markets.
Autonomy Architectures for Safe Planning in Unknown Environments Under Budget Constraints
Mission planning can often be formulated as a constrained control problem under multiple path constraints (i.e., safety constraints) and budget constraints (i.e., resource expenditure constraints). In a priori unknown environments, verifying that an offline solution will satisfy the constraints for all time can be difficult, if not impossible. We present ReRoot, a novel sampling-based framework that enforces safety and budget constraints for nonlinear systems in unknown environments. The main idea is that ReRoot grows multiple reverse RRT* trees online, starting from renewal sets, i.e., sets where the budget constraints are renewed. The dynamically feasible backup trajectories guarantee safety and reduce resource expenditure, which provides a principled backup policy when integrated into the gatekeeper safety verification architecture. We demonstrate our approach in simulation with a fixed-wing UAV in a GNSS-denied environment with a budget constraint on localization error that can be renewed at visual landmarks.
comment: Code: https://github.com/dcherenson/budget-constrained-planning
Systems and Control (EESS)
A Genetic Algorithm Approach to Anti-Jamming UAV Swarm Behavior
In recent years, Unmanned Aerial Vehicles (UAVs) have brought a new true revolution to military tactics. While UAVs already constitute an advantage when operating alone, multi-UAV swarms expand the available possibilities, allowing the UAVs to collaborate and support each other as a team to carry out a given task. This entails the capability to exchange information related with situation awareness and action coordination by means of a suitable wireless communication technology. In such scenario, the adversary is expected to disrupt communications by jamming the communication channel. The latter becomes the Achilles heel of the swarm. While anti-jamming techniques constitute a well covered topic in the literature, the use of intelligent swarm behaviors to leverage those techniques is still an open research issue. This paper explores the use of Genetic Algorithms (GAs) to jointly optimize UAV swarm formation, beam-steering antennas and traffic routing in order to mitigate the effect of jamming in the main coordination channel, under the assumption that a more robust and low data rate channel is used for formation management signaling. Simulation results show the effectiveness of proposed approach. However, the significant computational cost paves the way for further research.
comment: 8 pages, conference paper
Stability Preserving Safe Control of a Bicopter
This paper presents a control law for stabilization and trajectory tracking of a multicopter subject to safety constraints. The proposed approach guarantees forward invariance of a prescribed safety set while ensuring smooth tracking performance. Unlike conventional control barrier function methods, the constrained control problem is transformed into an unconstrained one using state-dependent mappings together with carefully constructed Lyapunov functions. This approach enables explicit synthesis of the control law, instead of requiring a solution of constrained optimization at each step. The transformation also enables the controller to enforce safety without sacrificing stability or performance. Simulation results for a polytopic reference trajectory confined within a designated safe region demonstrate the effectiveness of the proposed method.
From Neural Sensing to Stimulation: An Interdisciplinary Roadmap for Neurotechnology
Neurotechnologies are transforming how we measure, interpret, and modulate brain-body interactions, integrating real-time sensing, computation, and stimulation to enable precise physiological control. They hold transformative potential across clinical and non-clinical domains, from treating disorders to enhancing cognition and performance. Realizing this potential requires navigating complex, interdisciplinary challenges spanning neuroscience, materials science, device engineering, signal processing, computational modelling, and regulatory and ethical frameworks. This Perspective presents a strategic roadmap for neurotechnology development, created by early-career researchers, highlighting their role at the intersection of disciplines and their capacity to bridge traditional silos. We identify five cross-cutting trade-offs that constrain progress across functionality, scalability, adaptability, and translatability, and illustrate how technical domains influence their resolution. Rather than a domain-specific review, we focus on shared challenges and strategic opportunities that transcend disciplines. We propose a unified framework for collaborative innovation and education, highlight ethical and regulatory priorities, and outline a timeline for overcoming key bottlenecks. By aligning technical development with translational and societal needs, this roadmap aims to accelerate equitable, effective, and future-ready adaptive neurotechnologies, guiding coordinated efforts across the global research and innovation community.
Identification and optimal control strategies for the transversal splitting of ultra--cold Bose gases
Splitting a Bose--Einstein condensate (BEC) is a key operation in fundamental physics experiments and emerging quantum technologies, where precise preparation of well--defined initial states requires fast yet coherent control of the condensate's nonlinear dynamics. This work formulates the BEC splitting process as an optimal feedforward control problem based on a physically interpretable, reduced--order model identified from limited experimental data. We introduce a systematic calibration strategy that combines optimal experiment selection and constrained nonlinear parameter estimation, enabling accurate system identification with minimal experimental overhead. Using this calibrated model, we compute energy--optimal trajectories via indirect optimal control to realize shortcuts to adiabaticity (STAs), achieving rapid transitions to the ground state of a double--well potential while suppressing excitations. Experiments confirm that the proposed control framework yields high--fidelity state transfers across multiple configurations, demonstrating its robustness and scalability for quantum control applications.
comment: To be published in IEEE Transactions on Control Systems Technology
Mitigating Increase-Decrease Gaming with Alternative Connection Agreements: A Defender-Attacker-Defender Game
Redispatch markets are widely used by system operators to manage network congestion. A well-known drawback, however, is that Flexibility Service Providers (FSPs) may strategically adjust their baselines in anticipation of redispatch actions, thereby aggravating congestion and raising system costs. To address this increase-decrease gaming, Distribution System Operators (DSOs) could use Alternative Connection Agreements (ACAs) to conditionally limit the available connection capacity of market participants in the day-ahead stage. In this paper, we present a novel Defender-Attacker-Defender game to investigate the potential of this approach in distribution networks under load and price uncertainty. We solve the resulting trilevel optimization model using a custom branch-and-bound algorithm, and we demonstrate that it efficiently solves the problem without exploring many nodes in the branch-and-bound search tree for most simulated scenarios. The case study demonstrates that applying ACAs can substantially lower redispatch costs (e.g. by 25%) for the DSO with only a limited impact on FSP profits. The effectiveness of the approach critically depends on how often the DSO can invoke ACAs and on the extent to which the DSO can anticipate strategic bidding behavior of the FSP.
Falsification-Driven Reinforcement Learning for Maritime Motion Planning
Compliance with maritime traffic rules is essential for the safe operation of autonomous vessels, yet training reinforcement learning (RL) agents to adhere to them is challenging. The behavior of RL agents is shaped by the training scenarios they encounter, but creating scenarios that capture the complexity of maritime navigation is non-trivial, and real-world data alone is insufficient. To address this, we propose a falsification-driven RL approach that generates adversarial training scenarios in which the vessel under test violates maritime traffic rules, which are expressed as signal temporal logic specifications. Our experiments on open-sea navigation with two vessels demonstrate that the proposed approach provides more relevant training scenarios and achieves more consistent rule compliance.
Decentralized CBF-based Safety Filters for Collision Avoidance of Cooperative Missile Systems with Input Constraints
This paper presents a decentralized safety filter for collision avoidance in multi-agent aerospace interception scenarios. The approach leverages robust control barrier functions (RCBFs) to guarantee forward invariance of safety sets under bounded inputs and high-relative-degree dynamics. Each effector executes its nominal cooperative guidance command, while a local quadratic program (QP) modifies the input only when necessary. Event-triggered activation based on range and zero-effort miss (ZEM) criteria ensures scalability by restricting active constraints to relevant neighbors. To resolve feasibility issues from simultaneous constraints, a slack-variable relaxation scheme is introduced that prioritizes critical agents in a Pareto-optimal manner. Simulation results in many-on-many interception scenarios demonstrate that the proposed framework maintains collision-free operation with minimal deviation from nominal guidance, providing a computationally efficient and scalable solution for safety-critical multi-agent aerospace systems.
comment: 7 pages, 5 figures
Resilient Multi-Dimensional Consensus and Distributed Optimization against Agent-Based and Denial-of-Service Attacks
In this paper, we consider the resilient multi-dimensional consensus and distributed optimization problems of multi-agent systems (MASs) in the presence of both agent-based and denial-of-service (DoS) attacks. The considered agent-based attacks can cover malicious, Byzantine, and stubborn agents. The links between agents in the network can be blocked by DoS attacks, which may lead the digraph to be time-varying and even disconnected. The objective is to ensure that the remaining benign agents achieve consensus. To this end, an "auxiliary point"-based resilient control algorithm is proposed for MASs. Under the proposed algorithm, each healthy agent constructs a "safe kernel" utilizing the states of its in-neighbors and updates its state toward a specific point within this kernel at each iteration. If an agent cannot receive its neighbors' states owing to DoS attacks, it will use the states received immediately before the DoS period. Moreover, a resilient multi-dimensional distributed optimization (RMDO) algorithm is also proposed. Theoretical proofs and numerical examples are presented to demonstrate the effectiveness of the proposed algorithms.
Delay Independent Safe Control with Neural Networks: Positive Lur'e Certificates for Risk Aware Autonomy
We present a risk-aware safety certification method for autonomous, learning enabled control systems. Focusing on two realistic risks, state/input delays and interval matrix uncertainty, we model the neural network (NN) controller with local sector bounds and exploit positivity structure to derive linear, delay-independent certificates that guarantee local exponential stability across admissible uncertainties. To benchmark performance, we adopt and implement a state-of-the-art IQC NN verification pipeline. On representative cases, our positivity-based tests run orders of magnitude faster than SDP-based IQC while certifying regimes the latter cannot-providing scalable safety guarantees that complement risk-aware control.
comment: Submitted to 2026 American Control Conference (ACC), New Orleans, LA
A Cascade of Systems and the Product of Their $θ$-Symmetric Scaled Relative Graphs
In this paper, we utilize a variant of the scaled relative graph (SRG), referred to as the $\theta$-symmetric SRG, to develop a graphical stability criterion for the feedback interconnection of a cascade of systems. A crucial submultiplicative property of $\theta$-symmetric SRG is established, enabling it to handle cyclic interconnections for which conventional graph separation methods are not applicable. By integrating both gain and refined phase information, the $\theta$-symmetric SRG provides a unified graphical characterization of the system, which better captures system properties and yields less conservative results. In the scalar case, the $\theta$-symmetric SRG can be reduced exactly to the scalar itself, whereas the standard SRG appears to be a conjugate pair. Consequently, the frequency-wise $\theta$-symmetric SRG is more suitable than the standard SRG as a multi-input multi-output extension of the classical Nyquist plot. Illustrative examples are included to demonstrate the effectiveness of the $\theta$-symmetric SRG.
comment: 9 pages, 4 figures
Safe Stabilization of the Stefan Problem with a High-Order Moving Boundary Dynamics by PDE Backstepping
This paper presents a safe stabilization of the Stefan PDE model with a moving boundary governed by a high-order dynamics. We consider a parabolic PDE with a time-varying domain governed by a second-order response with respect to the Neumann boundary value of the PDE state at the moving boundary. The objective is to design a boundary heat flux control to stabilize the moving boundary at a desired setpoint, with satisfying the required conditions of the model on PDE state and the moving boundary. We apply a PDE backstepping method for the control design with considering a constraint on the control law. The PDE and moving boundary constraints are shown to be satisfied by applying the maximum principle for parabolic PDEs. Then the closed-loop system is shown to be globally exponentially stable by performing Lyapunov analysis. The proposed control is implemented in numerical simulation, which illustrates the desired performance in safety and stability. An outline of the extension to third-order moving boundary dynamics is also presented. Code is released at https://github.com/shumon0423/HighOrderStefan_CDC2025.git.
comment: 6 pages, 4 figures, 64th IEEE Conference on Decision and Control (CDC) 2025
Model Predictive Path Integral Control for Roll-to-Roll Manufacturing
Roll-to-roll (R2R) manufacturing is a continuous processing technology essential for scalable production of thin-film materials and printed electronics, but precise control remains challenging due to subsystem interactions, nonlinearities, and process disturbances. This paper proposes a Model Predictive Path Integral (MPPI) control formulation for R2R systems, leveraging a GPU-based Monte-Carlo sampling approach to efficiently approximate optimal controls online. Crucially, MPPI easily handles non-differentiable cost functions, enabling the incorporation of complex performance criteria relevant to advanced manufacturing processes. A case study is presented that demonstrates that MPPI significantly improves tension regulation performance compared to conventional model predictive control (MPC), highlighting its suitability for real-time control in advanced manufacturing.
comment: 6 pages, 4 figures
GATO: GPU-Accelerated and Batched Trajectory Optimization for Scalable Edge Model Predictive Control
While Model Predictive Control (MPC) delivers strong performance across robotics applications, solving the underlying (batches of) nonlinear trajectory optimization (TO) problems online remains computationally demanding. Existing GPU-accelerated approaches typically (i) parallelize a single solve to meet real-time deadlines, (ii) scale to very large batches at slower-than-real-time rates, or (iii) achieve speed by restricting model generality (e.g., point-mass dynamics or a single linearization). This leaves a large gap in solver performance for many state-of-the-art MPC applications that require real-time batches of tens to low-hundreds of solves. As such, we present GATO, an open source, GPU-accelerated, batched TO solver co-designed across algorithm, software, and computational hardware to deliver real-time throughput for these moderate batch size regimes. Our approach leverages a combination of block-, warp-, and thread-level parallelism within and across solves for ultra-high performance. We demonstrate the effectiveness of our approach through a combination of: simulated benchmarks showing speedups of 18-21x over CPU baselines and 1.4-16x over GPU baselines as batch size increases; case studies highlighting improved disturbance rejection and convergence behavior; and finally a validation on hardware using an industrial manipulator. We open source GATO to support reproducibility and adoption.
Accuracy, Memory Efficiency and Generalization: A Comparative Study on Liquid Neural Networks and Recurrent Neural Networks
This review aims to conduct a comparative analysis of liquid neural networks (LNNs) and traditional recurrent neural networks (RNNs) and their variants, such as long short-term memory networks (LSTMs) and gated recurrent units (GRUs). The core dimensions of the analysis include model accuracy, memory efficiency, and generalization ability. By systematically reviewing existing research, this paper explores the basic principles, mathematical models, key characteristics, and inherent challenges of these neural network architectures in processing sequential data. Research findings reveal that LNN, as an emerging, biologically inspired, continuous-time dynamic neural network, demonstrates significant potential in handling noisy, non-stationary data, and achieving out-of-distribution (OOD) generalization. Additionally, some LNN variants outperform traditional RNN in terms of parameter efficiency and computational speed. However, RNN remains a cornerstone in sequence modeling due to its mature ecosystem and successful applications across various tasks. This review identifies the commonalities and differences between LNNs and RNNs, summarizes their respective shortcomings and challenges, and points out valuable directions for future research, particularly emphasizing the importance of improving the scalability of LNNs to promote their application in broader and more complex scenarios.
comment: 13 pages, 12 figures. Submitted to IEEE Transactions on Neural Networks and Learning Systems (TNNLS)
Adaptive Control Allocation for Underactuated Time-Scale Separated Non-Affine Systems
Many robotic systems are underactuated, meaning not all degrees of freedom can be directly controlled due to lack of actuators, input constraints, or state-dependent actuation. This property, compounded by modeling uncertainties and disturbances, complicates the control design process for trajectory tracking. In this work, we propose an adaptive control architecture for uncertain, nonlinear, underactuated systems with input constraints. Leveraging time-scale separation, we construct a reduced-order model where fast dynamics provide virtual inputs to the slower subsystem and use dynamic control allocation to select the optimal control inputs given the non-affine dynamics. To handle uncertainty, we introduce a state predictor-based adaptive law, and through singular perturbation theory and Lyapunov analysis, we prove stability and bounded tracking of reference trajectories. The proposed method is validated on a VTOL quadplane with nonlinear, state-dependent actuation, demonstrating its utility as a unified controller across various flight regimes, including cruise, landing transition, and hover.
comment: Code https://github.com/dcherenson/adaptive-control-underactuated
Optimal Batched Scheduling of Stochastic Processing Networks Using Atomic Action Decomposition
Stochastic processing networks (SPNs) have broad applications in healthcare, transportation, and communication networks. The control of SPN is to dynamically assign servers in batches under uncertainty to optimize long-run performance. This problem is challenging as the policy dimension grows exponentially with the number of servers, making standard reinforcement learning and policy optimization methods intractable at scale. We propose an atomic action decomposition framework that addresses this scalability challenge by breaking joint assignments into sequential single-server assignments. This yields policies with constant dimension, independent of the number of servers. We study two classes of atomic policies, the step-dependent and step-independent atomic policies, and prove that both achieve the same optimal long-run average reward as the original joint policies. These results establish that computing the optimal SPN control can be made scalable without loss of optimality using the atomic framework. Our results offer theoretical justification for the strong empirical success of the atomic framework in large-scale applications reported in previous articles.
Optimization via a Control-Centric Framework
Optimization plays a central role in intelligent systems and cyber-physical technologies, where the speed and reliability of convergence directly impact performance. In control theory, optimization-centric methods are standard: controllers are designed by repeatedly solving optimization problems, as in linear quadratic regulation, $H_\infty$ control, and model predictive control. In contrast, this paper develops a control-centric framework for optimization itself, where algorithms are constructed directly from Lyapunov stability principles rather than being proposed first and analyzed afterward. A key element is the stationarity vector, which encodes first-order optimality conditions and enables Lyapunov-based convergence analysis. By pairing a Lyapunov function with a selectable decay law, we obtain continuous-time dynamics with guaranteed exponential, finite-time, fixed-time, or prescribed-time convergence. Within this framework, we introduce three feedback realizations of increasing restrictiveness: the Hessian-gradient, Newton, and gradient dynamics. Each realization shapes the decay of the stationarity vector to achieve the desired rate. These constructions unify unconstrained optimization, extend naturally to constrained problems via Lyapunov-consistent primal-dual dynamics, and broaden the results for minimax and generalized Nash equilibrium seeking problems beyond exponential stability. The framework provides systematic design tools for optimization algorithms in control and game-theoretic problems.
comment: This work has been submitted to the IEEE for possible publication. 12 pages, 3 figures
Probabilistic Simulation of Aircraft Descent via a Physics-Informed Machine Learning Approach
This paper presents a method for generating probabilistic descent trajectories in simulations of real-world airspace. A dataset of 116,066 trajectories harvested from Mode S radar returns in UK airspace was used to train and test the model. Thirteen aircraft types with varying performance characteristics were investigated. It was found that the error in the mean prediction of time to reach the bottom of descent for the proposed method was less than that of the the Base of Aircraft Data (BADA) model by a factor of 10. Furthermore, the method was capable of generating a range of trajectories that were similar to the held out test dataset when analysed in distribution. The proposed method is hybrid, with aircraft drag and calibrated airspeed functions generated probabilistically to parameterise the BADA equations, ensuring the physical plausibility of generated trajectories.
Data-Driven Adaptive PID Control Based on Physics-Informed Neural Networks
This article proposes a data-driven PID controller design based on the principle of adaptive gain optimization, leveraging Physics-Informed Neural Networks (PINNs) generated for predictive modeling purposes. The proposed control design method utilizes gradients of the PID gain optimization, achieved through the automatic differentiation of PINNs, to apply model predictive control using a cost function based on tracking error and control inputs. By optimizing PINNs-based PID gains, the method achieves adaptive gain tuning that ensures stability while accounting for system nonlinearities. The proposed method features a systematic framework for integrating PINNs-based models of dynamical control systems into closed-loop control systems, enabling direct application to PID control design. A series of numerical experiments is conducted to demonstrate the effectiveness of the proposed method from the control perspectives based on both time and frequency domains.
comment: This work has been submitted to the IEEE Transactions on Control Systems Technology for possible publication
Robust Sensor Placement for Poisson Arrivals with False Alarm Aware Spatiotemporal Sensing
This paper studies sensor placement when detection performance varies stochastically due to environmental factors over space and time and false alarms are present, but a filter is used to attenuate the effect. We introduce a unified model that couples detection and false alarms through an availability function, which captures how false alarms reduce effective sensing and filtering responses to the disturbance. Building on this model, we give a sufficient condition under which filtering improves detection. In addition, we derive a coverage-based lower bound on the void probability. Furthermore, we prove robustness guarantees showing that performance remains stable when detection probabilities are learned from limited data. We validate the approach with numerical studies using AIS vessel-traffic data and synthetic maritime scenarios. Together, these results provide theory and practical guidance for deploying sensors in dynamic, uncertain environments.
comment: Submitted to IEEE ACC
Regular Pairings for Non-quadratic Lyapunov Functions and Contraction Analysis
Recent studies on stability and contractivity have highlighted the importance of semi-inner products, which we refer to as pairings, associated with general norms. A pairing is a binary operation that relates the derivative of a curve's norm to the radius-vector of the curve and its tangent. This relationship, known as the curve norm derivative formula, is crucial when using the norm as a Lyapunov function. Another important property of the pairing, used in stability and contraction criteria, is the so-called Lumer inequality, which relates the pairing to the induced logarithmic norm. We prove that the curve norm derivative formula and Lumer's inequality are, in fact, equivalent to each other and to several simpler properties. We then introduce and characterize regular pairings that satisfy all of these properties. Our results unify several independent theories of pairings (semi-inner products) developed in previous work on functional analysis and control theory. Additionally, we introduce the polyhedral max pairing and develop computational tools for polyhedral norms, advancing contraction theory in non-Euclidean spaces.
Reconfigurable Intelligent Surface-Assisted Cross-Layer Authentication for Secure and Efficient Vehicular Communications
Intelligent transportation systems increasingly depend on wireless communication for broadcasting traffic messages and facilitating real-time vehicular communication. In this context, message authentication is crucial for establishing secure and reliable communication. However, security solutions must consider the dynamic nature of vehicular communication links, which fluctuate between line-of-sight (LoS) and non-line-of-sight (NLoS) due to obstructions. This paper proposes a lightweight cross-layer authentication scheme that employs public-key infrastructure (PKI)-based authentication for initial legitimacy detection/handshaking while using key-based physical-layer re-authentication for message verification. This approach reduces signature generation and signaling overheads associated with each transmission, thereby enhancing network scalability. However, the receiver operating characteristic (ROC; Pd: detection vs. PFA: false alarm probabilities) of the latter decreases with lower signal-to-noise ratio (SNR). To address this, we investigate the use of reconfigurable intelligent surfaces (RISs) to strengthen the SNR directed toward the designated vehicle in shadowed areas (i.e., NLoS scenarios), thereby improving the ROC. Theoretical analysis and practical implementation are conducted using a 1-bit RIS consisting of 64 x 64 reflective meta-surfaces. Experimental results show a significant improvement in Pd, increasing from 0.82 to 0.96 at SNR = -6 dB for an orthogonal frequency-division multiplexing (OFDM) system with 128 subcarriers. We also conducted informal and formal security analyses using Burrows-Abadi-Needham (BAN) logic to prove the scheme's ability to resist passive and active attacks.
comment: 18 pages, 14 figures and 6 tables
Development of a magnetorheological hand exoskeleton featuring a high force-to-power ratio for enhanced grip endurance
Hand exoskeletons have significant potential in labor-intensive fields by mitigating hand grip fatigue, enhancing hand strength, and preventing injuries. However, most of the traditional hand exoskeletons are driven by motors, whose output force is limited in the constrained installation conditions. Besides, they also come with the disadvantages of high power consumption, complex and bulky assistive systems, and high instability. In this work, we develop a novel hand exoskeleton integrated with innovative magnetorheological (MR) clutches that offers a high force-to-power ratio to improve grip endurance. The clutch features an enhanced structure design, a micro roller enhancing structure, which can significantly boost output forces. The experimental data demonstrate that, when it is supplied with 2 V, the clutch can deliver a peak holding force of 381.15 N-55 times that when no voltage is provided (7 N). In this scenario, it only consumes 1.38 W, yielding a force-to-power ratio of 256.75N/W, which is 2.35 times higher than the best-reported actuator used for hand exoskeletons. This capability enables the designed MRHE to provide approximately 419.79 N support force for gripping. The designed MR hand exoskeleton is highly integrated, comprising an exoskeleton frame, MR clutches, a control unit, and a battery. Evaluations through static grip endurance tests and dynamic carrying and lifting tests confirm that the MR hand exoskeleton can effectively reduce muscle fatigue, extend grip endurance, and minimize injuries. These findings highlight its strong potential for practical applications in repetitive tasks such as carrying and lifting in industrial settings.
HBS -- Hardware Build System: A Tcl-based, minimal common abstraction approach for build system for hardware designs
Build systems become an indispensable part of the software implementation and deployment process. New programming languages are released with the build system integrated into the language tools, for example, Go, Rust, or Zig. However, in the hardware description domain, no official build systems have been released with the predominant Hardware Description Languages (HDL) such as VHDL or SystemVerilog. Moreover, hardware design projects are often multilanguage. The paper proposes a new build system for the hardware description domain. The system is called the Hardware Build System (HBS). The main goals of the system include simplicity, readability, a minimal number of dependencies, and ease of integration with the existing Electronic Design Automation (EDA) tools. The system proposes a novel, minimal common abstraction approach, whose particular implications are described in the article. All the core functionalities are implemented in Tcl. Only the EDA tool's independent features, such as dependency graph generation, are implemented in a Python wrapper.
Distributionally Robust System Level Synthesis With Output Feedback Affine Control Policy
This paper studies the finite-horizon robust optimal control of constrained linear systems subject to model mismatch and additive stochastic disturbances. Utilizing the system level synthesis (SLS) parameterization, we propose a novel SLS design using an output-feedback affine control policy and extend it to a distributionally robust setting to improve system resilience by minimizing the cost function while ensuring constraint satisfaction against the worst-case uncertainty distribution. The scopes of model mismatch and stochastic disturbances are quantified using the 1-norm and a Wasserstein metric-based ambiguity set, respectively. For the closed-loop dynamics, we analyze the distributional shift between the predicted output-input response -- computed using nominal parameters and empirical disturbance samples -- and the actual closed-loop distribution, highlighting its dependence on model mismatch and SLS parameterization. Assuming convex and Lipschitz continuous cost functions and constraints, we derive a tractable reformulation of the distributionally robust SLS (DR-SLS) problem by leveraging tools from robust control and distributionally robust optimization (DRO). Numerical experiments validate the performance and robustness of the proposed approach.
Control of Humanoid Robots with Parallel Mechanisms using Differential Actuation Models
Several recently released humanoid robots, inspired by the mechanical design of Cassie, employ actuator configurations in which the motors are displaced from the joints to reduce leg inertia. While studies accounting for the full kinematic complexity have demonstrated the benefits of these designs, the associated loop-closure constraints greatly increase computational cost and limit their use in control and learning. As a result, the non-linear transmission is often approximated by a constant reduction ratio, preventing exploitation of the mechanism's full capabilities. This paper introduces a compact analytical formulation for the two standard knee and ankle mechanisms that captures the exact non-linear transmission while remaining computationally efficient. The model is fully differentiable up to second order with a minimal formulation, enabling low-cost evaluation of dynamic derivatives for trajectory optimization and of the apparent transmission impedance for reinforcement learning. We integrate this formulation into trajectory optimization and locomotion policy learning, and compare it against simplified constant-ratio approaches. Hardware experiments demonstrate improved accuracy and robustness, showing that the proposed method provides a practical means to incorporate parallel actuation into modern control algorithms.
Sparse dynamic network reconstruction through L1-regularization of a Lyapunov equation
An important problem in many areas of science is that of recovering interaction networks from simultaneous time-series of many interacting dynamical processes. A common approach is to use the elements of the correlation matrix or its inverse as proxies of the interaction strengths, but the reconstructed networks are necessarily undirected. Transfer entropy methods have been proposed to reconstruct directed networks but the reconstructed network lacks information about interaction strengths. We propose a network reconstruction method that inherits the best of the two approaches by reconstructing a directed weighted network from noisy data under the assumption that the network is sparse and the dynamics are governed by a linear (or weakly-nonlinear) stochastic dynamical system. The two steps of our method are i) constructing an (infinite) family of candidate networks by solving the covariance matrix Lyapunov equation for the state matrix and ii) using L1-regularization to select a sparse solution. We further show how to use prior information on the (non)existence of a few directed edges to drastically improve the quality of the reconstruction.
Scalable analysis of stop-and-go waves: Representation, measurements and insights
Analyzing stop-and-go waves at the scale of miles and hours of data is an emerging challenge in traffic research. The past 5 years have seen an explosion in the availability of large-scale traffic data containing traffic waves and complex congestion patterns, making existing approaches unsuitable for repeatable and scalable analysis of traffic waves in these data. This paper makes a first step towards addressing this challenge by introducing an automatic and scalable stop-and-go wave identification method capable of capturing wave generation, propagation, dissipation, as well as bifurcation and merging, which have previously been observed only very rarely. Using a concise and simple critical-speed based definition of a stop-and-go wave, the proposed method identifies all wave boundaries that encompass spatio-temporal points where vehicle speed is below a chosen critical speed. The method is built upon a graph representation of the spatio-temporal points associated with stop-and-go waves, specifically wave front (start) points and wave tail (end) points, and approaches the solution as a graph component identification problem. It enables the measurement of wave properties at scale. The method is implemented in Python and demonstrated on a large-scale dataset, I-24 MOTION INCEPTION. Our results show insights on the complexity of traffic waves. Traffic waves can bifurcate and merge at a scale that has never been observed or described before. The clustering analysis of all the identified wave components reveals the different topological structures of traffic waves. We explored that the wave merge or bifurcation points can be explained by spatial features. The gallery of all the identified wave topologies is demonstrated at https://trafficwaves.github.io/.
OmniRetarget: Interaction-Preserving Data Generation for Humanoid Whole-Body Loco-Manipulation and Scene Interaction
A dominant paradigm for teaching humanoid robots complex skills is to retarget human motions as kinematic references to train reinforcement learning (RL) policies. However, existing retargeting pipelines often struggle with the significant embodiment gap between humans and robots, producing physically implausible artifacts like foot-skating and penetration. More importantly, common retargeting methods neglect the rich human-object and human-environment interactions essential for expressive locomotion and loco-manipulation. To address this, we introduce OmniRetarget, an interaction-preserving data generation engine based on an interaction mesh that explicitly models and preserves the crucial spatial and contact relationships between an agent, the terrain, and manipulated objects. By minimizing the Laplacian deformation between the human and robot meshes while enforcing kinematic constraints, OmniRetarget generates kinematically feasible trajectories. Moreover, preserving task-relevant interactions enables efficient data augmentation, from a single demonstration to different robot embodiments, terrains, and object configurations. We comprehensively evaluate OmniRetarget by retargeting motions from OMOMO, LAFAN1, and our in-house MoCap datasets, generating over 8-hour trajectories that achieve better kinematic constraint satisfaction and contact preservation than widely used baselines. Such high-quality data enables proprioceptive RL policies to successfully execute long-horizon (up to 30 seconds) parkour and loco-manipulation skills on a Unitree G1 humanoid, trained with only 5 reward terms and simple domain randomization shared by all tasks, without any learning curriculum.
comment: Project website: https://omniretarget.github.io
Conformal Robust Control of Linear Systems
End-to-end engineering design pipelines, in which designs are evaluated using concurrently defined optimal controllers, are becoming increasingly common in practice. To discover designs that perform well even under the misspecification of system dynamics, such end-to-end pipelines have now begun evaluating designs with a robust control objective in place of the nominal optimal control setup. Current approaches of specifying such robust control subproblems, however, rely on hand specification of perturbations anticipated to be present upon deployment or margin methods that ignore problem structure, resulting in a lack of theoretical guarantees and overly conservative empirical performance. We, instead, propose a novel methodology for LQR systems that leverages conformal prediction to specify such uncertainty regions in a data-driven fashion. Such regions have distribution-free coverage guarantees on the true system dynamics, in turn allowing for a probabilistic characterization of the regret of the resulting robust controller. We then demonstrate that such a controller can be efficiently produced via a novel policy gradient method that has convergence guarantees. We finally demonstrate the superior empirical performance of our method over alternate robust control specifications, such as $H_{\infty}$ and LQR with multiplicative noise, across a collection of engineering control systems.
Learn to Bid as a Price-Maker Wind Power Producer
Wind power producers (WPPs) participating in short-term power markets face significant imbalance costs due to their non-dispatchable and variable production. While some WPPs have a large enough market share to influence prices with their bidding decisions, existing optimal bidding methods rarely account for this aspect. Price-maker approaches typically model bidding as a bilevel optimization problem, but these methods require complex market models, estimating other participants' actions, and are computationally demanding. To address these challenges, we propose an online learning algorithm that leverages contextual information to optimize WPP bids in the price-maker setting. We formulate the strategic bidding problem as a contextual multi-armed bandit, ensuring provable regret minimization. The algorithm's performance is evaluated against various benchmark strategies using a numerical simulation of the German day-ahead and real-time markets.
Autonomy Architectures for Safe Planning in Unknown Environments Under Budget Constraints
Mission planning can often be formulated as a constrained control problem under multiple path constraints (i.e., safety constraints) and budget constraints (i.e., resource expenditure constraints). In a priori unknown environments, verifying that an offline solution will satisfy the constraints for all time can be difficult, if not impossible. We present ReRoot, a novel sampling-based framework that enforces safety and budget constraints for nonlinear systems in unknown environments. The main idea is that ReRoot grows multiple reverse RRT* trees online, starting from renewal sets, i.e., sets where the budget constraints are renewed. The dynamically feasible backup trajectories guarantee safety and reduce resource expenditure, which provides a principled backup policy when integrated into the gatekeeper safety verification architecture. We demonstrate our approach in simulation with a fixed-wing UAV in a GNSS-denied environment with a budget constraint on localization error that can be renewed at visual landmarks.
comment: Code: https://github.com/dcherenson/budget-constrained-planning
Systems and Control (CS)
Multi-Segment Photonic Power Converters for Energy Harvesting and High-Speed Optical Wireless Communication
The demand for energy-efficient high-speed wireless communication, coupled with the rapid rise of IoT devices, requires systems that integrate power harvesting with optical data reception to eliminate the need for charging or battery replacements. Recent advances have explored the use of solar cells as optical receivers for high-speed data detection alongside power harvesting. \acs{GaAs}-based \acp{PPC} provide six times greater electron mobility than silicon- or cadmium telluride-based cells, enabling faster data detection and improved power efficiency. However, their bandwidth is constrained by junction capacitance, which increases with active area, creating a trade-off between power output and data rate. To address this, we propose and test multi-segment \acs{GaAs}-based \Acp{PPC} that serve as both energy harvesters and data detectors. By segmenting the active area into 2, 4, or 6 subcells, forming circular areas with diameters of 1, 1.5, or 2.08~mm, we reduce capacitance and boost bandwidth while preserving light collection. Fabricated on a semi-insulating \ac{GaAs} substrate with etched trenches for electrical isolation, the series-connected subcells optimize absorption and minimize parasitic effects. The \Acp{PPC} were used for an eye-safe 1.5~m optical wireless link, employing \ac{OFDM} with adaptive bit and power loading. The system achieved a world record data rate of 3.8~Gbps, which is four times higher than prior works. The system converts 39.7\% of optical power from a beam of 2.3~mW, although the segmentation increases the sensitivity of the alignment. These findings provide new solutions for off-grid backhaul for future communication networks, such as 6th generation (6G) cellular.
Differentiable Model Predictive Control on the GPU
Differentiable model predictive control (MPC) offers a powerful framework for combining learning and control. However, its adoption has been limited by the inherently sequential nature of traditional optimization algorithms, which are challenging to parallelize on modern computing hardware like GPUs. In this work, we tackle this bottleneck by introducing a GPU-accelerated differentiable optimization tool for MPC. This solver leverages sequential quadratic programming and a custom preconditioned conjugate gradient (PCG) routine with tridiagonal preconditioning to exploit the problem's structure and enable efficient parallelization. We demonstrate substantial speedups over CPU- and GPU-based baselines, significantly improving upon state-of-the-art training times on benchmark reinforcement learning and imitation learning tasks. Finally, we showcase the method on the challenging task of reinforcement learning for driving at the limits of handling, where it enables robust drifting of a Toyota Supra through water puddles.
Learning Mixtures of Linear Dynamical Systems (MoLDS) via Hybrid Tensor-EM Method
Mixtures of linear dynamical systems (MoLDS) provide a path to model time-series data that exhibit diverse temporal dynamics across trajectories. However, its application remains challenging in complex and noisy settings, limiting its effectiveness for neural data analysis. Tensor-based moment methods can provide global identifiability guarantees for MoLDS, but their performance degrades under noise and complexity. Commonly used expectation-maximization (EM) methods offer flexibility in fitting latent models but are highly sensitive to initialization and prone to poor local minima. Here, we propose a tensor-based method that provides identifiability guarantees for learning MoLDS, which is followed by EM updates to combine the strengths of both approaches. The novelty in our approach lies in the construction of moment tensors using the input-output data to recover globally consistent estimates of mixture weights and system parameters. These estimates can then be refined through a Kalman EM algorithm, with closed-form updates for all LDS parameters. We validate our framework on synthetic benchmarks and real-world datasets. On synthetic data, the proposed Tensor-EM method achieves more reliable recovery and improved robustness compared to either pure tensor or randomly initialized EM methods. We then analyze neural recordings from the primate somatosensory cortex while a non-human primate performs reaches in different directions. Our method successfully models and clusters different conditions as separate subsystems, consistent with supervised single-LDS fits for each condition. Finally, we apply this approach to another neural dataset where monkeys perform a sequential reaching task. These results demonstrate that MoLDS provides an effective framework for modeling complex neural data, and that Tensor-EM is a reliable approach to MoLDS learning for these applications.
comment: 20 pages, 7 figures
Toward Model Matching for Remotely Controlled Differential Drive Robotic Vehicles
The problem of regulation of the orientation angle of a remotely controlled differential-drive mobile robot with actuator dynamics and network-induced delays is studied. Using a preinstalled two-layer nonlinear control scheme that decouples linear and angular velocities and regulates heading, a third, delay-dependent layer that achieves exact model matching from the orientation angle command to the orientation angle is introduced. The proposed outer loop controller is a delay dependent dynamic measurable output-feedback controller with dynamic proper precompensator. Parameterization yields a simple characteristic quasi-polynomial with coefficients constrained to satisfy stability for all delays up to a computable bound. Computational experiments confirm accurate tracking, fast settling and bounded internal signals and control voltages. The approach offers an analytic design alternative to AI-based tuning for delayed robotic systems.
Optimal Batched Scheduling of Stochastic Processing Networks Using Atomic Action Decomposition
Stochastic processing networks (SPNs) have broad applications in healthcare, transportation, and communication networks. The control of SPN is to dynamically assign servers in batches under uncertainty to optimize long-run performance. This problem is challenging as the policy dimension grows exponentially with the number of servers, making standard reinforcement learning and policy optimization methods intractable at scale. We propose an atomic action decomposition framework that addresses this scalability challenge by breaking joint assignments into sequential single-server assignments. This yields policies with constant dimension, independent of the number of servers. We study two classes of atomic policies, the step-dependent and step-independent atomic policies, and prove that both achieve the same optimal long-run average reward as the original joint policies. These results establish that computing the optimal SPN control can be made scalable without loss of optimality using the atomic framework. Our results offer theoretical justification for the strong empirical success of the atomic framework in large-scale applications reported in previous articles.
Hybrid Quantum-Classical Policy Gradient for Adaptive Control of Cyber-Physical Systems: A Comparative Study of VQC vs. MLP
The comparative evaluation between classical and quantum reinforcement learning (QRL) paradigms was conducted to investigate their convergence behavior, robustness under observational noise, and computational efficiency in a benchmark control environment. The study employed a multilayer perceptron (MLP) agent as a classical baseline and a parameterized variational quantum circuit (VQC) as a quantum counterpart, both trained on the CartPole-v1 environment over 500 episodes. Empirical results demonstrated that the classical MLP achieved near-optimal policy convergence with a mean return of 498.7 +/- 3.2, maintaining stable equilibrium throughout training. In contrast, the VQC exhibited limited learning capability, with an average return of 14.6 +/- 4.8, primarily constrained by circuit depth and qubit connectivity. Noise robustness analysis further revealed that the MLP policy deteriorated gracefully under Gaussian perturbations, while the VQC displayed higher sensitivity at equivalent noise levels. Despite the lower asymptotic performance, the VQC exhibited significantly lower parameter count and marginally increased training time, highlighting its potential scalability for low-resource quantum processors. The results suggest that while classical neural policies remain dominant in current control benchmarks, quantum-enhanced architectures could offer promising efficiency advantages once hardware noise and expressivity limitations are mitigated.
comment: 6 pages, 5 figures, 2 tables, 17 equations, 1 algorithm
Distributed Platoon Control Under Quantization: Stability Analysis and Privacy Preservation
Distributed control of connected and automated vehicles has attracted considerable interest for its potential to improve traffic efficiency and safety. However, such control schemes require sharing privacy-sensitive vehicle data, which introduces risks of information leakage and potential malicious activities. This paper investigates the stability and privacy-preserving properties of distributed platoon control under two types of quantizers: deterministic and probabilistic. For deterministic quantization, we show that the resulting control strategy ensures the system errors remain uniformly ultimately bounded. Moreover, in the absence of auxiliary information, an eavesdropper cannot uniquely infer sensitive vehicle states. In contrast, the use of probabilistic quantization enables asymptotic convergence of the vehicle platoon in expectation with bounded variance. Importantly, probabilistic quantizers can satisfy differential privacy guarantees, thereby preserving privacy even when the eavesdropper possesses arbitrary auxiliary information. We further analyze the trade-off between control performance and privacy by formulating an optimization problem that characterizes the impact of the quantization step on both metrics. Numerical simulations are provided to illustrate the performance differences between the two quantization strategies.
comment: 12 pages, 6 figures
Safe Landing on Small Celestial Bodies with Gravitational Uncertainty Using Disturbance Estimation and Control Barrier Functions
Soft landing on small celestial bodies (SCBs) poses unique challenges, as uncertainties in gravitational models and poorly characterized, dynamic environments require a high level of autonomy. Existing control approaches lack formal guarantees for safety constraint satisfaction, necessary to ensure the safe execution of the maneuvers. This paper introduces a control that addresses this limitation by integrating trajectory tracking, disturbance estimation, and safety enforcement. An extended high-gain observer is employed to estimate disturbances resulting from gravitational model uncertainties. We then apply a feedback-linearizing and disturbance-canceling controller that achieves exponential tracking of reference trajectories. Finally, we use a control barrier function based minimum-intervention controller to enforce state and input constraints through out the maneuver execution. This control combines trajectory tracking of offline generated reference trajectories with formal guarantees of safety, which follows common guidance and control architectures for spacecraft and allows aggressive maneuvers to be executed without compromising safety. Numerical simulations using fuel-optimal trajectories demonstrate the effectiveness of the controller in achieving precise and safe soft-landing, highlighting its potential for autonomous SCB missions.
Human-in-the-loop Optimisation in Robot-assisted Gait Training
Wearable robots offer a promising solution for quantitatively monitoring gait and providing systematic, adaptive assistance to promote patient independence and improve gait. However, due to significant interpersonal and intrapersonal variability in walking patterns, it is important to design robot controllers that can adapt to the unique characteristics of each individual. This paper investigates the potential of human-in-the-loop optimisation (HILO) to deliver personalised assistance in gait training. The Covariance Matrix Adaptation Evolution Strategy (CMA-ES) was employed to continuously optimise an assist-as-needed controller of a lower-limb exoskeleton. Six healthy individuals participated over a two-day experiment. Our results suggest that while the CMA-ES appears to converge to a unique set of stiffnesses for each individual, no measurable impact on the subjects' performance was observed during the validation trials. These findings highlight the impact of human-robot co-adaptation and human behaviour variability, whose effect may be greater than potential benefits of personalising rule-based assistive controllers. Our work contributes to understanding the limitations of current personalisation approaches in exoskeleton-assisted gait rehabilitation and identifies key challenges for effective implementation of human-in-the-loop optimisation in this domain.
Federated Split Learning for Resource-Constrained Robots in Industrial IoT: Framework Comparison, Optimization Strategies, and Future Directions
Federated split learning (FedSL) has emerged as a promising paradigm for enabling collaborative intelligence in industrial Internet of Things (IoT) systems, particularly in smart factories where data privacy, communication efficiency, and device heterogeneity are critical concerns. In this article, we present a comprehensive study of FedSL frameworks tailored for resource-constrained robots in industrial scenarios. We compare synchronous, asynchronous, hierarchical, and heterogeneous FedSL frameworks in terms of workflow, scalability, adaptability, and limitations under dynamic industrial conditions. Furthermore, we systematically categorize token fusion strategies into three paradigms: input-level (pre-fusion), intermediate-level (intra-fusion), and output-level (post-fusion), and summarize their respective strengths in industrial applications. We also provide adaptive optimization techniques to enhance the efficiency and feasibility of FedSL implementation, including model compression, split layer selection, computing frequency allocation, and wireless resource management. Simulation results validate the performance of these frameworks under industrial detection scenarios. Finally, we outline open issues and research directions of FedSL in future smart manufacturing systems.
comment: 9 pages, 5 figures, submitted to the IEEE magazine
Sample-Efficient and Smooth Cross-Entropy Method Model Predictive Control Using Deterministic Samples
Cross-entropy method model predictive control (CEM--MPC) is a powerful gradient-free technique for nonlinear optimal control, but its performance is often limited by the reliance on random sampling. This conventional approach can lead to inefficient exploration of the solution space and non-smooth control inputs, requiring a large number of samples to achieve satisfactory results. To address these limitations, we propose deterministic sampling CEM (dsCEM), a novel framework that replaces the random sampling step with deterministic samples derived from localized cumulative distributions (LCDs). Our approach introduces modular schemes to generate and adapt these sample sets, incorporating temporal correlations to ensure smooth control trajectories. This method can be used as a drop-in replacement for the sampling step in existing CEM-based controllers. Experimental evaluations on two nonlinear control tasks demonstrate that dsCEM consistently outperforms state-of-the-art iCEM in terms of cumulative cost and control input smoothness, particularly in the critical low-sample regime.
Generative AI-Driven Hierarchical Multi-Agent Framework for Zero-Touch Optical Networks
The rapid development of Generative Artificial Intelligence (GenAI) has catalyzed a transformative technological revolution across all walks of life. As the backbone of wideband communication, optical networks are expecting high-level autonomous operation and zero-touch management to accommodate their expanding network scales and escalating transmission bandwidth. The integration of GenAI is deemed as the pivotal solution for realizing zero-touch optical networks. However, the lifecycle management of optical networks involves a multitude of tasks and necessitates seamless collaboration across multiple layers, which poses significant challenges to the existing single-agent GenAI systems. In this paper, we propose a GenAI-driven hierarchical multi-agent framework designed to streamline multi-task autonomous execution for zero-touch optical networks. We present the architecture, implementation, and applications of this framework. A field-deployed mesh network is utilized to demonstrate three typical scenarios throughout the lifecycle of optical network: quality of transmission estimation in the planning stage, dynamic channel adding/dropping in the operation stage, and system capacity increase in the upgrade stage. The case studies, illustrate the capabilities of multi-agent framework in multi-task allocation, coordination, execution, evaluation, and summarization. This work provides a promising approach for the future development of intelligent, efficient, and collaborative network management solutions, paving the way for more specialized and adaptive zero-touch optical networks.
comment: 7 pages,6 figures, Accepted by lEEE Communications Magazine, Open call
GO-Flock: Goal-Oriented Flocking in 3D Unknown Environments with Depth Maps
Artificial Potential Field (APF) methods are widely used for reactive flocking control, but they often suffer from challenges such as deadlocks and local minima, especially in the presence of obstacles. Existing solutions to address these issues are typically passive, leading to slow and inefficient collective navigation. As a result, many APF approaches have only been validated in obstacle-free environments or simplified, pseudo 3D simulations. This paper presents GO-Flock, a hybrid flocking framework that integrates planning with reactive APF-based control. GO-Flock consists of an upstream Perception Module, which processes depth maps to extract waypoints and virtual agents for obstacle avoidance, and a downstream Collective Navigation Module, which applies a novel APF strategy to achieve effective flocking behavior in cluttered environments. We evaluate GO-Flock against passive APF-based approaches to demonstrate their respective merits, such as their flocking behavior and the ability to overcome local minima. Finally, we validate GO-Flock through obstacle-filled environment and also hardware-in-the-loop experiments where we successfully flocked a team of nine drones, six physical and three virtual, in a forest environment.
Real-Time Glass Detection and Reprojection using Sensor Fusion Onboard Aerial Robots ICRA 2026
Autonomous aerial robots are increasingly being deployed in real-world scenarios, where transparent obstacles present significant challenges to reliable navigation and mapping. These materials pose a unique problem for traditional perception systems because they lack discernible features and can cause conventional depth sensors to fail, leading to inaccurate maps and potential collisions. To ensure safe navigation, robots must be able to accurately detect and map these transparent obstacles. Existing methods often rely on large, expensive sensors or algorithms that impose high computational burdens, making them unsuitable for low Size, Weight, and Power (SWaP) robots. In this work, we propose a novel and computationally efficient framework for detecting and mapping transparent obstacles onboard a sub-300g quadrotor. Our method fuses data from a Time-of-Flight (ToF) camera and an ultrasonic sensor with a custom, lightweight 2D convolution model. This specialized approach accurately detects specular reflections and propagates their depth into corresponding empty regions of the depth map, effectively rendering transparent obstacles visible. The entire pipeline operates in real-time, utilizing only a small fraction of a CPU core on an embedded processor. We validate our system through a series of experiments in both controlled and real-world environments, demonstrating the utility of our method through experiments where the robot maps indoor environments containing glass. Our work is, to our knowledge, the first of its kind to demonstrate a real-time, onboard transparent obstacle mapping system on a low-SWaP quadrotor using only the CPU.
comment: 8 pages, 8 figures, submitted to ICRA 2026
Terrain-Aided Navigation Using a Point Cloud Measurement Sensor
We investigate the use of a point cloud measurement in terrain-aided navigation. Our goal is to aid an inertial navigation system, by exploring ways to generate a useful measurement innovation error for effective nonlinear state estimation. We compare two such measurement models that involve the scanning of a digital terrain elevation model: a) one that is based on typical ray-casting from a given pose, that returns the predicted point cloud measurement from that pose, and b) another computationally less intensive one that does not require raycasting and we refer to herein as a sliding grid. Besides requiring a pose, it requires the pattern of the point cloud measurement itself and returns a predicted point cloud measurement. We further investigate the observability properties of the altitude for both measurement models. As a baseline, we compare the use of a point cloud measurement performance to the use of a radar altimeter and show the gains in accuracy. We conclude by showing that a point cloud measurement outperforms the use of a radar altimeter, and the point cloud measurement model to use depends on the computational resources
Three-dimensional Integrated Guidance and Control for Leader-Follower Flexible Formation of Fixed Wing UAVs
This paper presents a nonlinear integrated guidance and control (IGC) approach for flexible leader-follower formation flight of fixed-wing unmanned aerial vehicles (UAVs) while accounting for high-fidelity aerodynamics and thrust dynamics. Unlike conventional leader-follower schemes that fix the follower's position relative to the leader, the follower is steered to maintain range and bearing angles (which is the angle between its velocity vector and its line-of-sight (LOS) with respect to the leader) arbitrarily close to the prescribed values, enabling the follower to maintain formation on a hemispherical region behind the leader. The proposed IGC framework directly maps leader-follower relative range dynamics to throttle commands, and the follower's velocity orientation relative to the LOS to aerodynamic control surface deflections. This enables synergism between guidance and control subsystems. The control design uses a dynamic surface control-based backstepping approach to achieve convergence to the desired formation set, where Lyapunov barrier functions are incorporated to ensure the follower's bearing angle is constrained within specified bounds. Rigorous stability analysis guarantees uniform ultimate boundedness of all error states and strict constraint satisfaction in the presence of aerodynamic nonlinearities. The proposed flexible formation scheme allows the follower to have an orientation mismatch relative to the leader to execute anticipatory reconfiguration by transitioning between the relative positions in the admissible formation set when the leader aggressively maneuvers. The proposed IGC law relies only on relative information and onboard sensors without the information about the leader's maneuver, making it suitable for GPS-denied or non-cooperative scenarios. Finally, we present simulation results to vindicate the effectiveness and robustness of our approach.
Comparing Normal Form Representations for Station-Keeping near Cislunar Libration Points
The normal forms provide useful approximations for many trajectories of interest within the circular restricted three-body problem. This paper aims to thoroughly compare two of these forms: the Birkhoff normal form and the resonant normal form, highlighting the strengths of each for the representation of center manifold trajectories. A method of station-keeping is introduced, analogous to Floquet modes, in which the unstable component is minimized at specific points along a trajectory through impulsive maneuvers. Three different formulations of the same station-keeping approach are posed, collectively spanning Lyapunov, vertical, and halo orbits, as well as Lissajous and quasihalo trajectories.
comment: 2025 AAS/AIAA Space Flight Mechanics Meeting
Time-causal and time-recursive wavelets
When to apply wavelet analysis to real-time temporal signals, where the future cannot be accessed, it is essential to base all the steps in the signal processing pipeline on computational mechanisms that are truly time-causal. This paper describes how a time-causal wavelet analysis can be performed based on concepts developed in the area of temporal scale-space theory, originating from a complete classification of temporal smoothing kernels that guarantee non-creation of new structures from finer to coarser temporal scale levels. By necessity, convolution with truncated exponential kernels in cascade constitutes the only permissable class of kernels, as well as their temporal derivatives as a natural complement to fulfil the admissibility conditions of wavelet representations. For a particular way of choosing the time constants in the resulting infinite convolution of truncated exponential kernels, to ensure temporal scale covariance and thus self-similarity over temporal scales, we describe how mother wavelets can be chosen as temporal derivatives of the resulting time-causal limit kernel. By developing connections between wavelet theory and scale-space theory, we characterize and quantify how the continuous scaling properties transfer to the discrete implementation, demonstrating how the proposed time-causal wavelet representation can reflect the duration of locally dominant temporal structures in the input signals. We propose that this notion of time-causal wavelet analysis could be a valuable tool for signal processing tasks, where streams of signals are to be processed in real time, specifically for signals that may contain local variations over a rich span of temporal scales, or more generally for analysing physical or biophysical temporal phenomena, where a fully time-causal analysis is called for to be physically realistic.
comment: 23 pages, 8 figures
Techno-economic analysis of self-sustainable thermophotovoltaic systems for grid-scale energy generation
To facilitate the widespread adoption of renewable energy, dispatchable, zero-emission power sources are essential for grid stability. This work performs a comprehensive techno-economic analysis of a self-sustainable thermophotovoltaic (TPV) system, an architecture that integrates solar charging to function as a standalone power generation asset. Using theory-based models for air-bridge InGaAs and Si diode cells, our analysis reveals that while the system is not currently competitive from a pure levelized of storage cost (LCOS) perspective due to the high capital expenditure for thermal battery materials, its primary value lies in its competitive levelized cost of electricity (LCOE). The results demonstrate that the LCOE of this self-sustaining system can be competitive with conventional dispatchable generators, such as gas turbines. Furthermore, at scales exceeding the gigawatt-hour level, a Si-based system can also achieve an LCOE comparable to that of traditional gas-turbine power plants, despite having a lower conversion efficiency than its InGaAs counterpart. This highlights a practical engineering pathway for leveraging silicon's immense manufacturing scalability, offering a lower-risk route to deployment compared to III-V materials. Ultimately, this work establishes the self-sustainable TPV architecture as a compelling pathway toward providing grid-scale, on-demand, zero-emission power.
comment: 27 pages, 6 figures, 1 table
Nonlinear System Identification for Model-Based Control of Waked Wind Turbines
This work presents a nonlinear system identification framework for modeling the power extraction dynamics of wind turbines, including both freestream and waked conditions. The approach models turbine dynamics using data-driven power coefficient maps expressed as combinations of compact radial basis functions and polynomial bases, parameterized in terms of tip-speed ratio and upstream conditions. These surrogate models are embedded in a first-order dynamic system suitable for model-based control. Experimental validation is carried out in two wind tunnel configurations: a low-turbulence tandem setup and a high-turbulence wind farm scenario. In the tandem case, the identified model is integrated into an adapted K\omega^2 controller, resulting in improved tip-speed ratio tracking and power stability compared to BEM-based and steady-state models. In the wind farm scenario, the model captures the statistical behavior of the turbines despite unresolved turbulence. The proposed method enables interpretable, adaptive control across a range of operating conditions without relying on black-box learning strategies.
comment: Submitted to: Data-Centric Engineering Journal Length: 27 pages (including references) Figures: 14 numbered figures (from Fig. 1 to Fig. 14) Keywords: Wind Turbines, Wake Interaction, Nonlinear System Identification, Adaptive Control, RBF Regression, Model-Based Control, Wind Tunnel Experiments
Electrical System Architecture for Aviation Electrification
The electrification of aircraft is reshaping the foundations of aerospace design by positioning electrical systems at the center of propulsion, control, and onboard functionality. This chapter provides an overview of electrical system architectures for electric and hybrid electric aircraft, highlighting both established principles and emerging design strategies. The discussion begins with the motivations for electrification, including reducing environmental impact, improving operational efficiency, and replacing complex pneumatic and hydraulic subsystems with lighter and more reliable electrical alternatives. Aircraft electrical architectures are classified into four major categories: conventional, more electric, all electric, and hybrid electric. A range of system topologies is examined, including direct current (DC), alternating current (AC), hybrid, and distributed configurations. Each is considered in terms of its effectiveness in delivering power, enabling redundancy, supporting fault isolation, and managing thermal performance. Real world examples are presented to demonstrate practical applications, with case studies drawn from the Boeing 787 Dreamliner, the Eviation Alice commuter aircraft, and NASA X57 Maxwell demonstrator. These examples illustrate the ongoing transition from incremental subsystem electrification toward fully integrated architectures that promise higher efficiency and greater sustainability.
Neural Network-based Co-design of Output-Feedback Control Barrier Function and Observer
Control Barrier Functions (CBFs) provide a powerful framework for ensuring safety in dynamical systems. However, their application typically relies on full state information, which is often violated in real-world scenarios due to the availability of partial state information. In this work, we propose a neural network-based framework for the co-design of a safety controller, observer, and CBF for partially observed continuous-time systems. By formulating barrier conditions over an augmented state space, our approach ensures safety without requiring bounded estimation errors or handcrafted barrier functions. All components are jointly trained by formulating appropriate loss functions, and we introduce a validity condition to provide formal safety guarantees beyond the training data. Finally, we demonstrate the effectiveness of the proposed approach through several case studies.
comment: There were errors in paper (introduction section and notations)
Efficient MPC-Based Energy Management System for Secure and Cost-Effective Microgrid Operations
Model predictive control (MPC)-based energy management systems (EMS) are essential for ensuring optimal, secure, and stable operation in microgrids with high penetrations of distributed energy resources. However, due to the high computational cost for the decision-making, the conventional MPC-based EMS typically adopts a simplified integrated-bus power balance model. While this simplification is effective for small networks, large-scale systems require a more detailed branch flow model to account for the increased impact of grid power losses and security constraints. This work proposes an efficient and reliable MPC-based EMS that incorporates power-loss effects and grid-security constraints. %, while adaptively shaping the battery power profile in response to online renewable inputs, achieving reduced operational costs. It enhances system reliability, reduces operational costs, and shows strong potential for online implementation due to its reduced computational effort. Specifically, a second-order cone program (SOCP) branch flow relaxation is integrated into the constraint set, yielding a convex formulation that guarantees globally optimal solutions with high computational efficiency. Owing to the radial topology of the microgrid, this relaxation is practically tight, ensuring equivalence to the original problem. Building on this foundation, an online demand response (DR) module is designed to further reduce the operation cost through peak shaving. To the best of our knowledge, no prior MPC-EMS framework has simultaneously modeled losses and security constraints while coordinating flexible loads within a unified architecture. The developed framework enables secure operation with effective peak shaving and reduced total cost. The effectiveness of the proposed method is validated on 10-bus, 18-bus, and 33-bus systems.
Equivariant Filter for Relative Attitude and Target's Angular Velocity Estimation
Accurate estimation of the relative attitude and angular velocity between two rigid bodies is fundamental in aerospace applications such as spacecraft rendezvous and docking. In these scenarios, a chaser vehicle must determine the orientation and angular velocity of a target object using onboard sensors. This work addresses the challenge of designing an Equivariant Filter (EqF) that can reliably estimate both the relative attitude and the target angular velocity using noisy observations of two known, non-collinear vectors fixed in the target frame. To derive the EqF, a symmetry for the system is proposed and an equivariant lift onto the symmetry group is calculated. Observability and convergence properties are analyzed. Simulations demonstrate the filter's performance, with Monte Carlo runs yielding statistically significant results. The impact of low-rate measurements is also examined and a strategy to mitigate this effect is proposed. Experimental results, using fiducial markers and both conventional and event cameras for measurement acquisition, further validate the approach, confirming its effectiveness in a realistic setting.
comment: This work has been submitted to the IEEE for possible publication
Optimal Duration of Reserve Capacity Ancillary Services for Distributed Energy Resources
The increasing integration of distributed energy resources (DERs) into power systems presents opportunities and challenges for ancillary services (AS) provision. Technical requirements of existing AS (i.e., duration, reliability, ramp rate, and lead time) have been designed for traditional generating units, making their provision by DER aggregates particularly challenging. This paper proposes a method to design the duration of reserve capacity AS products considering the operational constraints of DERs and the temporal dynamics of system imbalances. The optimal product duration is determined by maximizing product availability and aligning the supply profile with the system's balancing needs. We apply the methodology to a realistic Swiss low-voltage network with a diverse DER portfolio. The results reveal that (i) shorter product durations maximize average availability and (ii) long product durations improve the alignment with system balancing needs. This paper offers valuable insights for system operators to design AS products tailored for DER participation.
Image-Based Visual Servoing for Enhanced Cooperation of Dual-Arm Manipulation
The cooperation of a pair of robot manipulators is required to manipulate a target object without any fixtures. The conventional control methods coordinate the end-effector pose of each manipulator with that of the other using their kinematics and joint coordinate measurements. Yet, the manipulators' inaccurate kinematics and joint coordinate measurements can cause significant pose synchronization errors in practice. This paper thus proposes an image-based visual servoing approach for enhancing the cooperation of a dual-arm manipulation system. On top of the classical control, the visual servoing controller lets each manipulator use its carried camera to measure the image features of the other's marker and adapt its end-effector pose with the counterpart on the move. Because visual measurements are robust to kinematic errors, the proposed control can reduce the end-effector pose synchronization errors and the fluctuations of the interaction forces of the pair of manipulators on the move. Theoretical analyses have rigorously proven the stability of the closed-loop system. Comparative experiments on real robots have substantiated the effectiveness of the proposed control.
comment: 8 pages, 7 figures. Project website: https://zizhe.io/ral-ibvs-enhanced/. This work has been accepted to the IEEE Robotics and Automation Letters in Feb 2025
Generalizable Physics-Informed Learning for Stochastic Safety-Critical Systems
Accurate estimation of long-term risk is essential for the design and analysis of stochastic dynamical systems. Existing risk quantification methods typically rely on extensive datasets involving risk events observed over extended time horizons, which can be prohibitively expensive to acquire. Motivated by this gap, we propose an efficient method for learning long-term risk probabilities using short-term samples with limited occurrence of risk events. Specifically, we establish that four distinct classes of long-term risk probabilities are characterized by specific partial differential equations (PDEs). Using this characterization, we introduce a physics-informed learning framework that combines empirical data with physics information to infer risk probabilities. We then analyze the theoretical properties of this framework in terms of generalization and convergence. Through numerical experiments, we demonstrate that our framework not only generalizes effectively beyond the sampled states and time horizons but also offers additional benefits such as improved sample efficiency, rapid online inference capabilities under changing system dynamics, and stable computation of probability gradients. These results highlight how embedding PDE constraints, which contain explicit gradient terms and inform how risk probabilities depend on state, time horizon, and system parameters, improves interpolation and generalization between/beyond the available data.
Multi-Agent Stage-wise Conservative Linear Bandits
In many real-world applications such as recommendation systems, multiple learning agents must balance exploration and exploitation while maintaining safety guarantees to avoid catastrophic failures. We study the stochastic linear bandit problem in a multi-agent networked setting where agents must satisfy stage-wise conservative constraints. A network of $N$ agents collaboratively maximizes cumulative reward while ensuring that the expected reward at every round is no less than $(1-\alpha)$ times that of a baseline policy. Each agent observes local rewards with unknown parameters, but the network optimizes for the global parameter (average of local parameters). Agents communicate only with immediate neighbors, and each communication round incurs additional regret. We propose MA-SCLUCB (Multi-Agent Stage-wise Conservative Linear UCB), an episodic algorithm alternating between action selection and consensus-building phases. We prove that MA-SCLUCB achieves regret $\tilde{O}\left(\frac{d}{\sqrt{N}}\sqrt{T}\cdot\frac{\log(NT)}{\sqrt{\log(1/|\lambda_2|)}}\right)$ with high probability, where $d$ is the dimension, $T$ is the horizon, and $|\lambda_2|$ is the network's second largest eigenvalue magnitude. Our analysis shows: (i) collaboration yields $\frac{1}{\sqrt{N}}$ improvement despite local communication, (ii) communication overhead grows only logarithmically for well-connected networks, and (iii) stage-wise safety adds only lower-order regret. Thus, distributed learning with safety guarantees achieves near-optimal performance in reasonably connected networks.
Systems and Control (EESS)
Multi-Segment Photonic Power Converters for Energy Harvesting and High-Speed Optical Wireless Communication
The demand for energy-efficient high-speed wireless communication, coupled with the rapid rise of IoT devices, requires systems that integrate power harvesting with optical data reception to eliminate the need for charging or battery replacements. Recent advances have explored the use of solar cells as optical receivers for high-speed data detection alongside power harvesting. \acs{GaAs}-based \acp{PPC} provide six times greater electron mobility than silicon- or cadmium telluride-based cells, enabling faster data detection and improved power efficiency. However, their bandwidth is constrained by junction capacitance, which increases with active area, creating a trade-off between power output and data rate. To address this, we propose and test multi-segment \acs{GaAs}-based \Acp{PPC} that serve as both energy harvesters and data detectors. By segmenting the active area into 2, 4, or 6 subcells, forming circular areas with diameters of 1, 1.5, or 2.08~mm, we reduce capacitance and boost bandwidth while preserving light collection. Fabricated on a semi-insulating \ac{GaAs} substrate with etched trenches for electrical isolation, the series-connected subcells optimize absorption and minimize parasitic effects. The \Acp{PPC} were used for an eye-safe 1.5~m optical wireless link, employing \ac{OFDM} with adaptive bit and power loading. The system achieved a world record data rate of 3.8~Gbps, which is four times higher than prior works. The system converts 39.7\% of optical power from a beam of 2.3~mW, although the segmentation increases the sensitivity of the alignment. These findings provide new solutions for off-grid backhaul for future communication networks, such as 6th generation (6G) cellular.
Differentiable Model Predictive Control on the GPU
Differentiable model predictive control (MPC) offers a powerful framework for combining learning and control. However, its adoption has been limited by the inherently sequential nature of traditional optimization algorithms, which are challenging to parallelize on modern computing hardware like GPUs. In this work, we tackle this bottleneck by introducing a GPU-accelerated differentiable optimization tool for MPC. This solver leverages sequential quadratic programming and a custom preconditioned conjugate gradient (PCG) routine with tridiagonal preconditioning to exploit the problem's structure and enable efficient parallelization. We demonstrate substantial speedups over CPU- and GPU-based baselines, significantly improving upon state-of-the-art training times on benchmark reinforcement learning and imitation learning tasks. Finally, we showcase the method on the challenging task of reinforcement learning for driving at the limits of handling, where it enables robust drifting of a Toyota Supra through water puddles.
Learning Mixtures of Linear Dynamical Systems (MoLDS) via Hybrid Tensor-EM Method
Mixtures of linear dynamical systems (MoLDS) provide a path to model time-series data that exhibit diverse temporal dynamics across trajectories. However, its application remains challenging in complex and noisy settings, limiting its effectiveness for neural data analysis. Tensor-based moment methods can provide global identifiability guarantees for MoLDS, but their performance degrades under noise and complexity. Commonly used expectation-maximization (EM) methods offer flexibility in fitting latent models but are highly sensitive to initialization and prone to poor local minima. Here, we propose a tensor-based method that provides identifiability guarantees for learning MoLDS, which is followed by EM updates to combine the strengths of both approaches. The novelty in our approach lies in the construction of moment tensors using the input-output data to recover globally consistent estimates of mixture weights and system parameters. These estimates can then be refined through a Kalman EM algorithm, with closed-form updates for all LDS parameters. We validate our framework on synthetic benchmarks and real-world datasets. On synthetic data, the proposed Tensor-EM method achieves more reliable recovery and improved robustness compared to either pure tensor or randomly initialized EM methods. We then analyze neural recordings from the primate somatosensory cortex while a non-human primate performs reaches in different directions. Our method successfully models and clusters different conditions as separate subsystems, consistent with supervised single-LDS fits for each condition. Finally, we apply this approach to another neural dataset where monkeys perform a sequential reaching task. These results demonstrate that MoLDS provides an effective framework for modeling complex neural data, and that Tensor-EM is a reliable approach to MoLDS learning for these applications.
comment: 20 pages, 7 figures
Toward Model Matching for Remotely Controlled Differential Drive Robotic Vehicles
The problem of regulation of the orientation angle of a remotely controlled differential-drive mobile robot with actuator dynamics and network-induced delays is studied. Using a preinstalled two-layer nonlinear control scheme that decouples linear and angular velocities and regulates heading, a third, delay-dependent layer that achieves exact model matching from the orientation angle command to the orientation angle is introduced. The proposed outer loop controller is a delay dependent dynamic measurable output-feedback controller with dynamic proper precompensator. Parameterization yields a simple characteristic quasi-polynomial with coefficients constrained to satisfy stability for all delays up to a computable bound. Computational experiments confirm accurate tracking, fast settling and bounded internal signals and control voltages. The approach offers an analytic design alternative to AI-based tuning for delayed robotic systems.
Optimal Batched Scheduling of Stochastic Processing Networks Using Atomic Action Decomposition
Stochastic processing networks (SPNs) have broad applications in healthcare, transportation, and communication networks. The control of SPN is to dynamically assign servers in batches under uncertainty to optimize long-run performance. This problem is challenging as the policy dimension grows exponentially with the number of servers, making standard reinforcement learning and policy optimization methods intractable at scale. We propose an atomic action decomposition framework that addresses this scalability challenge by breaking joint assignments into sequential single-server assignments. This yields policies with constant dimension, independent of the number of servers. We study two classes of atomic policies, the step-dependent and step-independent atomic policies, and prove that both achieve the same optimal long-run average reward as the original joint policies. These results establish that computing the optimal SPN control can be made scalable without loss of optimality using the atomic framework. Our results offer theoretical justification for the strong empirical success of the atomic framework in large-scale applications reported in previous articles.
Hybrid Quantum-Classical Policy Gradient for Adaptive Control of Cyber-Physical Systems: A Comparative Study of VQC vs. MLP
The comparative evaluation between classical and quantum reinforcement learning (QRL) paradigms was conducted to investigate their convergence behavior, robustness under observational noise, and computational efficiency in a benchmark control environment. The study employed a multilayer perceptron (MLP) agent as a classical baseline and a parameterized variational quantum circuit (VQC) as a quantum counterpart, both trained on the CartPole-v1 environment over 500 episodes. Empirical results demonstrated that the classical MLP achieved near-optimal policy convergence with a mean return of 498.7 +/- 3.2, maintaining stable equilibrium throughout training. In contrast, the VQC exhibited limited learning capability, with an average return of 14.6 +/- 4.8, primarily constrained by circuit depth and qubit connectivity. Noise robustness analysis further revealed that the MLP policy deteriorated gracefully under Gaussian perturbations, while the VQC displayed higher sensitivity at equivalent noise levels. Despite the lower asymptotic performance, the VQC exhibited significantly lower parameter count and marginally increased training time, highlighting its potential scalability for low-resource quantum processors. The results suggest that while classical neural policies remain dominant in current control benchmarks, quantum-enhanced architectures could offer promising efficiency advantages once hardware noise and expressivity limitations are mitigated.
comment: 6 pages, 5 figures, 2 tables, 17 equations, 1 algorithm
Distributed Platoon Control Under Quantization: Stability Analysis and Privacy Preservation
Distributed control of connected and automated vehicles has attracted considerable interest for its potential to improve traffic efficiency and safety. However, such control schemes require sharing privacy-sensitive vehicle data, which introduces risks of information leakage and potential malicious activities. This paper investigates the stability and privacy-preserving properties of distributed platoon control under two types of quantizers: deterministic and probabilistic. For deterministic quantization, we show that the resulting control strategy ensures the system errors remain uniformly ultimately bounded. Moreover, in the absence of auxiliary information, an eavesdropper cannot uniquely infer sensitive vehicle states. In contrast, the use of probabilistic quantization enables asymptotic convergence of the vehicle platoon in expectation with bounded variance. Importantly, probabilistic quantizers can satisfy differential privacy guarantees, thereby preserving privacy even when the eavesdropper possesses arbitrary auxiliary information. We further analyze the trade-off between control performance and privacy by formulating an optimization problem that characterizes the impact of the quantization step on both metrics. Numerical simulations are provided to illustrate the performance differences between the two quantization strategies.
comment: 12 pages, 6 figures
Safe Landing on Small Celestial Bodies with Gravitational Uncertainty Using Disturbance Estimation and Control Barrier Functions
Soft landing on small celestial bodies (SCBs) poses unique challenges, as uncertainties in gravitational models and poorly characterized, dynamic environments require a high level of autonomy. Existing control approaches lack formal guarantees for safety constraint satisfaction, necessary to ensure the safe execution of the maneuvers. This paper introduces a control that addresses this limitation by integrating trajectory tracking, disturbance estimation, and safety enforcement. An extended high-gain observer is employed to estimate disturbances resulting from gravitational model uncertainties. We then apply a feedback-linearizing and disturbance-canceling controller that achieves exponential tracking of reference trajectories. Finally, we use a control barrier function based minimum-intervention controller to enforce state and input constraints through out the maneuver execution. This control combines trajectory tracking of offline generated reference trajectories with formal guarantees of safety, which follows common guidance and control architectures for spacecraft and allows aggressive maneuvers to be executed without compromising safety. Numerical simulations using fuel-optimal trajectories demonstrate the effectiveness of the controller in achieving precise and safe soft-landing, highlighting its potential for autonomous SCB missions.
Human-in-the-loop Optimisation in Robot-assisted Gait Training
Wearable robots offer a promising solution for quantitatively monitoring gait and providing systematic, adaptive assistance to promote patient independence and improve gait. However, due to significant interpersonal and intrapersonal variability in walking patterns, it is important to design robot controllers that can adapt to the unique characteristics of each individual. This paper investigates the potential of human-in-the-loop optimisation (HILO) to deliver personalised assistance in gait training. The Covariance Matrix Adaptation Evolution Strategy (CMA-ES) was employed to continuously optimise an assist-as-needed controller of a lower-limb exoskeleton. Six healthy individuals participated over a two-day experiment. Our results suggest that while the CMA-ES appears to converge to a unique set of stiffnesses for each individual, no measurable impact on the subjects' performance was observed during the validation trials. These findings highlight the impact of human-robot co-adaptation and human behaviour variability, whose effect may be greater than potential benefits of personalising rule-based assistive controllers. Our work contributes to understanding the limitations of current personalisation approaches in exoskeleton-assisted gait rehabilitation and identifies key challenges for effective implementation of human-in-the-loop optimisation in this domain.
Federated Split Learning for Resource-Constrained Robots in Industrial IoT: Framework Comparison, Optimization Strategies, and Future Directions
Federated split learning (FedSL) has emerged as a promising paradigm for enabling collaborative intelligence in industrial Internet of Things (IoT) systems, particularly in smart factories where data privacy, communication efficiency, and device heterogeneity are critical concerns. In this article, we present a comprehensive study of FedSL frameworks tailored for resource-constrained robots in industrial scenarios. We compare synchronous, asynchronous, hierarchical, and heterogeneous FedSL frameworks in terms of workflow, scalability, adaptability, and limitations under dynamic industrial conditions. Furthermore, we systematically categorize token fusion strategies into three paradigms: input-level (pre-fusion), intermediate-level (intra-fusion), and output-level (post-fusion), and summarize their respective strengths in industrial applications. We also provide adaptive optimization techniques to enhance the efficiency and feasibility of FedSL implementation, including model compression, split layer selection, computing frequency allocation, and wireless resource management. Simulation results validate the performance of these frameworks under industrial detection scenarios. Finally, we outline open issues and research directions of FedSL in future smart manufacturing systems.
comment: 9 pages, 5 figures, submitted to the IEEE magazine
Sample-Efficient and Smooth Cross-Entropy Method Model Predictive Control Using Deterministic Samples
Cross-entropy method model predictive control (CEM--MPC) is a powerful gradient-free technique for nonlinear optimal control, but its performance is often limited by the reliance on random sampling. This conventional approach can lead to inefficient exploration of the solution space and non-smooth control inputs, requiring a large number of samples to achieve satisfactory results. To address these limitations, we propose deterministic sampling CEM (dsCEM), a novel framework that replaces the random sampling step with deterministic samples derived from localized cumulative distributions (LCDs). Our approach introduces modular schemes to generate and adapt these sample sets, incorporating temporal correlations to ensure smooth control trajectories. This method can be used as a drop-in replacement for the sampling step in existing CEM-based controllers. Experimental evaluations on two nonlinear control tasks demonstrate that dsCEM consistently outperforms state-of-the-art iCEM in terms of cumulative cost and control input smoothness, particularly in the critical low-sample regime.
Generative AI-Driven Hierarchical Multi-Agent Framework for Zero-Touch Optical Networks
The rapid development of Generative Artificial Intelligence (GenAI) has catalyzed a transformative technological revolution across all walks of life. As the backbone of wideband communication, optical networks are expecting high-level autonomous operation and zero-touch management to accommodate their expanding network scales and escalating transmission bandwidth. The integration of GenAI is deemed as the pivotal solution for realizing zero-touch optical networks. However, the lifecycle management of optical networks involves a multitude of tasks and necessitates seamless collaboration across multiple layers, which poses significant challenges to the existing single-agent GenAI systems. In this paper, we propose a GenAI-driven hierarchical multi-agent framework designed to streamline multi-task autonomous execution for zero-touch optical networks. We present the architecture, implementation, and applications of this framework. A field-deployed mesh network is utilized to demonstrate three typical scenarios throughout the lifecycle of optical network: quality of transmission estimation in the planning stage, dynamic channel adding/dropping in the operation stage, and system capacity increase in the upgrade stage. The case studies, illustrate the capabilities of multi-agent framework in multi-task allocation, coordination, execution, evaluation, and summarization. This work provides a promising approach for the future development of intelligent, efficient, and collaborative network management solutions, paving the way for more specialized and adaptive zero-touch optical networks.
comment: 7 pages,6 figures, Accepted by lEEE Communications Magazine, Open call
GO-Flock: Goal-Oriented Flocking in 3D Unknown Environments with Depth Maps
Artificial Potential Field (APF) methods are widely used for reactive flocking control, but they often suffer from challenges such as deadlocks and local minima, especially in the presence of obstacles. Existing solutions to address these issues are typically passive, leading to slow and inefficient collective navigation. As a result, many APF approaches have only been validated in obstacle-free environments or simplified, pseudo 3D simulations. This paper presents GO-Flock, a hybrid flocking framework that integrates planning with reactive APF-based control. GO-Flock consists of an upstream Perception Module, which processes depth maps to extract waypoints and virtual agents for obstacle avoidance, and a downstream Collective Navigation Module, which applies a novel APF strategy to achieve effective flocking behavior in cluttered environments. We evaluate GO-Flock against passive APF-based approaches to demonstrate their respective merits, such as their flocking behavior and the ability to overcome local minima. Finally, we validate GO-Flock through obstacle-filled environment and also hardware-in-the-loop experiments where we successfully flocked a team of nine drones, six physical and three virtual, in a forest environment.
Real-Time Glass Detection and Reprojection using Sensor Fusion Onboard Aerial Robots ICRA 2026
Autonomous aerial robots are increasingly being deployed in real-world scenarios, where transparent obstacles present significant challenges to reliable navigation and mapping. These materials pose a unique problem for traditional perception systems because they lack discernible features and can cause conventional depth sensors to fail, leading to inaccurate maps and potential collisions. To ensure safe navigation, robots must be able to accurately detect and map these transparent obstacles. Existing methods often rely on large, expensive sensors or algorithms that impose high computational burdens, making them unsuitable for low Size, Weight, and Power (SWaP) robots. In this work, we propose a novel and computationally efficient framework for detecting and mapping transparent obstacles onboard a sub-300g quadrotor. Our method fuses data from a Time-of-Flight (ToF) camera and an ultrasonic sensor with a custom, lightweight 2D convolution model. This specialized approach accurately detects specular reflections and propagates their depth into corresponding empty regions of the depth map, effectively rendering transparent obstacles visible. The entire pipeline operates in real-time, utilizing only a small fraction of a CPU core on an embedded processor. We validate our system through a series of experiments in both controlled and real-world environments, demonstrating the utility of our method through experiments where the robot maps indoor environments containing glass. Our work is, to our knowledge, the first of its kind to demonstrate a real-time, onboard transparent obstacle mapping system on a low-SWaP quadrotor using only the CPU.
comment: 8 pages, 8 figures, submitted to ICRA 2026
Terrain-Aided Navigation Using a Point Cloud Measurement Sensor
We investigate the use of a point cloud measurement in terrain-aided navigation. Our goal is to aid an inertial navigation system, by exploring ways to generate a useful measurement innovation error for effective nonlinear state estimation. We compare two such measurement models that involve the scanning of a digital terrain elevation model: a) one that is based on typical ray-casting from a given pose, that returns the predicted point cloud measurement from that pose, and b) another computationally less intensive one that does not require raycasting and we refer to herein as a sliding grid. Besides requiring a pose, it requires the pattern of the point cloud measurement itself and returns a predicted point cloud measurement. We further investigate the observability properties of the altitude for both measurement models. As a baseline, we compare the use of a point cloud measurement performance to the use of a radar altimeter and show the gains in accuracy. We conclude by showing that a point cloud measurement outperforms the use of a radar altimeter, and the point cloud measurement model to use depends on the computational resources
Three-dimensional Integrated Guidance and Control for Leader-Follower Flexible Formation of Fixed Wing UAVs
This paper presents a nonlinear integrated guidance and control (IGC) approach for flexible leader-follower formation flight of fixed-wing unmanned aerial vehicles (UAVs) while accounting for high-fidelity aerodynamics and thrust dynamics. Unlike conventional leader-follower schemes that fix the follower's position relative to the leader, the follower is steered to maintain range and bearing angles (which is the angle between its velocity vector and its line-of-sight (LOS) with respect to the leader) arbitrarily close to the prescribed values, enabling the follower to maintain formation on a hemispherical region behind the leader. The proposed IGC framework directly maps leader-follower relative range dynamics to throttle commands, and the follower's velocity orientation relative to the LOS to aerodynamic control surface deflections. This enables synergism between guidance and control subsystems. The control design uses a dynamic surface control-based backstepping approach to achieve convergence to the desired formation set, where Lyapunov barrier functions are incorporated to ensure the follower's bearing angle is constrained within specified bounds. Rigorous stability analysis guarantees uniform ultimate boundedness of all error states and strict constraint satisfaction in the presence of aerodynamic nonlinearities. The proposed flexible formation scheme allows the follower to have an orientation mismatch relative to the leader to execute anticipatory reconfiguration by transitioning between the relative positions in the admissible formation set when the leader aggressively maneuvers. The proposed IGC law relies only on relative information and onboard sensors without the information about the leader's maneuver, making it suitable for GPS-denied or non-cooperative scenarios. Finally, we present simulation results to vindicate the effectiveness and robustness of our approach.
Comparing Normal Form Representations for Station-Keeping near Cislunar Libration Points
The normal forms provide useful approximations for many trajectories of interest within the circular restricted three-body problem. This paper aims to thoroughly compare two of these forms: the Birkhoff normal form and the resonant normal form, highlighting the strengths of each for the representation of center manifold trajectories. A method of station-keeping is introduced, analogous to Floquet modes, in which the unstable component is minimized at specific points along a trajectory through impulsive maneuvers. Three different formulations of the same station-keeping approach are posed, collectively spanning Lyapunov, vertical, and halo orbits, as well as Lissajous and quasihalo trajectories.
comment: 2025 AAS/AIAA Space Flight Mechanics Meeting
Time-causal and time-recursive wavelets
When to apply wavelet analysis to real-time temporal signals, where the future cannot be accessed, it is essential to base all the steps in the signal processing pipeline on computational mechanisms that are truly time-causal. This paper describes how a time-causal wavelet analysis can be performed based on concepts developed in the area of temporal scale-space theory, originating from a complete classification of temporal smoothing kernels that guarantee non-creation of new structures from finer to coarser temporal scale levels. By necessity, convolution with truncated exponential kernels in cascade constitutes the only permissable class of kernels, as well as their temporal derivatives as a natural complement to fulfil the admissibility conditions of wavelet representations. For a particular way of choosing the time constants in the resulting infinite convolution of truncated exponential kernels, to ensure temporal scale covariance and thus self-similarity over temporal scales, we describe how mother wavelets can be chosen as temporal derivatives of the resulting time-causal limit kernel. By developing connections between wavelet theory and scale-space theory, we characterize and quantify how the continuous scaling properties transfer to the discrete implementation, demonstrating how the proposed time-causal wavelet representation can reflect the duration of locally dominant temporal structures in the input signals. We propose that this notion of time-causal wavelet analysis could be a valuable tool for signal processing tasks, where streams of signals are to be processed in real time, specifically for signals that may contain local variations over a rich span of temporal scales, or more generally for analysing physical or biophysical temporal phenomena, where a fully time-causal analysis is called for to be physically realistic.
comment: 23 pages, 8 figures
Techno-economic analysis of self-sustainable thermophotovoltaic systems for grid-scale energy generation
To facilitate the widespread adoption of renewable energy, dispatchable, zero-emission power sources are essential for grid stability. This work performs a comprehensive techno-economic analysis of a self-sustainable thermophotovoltaic (TPV) system, an architecture that integrates solar charging to function as a standalone power generation asset. Using theory-based models for air-bridge InGaAs and Si diode cells, our analysis reveals that while the system is not currently competitive from a pure levelized of storage cost (LCOS) perspective due to the high capital expenditure for thermal battery materials, its primary value lies in its competitive levelized cost of electricity (LCOE). The results demonstrate that the LCOE of this self-sustaining system can be competitive with conventional dispatchable generators, such as gas turbines. Furthermore, at scales exceeding the gigawatt-hour level, a Si-based system can also achieve an LCOE comparable to that of traditional gas-turbine power plants, despite having a lower conversion efficiency than its InGaAs counterpart. This highlights a practical engineering pathway for leveraging silicon's immense manufacturing scalability, offering a lower-risk route to deployment compared to III-V materials. Ultimately, this work establishes the self-sustainable TPV architecture as a compelling pathway toward providing grid-scale, on-demand, zero-emission power.
comment: 27 pages, 6 figures, 1 table
Nonlinear System Identification for Model-Based Control of Waked Wind Turbines
This work presents a nonlinear system identification framework for modeling the power extraction dynamics of wind turbines, including both freestream and waked conditions. The approach models turbine dynamics using data-driven power coefficient maps expressed as combinations of compact radial basis functions and polynomial bases, parameterized in terms of tip-speed ratio and upstream conditions. These surrogate models are embedded in a first-order dynamic system suitable for model-based control. Experimental validation is carried out in two wind tunnel configurations: a low-turbulence tandem setup and a high-turbulence wind farm scenario. In the tandem case, the identified model is integrated into an adapted K\omega^2 controller, resulting in improved tip-speed ratio tracking and power stability compared to BEM-based and steady-state models. In the wind farm scenario, the model captures the statistical behavior of the turbines despite unresolved turbulence. The proposed method enables interpretable, adaptive control across a range of operating conditions without relying on black-box learning strategies.
comment: Submitted to: Data-Centric Engineering Journal Length: 27 pages (including references) Figures: 14 numbered figures (from Fig. 1 to Fig. 14) Keywords: Wind Turbines, Wake Interaction, Nonlinear System Identification, Adaptive Control, RBF Regression, Model-Based Control, Wind Tunnel Experiments
Electrical System Architecture for Aviation Electrification
The electrification of aircraft is reshaping the foundations of aerospace design by positioning electrical systems at the center of propulsion, control, and onboard functionality. This chapter provides an overview of electrical system architectures for electric and hybrid electric aircraft, highlighting both established principles and emerging design strategies. The discussion begins with the motivations for electrification, including reducing environmental impact, improving operational efficiency, and replacing complex pneumatic and hydraulic subsystems with lighter and more reliable electrical alternatives. Aircraft electrical architectures are classified into four major categories: conventional, more electric, all electric, and hybrid electric. A range of system topologies is examined, including direct current (DC), alternating current (AC), hybrid, and distributed configurations. Each is considered in terms of its effectiveness in delivering power, enabling redundancy, supporting fault isolation, and managing thermal performance. Real world examples are presented to demonstrate practical applications, with case studies drawn from the Boeing 787 Dreamliner, the Eviation Alice commuter aircraft, and NASA X57 Maxwell demonstrator. These examples illustrate the ongoing transition from incremental subsystem electrification toward fully integrated architectures that promise higher efficiency and greater sustainability.
Neural Network-based Co-design of Output-Feedback Control Barrier Function and Observer
Control Barrier Functions (CBFs) provide a powerful framework for ensuring safety in dynamical systems. However, their application typically relies on full state information, which is often violated in real-world scenarios due to the availability of partial state information. In this work, we propose a neural network-based framework for the co-design of a safety controller, observer, and CBF for partially observed continuous-time systems. By formulating barrier conditions over an augmented state space, our approach ensures safety without requiring bounded estimation errors or handcrafted barrier functions. All components are jointly trained by formulating appropriate loss functions, and we introduce a validity condition to provide formal safety guarantees beyond the training data. Finally, we demonstrate the effectiveness of the proposed approach through several case studies.
comment: There were errors in paper (introduction section and notations)
Efficient MPC-Based Energy Management System for Secure and Cost-Effective Microgrid Operations
Model predictive control (MPC)-based energy management systems (EMS) are essential for ensuring optimal, secure, and stable operation in microgrids with high penetrations of distributed energy resources. However, due to the high computational cost for the decision-making, the conventional MPC-based EMS typically adopts a simplified integrated-bus power balance model. While this simplification is effective for small networks, large-scale systems require a more detailed branch flow model to account for the increased impact of grid power losses and security constraints. This work proposes an efficient and reliable MPC-based EMS that incorporates power-loss effects and grid-security constraints. %, while adaptively shaping the battery power profile in response to online renewable inputs, achieving reduced operational costs. It enhances system reliability, reduces operational costs, and shows strong potential for online implementation due to its reduced computational effort. Specifically, a second-order cone program (SOCP) branch flow relaxation is integrated into the constraint set, yielding a convex formulation that guarantees globally optimal solutions with high computational efficiency. Owing to the radial topology of the microgrid, this relaxation is practically tight, ensuring equivalence to the original problem. Building on this foundation, an online demand response (DR) module is designed to further reduce the operation cost through peak shaving. To the best of our knowledge, no prior MPC-EMS framework has simultaneously modeled losses and security constraints while coordinating flexible loads within a unified architecture. The developed framework enables secure operation with effective peak shaving and reduced total cost. The effectiveness of the proposed method is validated on 10-bus, 18-bus, and 33-bus systems.
Equivariant Filter for Relative Attitude and Target's Angular Velocity Estimation
Accurate estimation of the relative attitude and angular velocity between two rigid bodies is fundamental in aerospace applications such as spacecraft rendezvous and docking. In these scenarios, a chaser vehicle must determine the orientation and angular velocity of a target object using onboard sensors. This work addresses the challenge of designing an Equivariant Filter (EqF) that can reliably estimate both the relative attitude and the target angular velocity using noisy observations of two known, non-collinear vectors fixed in the target frame. To derive the EqF, a symmetry for the system is proposed and an equivariant lift onto the symmetry group is calculated. Observability and convergence properties are analyzed. Simulations demonstrate the filter's performance, with Monte Carlo runs yielding statistically significant results. The impact of low-rate measurements is also examined and a strategy to mitigate this effect is proposed. Experimental results, using fiducial markers and both conventional and event cameras for measurement acquisition, further validate the approach, confirming its effectiveness in a realistic setting.
comment: This work has been submitted to the IEEE for possible publication
Optimal Duration of Reserve Capacity Ancillary Services for Distributed Energy Resources
The increasing integration of distributed energy resources (DERs) into power systems presents opportunities and challenges for ancillary services (AS) provision. Technical requirements of existing AS (i.e., duration, reliability, ramp rate, and lead time) have been designed for traditional generating units, making their provision by DER aggregates particularly challenging. This paper proposes a method to design the duration of reserve capacity AS products considering the operational constraints of DERs and the temporal dynamics of system imbalances. The optimal product duration is determined by maximizing product availability and aligning the supply profile with the system's balancing needs. We apply the methodology to a realistic Swiss low-voltage network with a diverse DER portfolio. The results reveal that (i) shorter product durations maximize average availability and (ii) long product durations improve the alignment with system balancing needs. This paper offers valuable insights for system operators to design AS products tailored for DER participation.
Image-Based Visual Servoing for Enhanced Cooperation of Dual-Arm Manipulation
The cooperation of a pair of robot manipulators is required to manipulate a target object without any fixtures. The conventional control methods coordinate the end-effector pose of each manipulator with that of the other using their kinematics and joint coordinate measurements. Yet, the manipulators' inaccurate kinematics and joint coordinate measurements can cause significant pose synchronization errors in practice. This paper thus proposes an image-based visual servoing approach for enhancing the cooperation of a dual-arm manipulation system. On top of the classical control, the visual servoing controller lets each manipulator use its carried camera to measure the image features of the other's marker and adapt its end-effector pose with the counterpart on the move. Because visual measurements are robust to kinematic errors, the proposed control can reduce the end-effector pose synchronization errors and the fluctuations of the interaction forces of the pair of manipulators on the move. Theoretical analyses have rigorously proven the stability of the closed-loop system. Comparative experiments on real robots have substantiated the effectiveness of the proposed control.
comment: 8 pages, 7 figures. Project website: https://zizhe.io/ral-ibvs-enhanced/. This work has been accepted to the IEEE Robotics and Automation Letters in Feb 2025
Generalizable Physics-Informed Learning for Stochastic Safety-Critical Systems
Accurate estimation of long-term risk is essential for the design and analysis of stochastic dynamical systems. Existing risk quantification methods typically rely on extensive datasets involving risk events observed over extended time horizons, which can be prohibitively expensive to acquire. Motivated by this gap, we propose an efficient method for learning long-term risk probabilities using short-term samples with limited occurrence of risk events. Specifically, we establish that four distinct classes of long-term risk probabilities are characterized by specific partial differential equations (PDEs). Using this characterization, we introduce a physics-informed learning framework that combines empirical data with physics information to infer risk probabilities. We then analyze the theoretical properties of this framework in terms of generalization and convergence. Through numerical experiments, we demonstrate that our framework not only generalizes effectively beyond the sampled states and time horizons but also offers additional benefits such as improved sample efficiency, rapid online inference capabilities under changing system dynamics, and stable computation of probability gradients. These results highlight how embedding PDE constraints, which contain explicit gradient terms and inform how risk probabilities depend on state, time horizon, and system parameters, improves interpolation and generalization between/beyond the available data.
Multi-Agent Stage-wise Conservative Linear Bandits
In many real-world applications such as recommendation systems, multiple learning agents must balance exploration and exploitation while maintaining safety guarantees to avoid catastrophic failures. We study the stochastic linear bandit problem in a multi-agent networked setting where agents must satisfy stage-wise conservative constraints. A network of $N$ agents collaboratively maximizes cumulative reward while ensuring that the expected reward at every round is no less than $(1-\alpha)$ times that of a baseline policy. Each agent observes local rewards with unknown parameters, but the network optimizes for the global parameter (average of local parameters). Agents communicate only with immediate neighbors, and each communication round incurs additional regret. We propose MA-SCLUCB (Multi-Agent Stage-wise Conservative Linear UCB), an episodic algorithm alternating between action selection and consensus-building phases. We prove that MA-SCLUCB achieves regret $\tilde{O}\left(\frac{d}{\sqrt{N}}\sqrt{T}\cdot\frac{\log(NT)}{\sqrt{\log(1/|\lambda_2|)}}\right)$ with high probability, where $d$ is the dimension, $T$ is the horizon, and $|\lambda_2|$ is the network's second largest eigenvalue magnitude. Our analysis shows: (i) collaboration yields $\frac{1}{\sqrt{N}}$ improvement despite local communication, (ii) communication overhead grows only logarithmically for well-connected networks, and (iii) stage-wise safety adds only lower-order regret. Thus, distributed learning with safety guarantees achieves near-optimal performance in reasonably connected networks.
Robotics
Dropping the D: RGB-D SLAM Without the Depth Sensor
We present DropD-SLAM, a real-time monocular SLAM system that achieves RGB-D-level accuracy without relying on depth sensors. The system replaces active depth input with three pretrained vision modules: a monocular metric depth estimator, a learned keypoint detector, and an instance segmentation network. Dynamic objects are suppressed using dilated instance masks, while static keypoints are assigned predicted depth values and backprojected into 3D to form metrically scaled features. These are processed by an unmodified RGB-D SLAM back end for tracking and mapping. On the TUM RGB-D benchmark, DropD-SLAM attains 7.4 cm mean ATE on static sequences and 1.8 cm on dynamic sequences, matching or surpassing state-of-the-art RGB-D methods while operating at 22 FPS on a single GPU. These results suggest that modern pretrained vision models can replace active depth sensors as reliable, real-time sources of metric scale, marking a step toward simpler and more cost-effective SLAM systems.
EmbodiedCoder: Parameterized Embodied Mobile Manipulation via Modern Coding Model
Recent advances in control robot methods, from end-to-end vision-language-action frameworks to modular systems with predefined primitives, have advanced robots' ability to follow natural language instructions. Nonetheless, many approaches still struggle to scale to diverse environments, as they often rely on large annotated datasets and offer limited interpretability.In this work, we introduce EmbodiedCoder, a training-free framework for open-world mobile robot manipulation that leverages coding models to directly generate executable robot trajectories. By grounding high-level instructions in code, EmbodiedCoder enables flexible object geometry parameterization and manipulation trajectory synthesis without additional data collection or fine-tuning.This coding-based paradigm provides a transparent and generalizable way to connect perception with manipulation. Experiments on real mobile robots show that EmbodiedCoder achieves robust performance across diverse long-term tasks and generalizes effectively to novel objects and environments.Our results demonstrate an interpretable approach for bridging high-level reasoning and low-level control, moving beyond fixed primitives toward versatile robot intelligence. See the project page at: https://anonymous.4open.science/w/Embodied-Coder/
comment: Demo Page: https://anonymous.4open.science/w/Embodied-Coder/
DYMO-Hair: Generalizable Volumetric Dynamics Modeling for Robot Hair Manipulation
Hair care is an essential daily activity, yet it remains inaccessible to individuals with limited mobility and challenging for autonomous robot systems due to the fine-grained physical structure and complex dynamics of hair. In this work, we present DYMO-Hair, a model-based robot hair care system. We introduce a novel dynamics learning paradigm that is suited for volumetric quantities such as hair, relying on an action-conditioned latent state editing mechanism, coupled with a compact 3D latent space of diverse hairstyles to improve generalizability. This latent space is pre-trained at scale using a novel hair physics simulator, enabling generalization across previously unseen hairstyles. Using the dynamics model with a Model Predictive Path Integral (MPPI) planner, DYMO-Hair is able to perform visual goal-conditioned hair styling. Experiments in simulation demonstrate that DYMO-Hair's dynamics model outperforms baselines on capturing local deformation for diverse, unseen hairstyles. DYMO-Hair further outperforms baselines in closed-loop hair styling tasks on unseen hairstyles, with an average of 22% lower final geometric error and 42% higher success rate than the state-of-the-art system. Real-world experiments exhibit zero-shot transferability of our system to wigs, achieving consistent success on challenging unseen hairstyles where the state-of-the-art system fails. Together, these results introduce a foundation for model-based robot hair care, advancing toward more generalizable, flexible, and accessible robot hair styling in unconstrained physical environments. More details are available on our project page: https://chengyzhao.github.io/DYMOHair-web/.
comment: Project page: https://chengyzhao.github.io/DYMOHair-web/
A Preview of HoloOcean 2.0 ICRA 2025
Marine robotics simulators play a fundamental role in the development of marine robotic systems. With increased focus on the marine robotics field in recent years, there has been significant interest in developing higher fidelitysimulation of marine sensors, physics, and visual rendering capabilities to support autonomous marine robot development and validation. HoloOcean 2.0, the next major release of HoloOcean, brings state-of-the-art features under a general marine simulator capable of supporting a variety of tasks. New features in HoloOcean 2.0 include migration to Unreal Engine (UE) 5.3, advanced vehicle dynamics using models from Fossen, and support for ROS2 using a custom bridge. Additional features are currently in development, including significantly more efficient ray tracing-based sidescan, forward-looking, and bathymetric sonar implementations; semantic sensors; environment generation tools; volumetric environmental effects; and realistic waves.
comment: 5 pages, 9 figures, submitted to the ICRA 2025 aq2uasim workshop
Vision-Guided Targeted Grasping and Vibration for Robotic Pollination in Controlled Environments
Robotic pollination offers a promising alternative to manual labor and bumblebee-assisted methods in controlled agriculture, where wind-driven pollination is absent and regulatory restrictions limit the use of commercial pollinators. In this work, we present and validate a vision-guided robotic framework that uses data from an end-effector mounted RGB-D sensor and combines 3D plant reconstruction, targeted grasp planning, and physics-based vibration modeling to enable precise pollination. First, the plant is reconstructed in 3D and registered to the robot coordinate frame to identify obstacle-free grasp poses along the main stem. Second, a discrete elastic rod model predicts the relationship between actuation parameters and flower dynamics, guiding the selection of optimal pollination strategies. Finally, a manipulator with soft grippers grasps the stem and applies controlled vibrations to induce pollen release. End-to-end experiments demonstrate a 92.5\% main-stem grasping success rate, and simulation-guided optimization of vibration parameters further validates the feasibility of our approach, ensuring that the robot can safely and effectively perform pollination without damaging the flower. To our knowledge, this is the first robotic system to jointly integrate vision-based grasping and vibration modeling for automated precision pollination.
Towards Autonomous Tape Handling for Robotic Wound Redressing
Chronic wounds, such as diabetic, pressure, and venous ulcers, affect over 6.5 million patients in the United States alone and generate an annual cost exceeding \$25 billion. Despite this burden, chronic wound care remains a routine yet manual process performed exclusively by trained clinicians due to its critical safety demands. We envision a future in which robotics and automation support wound care to lower costs and enhance patient outcomes. This paper introduces an autonomous framework for one of the most fundamental yet challenging subtasks in wound redressing: adhesive tape manipulation. Specifically, we address two critical capabilities: tape initial detachment (TID) and secure tape placement. To handle the complex adhesive dynamics of detachment, we propose a force-feedback imitation learning approach trained from human teleoperation demonstrations. For tape placement, we develop a numerical trajectory optimization method based to ensure smooth adhesion and wrinkle-free application across diverse anatomical surfaces. We validate these methods through extensive experiments, demonstrating reliable performance in both quantitative evaluations and integrated wound redressing pipelines. Our results establish tape manipulation as an essential step toward practical robotic wound care automation.
Multi-Robot Distributed Optimization for Exploration and Mapping of Unknown Environments using Bioinspired Tactile-Sensor
This project proposes a bioinspired multi-robot system using Distributed Optimization for efficient exploration and mapping of unknown environments. Each robot explores its environment and creates a map, which is afterwards put together to form a global 2D map of the environment. Inspired by wall-following behaviors, each robot autonomously explores its neighborhood based on a tactile sensor, similar to the antenna of a cockroach, mounted on the surface of the robot. Instead of avoiding obstacles, robots log collision points when they touch obstacles. This decentralized control strategy ensures effective task allocation and efficient exploration of unknown terrains, with applications in search and rescue, industrial inspection, and environmental monitoring. The approach was validated through experiments using e-puck robots in a simulated 1.5 x 1.5 m environment with three obstacles. The results demonstrated the system's effectiveness in achieving high coverage, minimizing collisions, and constructing accurate 2D maps.
Cross-Embodiment Dexterous Hand Articulation Generation via Morphology-Aware Learning
Dexterous grasping with multi-fingered hands remains challenging due to high-dimensional articulations and the cost of optimization-based pipelines. Existing end-to-end methods require training on large-scale datasets for specific hands, limiting their ability to generalize across different embodiments. We propose an eigengrasp-based, end-to-end framework for cross-embodiment grasp generation. From a hand's morphology description, we derive a morphology embedding and an eigengrasp set. Conditioned on these, together with the object point cloud and wrist pose, an amplitude predictor regresses articulation coefficients in a low-dimensional space, which are decoded into full joint articulations. Articulation learning is supervised with a Kinematic-Aware Articulation Loss (KAL) that emphasizes fingertip-relevant motions and injects morphology-specific structure. In simulation on unseen objects across three dexterous hands, our model attains a 91.9% average grasp success rate with less than 0.4 seconds inference per grasp. With few-shot adaptation to an unseen hand, it achieves 85.6% success on unseen objects in simulation, and real-world experiments on this few-shot generalized hand achieve an 87% success rate. The code and additional materials will be made available upon publication on our project website https://connor-zh.github.io/cross_embodiment_dexterous_grasping.
Hybrid Quantum-Classical Policy Gradient for Adaptive Control of Cyber-Physical Systems: A Comparative Study of VQC vs. MLP
The comparative evaluation between classical and quantum reinforcement learning (QRL) paradigms was conducted to investigate their convergence behavior, robustness under observational noise, and computational efficiency in a benchmark control environment. The study employed a multilayer perceptron (MLP) agent as a classical baseline and a parameterized variational quantum circuit (VQC) as a quantum counterpart, both trained on the CartPole-v1 environment over 500 episodes. Empirical results demonstrated that the classical MLP achieved near-optimal policy convergence with a mean return of 498.7 +/- 3.2, maintaining stable equilibrium throughout training. In contrast, the VQC exhibited limited learning capability, with an average return of 14.6 +/- 4.8, primarily constrained by circuit depth and qubit connectivity. Noise robustness analysis further revealed that the MLP policy deteriorated gracefully under Gaussian perturbations, while the VQC displayed higher sensitivity at equivalent noise levels. Despite the lower asymptotic performance, the VQC exhibited significantly lower parameter count and marginally increased training time, highlighting its potential scalability for low-resource quantum processors. The results suggest that while classical neural policies remain dominant in current control benchmarks, quantum-enhanced architectures could offer promising efficiency advantages once hardware noise and expressivity limitations are mitigated.
comment: 6 pages, 5 figures, 2 tables, 17 equations, 1 algorithm
Information-Theoretic Policy Pre-Training with Empowerment
Empowerment, an information-theoretic measure of an agent's potential influence on its environment, has emerged as a powerful intrinsic motivation and exploration framework for reinforcement learning (RL). Besides for unsupervised RL and skill learning algorithms, the specific use of empowerment as a pre-training signal has received limited attention in the literature. We show that empowerment can be used as a pre-training signal for data-efficient downstream task adaptation. For this we extend the traditional notion of empowerment by introducing discounted empowerment, which balances the agent's control over the environment across short- and long-term horizons. Leveraging this formulation, we propose a novel pre-training paradigm that initializes policies to maximize discounted empowerment, enabling agents to acquire a robust understanding of environmental dynamics. We analyze empowerment-based pre-training for various existing RL algorithms and empirically demonstrate its potential as a general-purpose initialization strategy: empowerment-maximizing policies with long horizons are data-efficient and effective, leading to improved adaptability in downstream tasks. Our findings pave the way for future research to scale this framework to high-dimensional and complex tasks, further advancing the field of RL.
Coordinate-Consistent Localization via Continuous-Time Calibration and Fusion of UWB and SLAM Observations
Onboard simultaneous localization and mapping (SLAM) methods are commonly used to provide accurate localization information for autonomous robots. However, the coordinate origin of SLAM estimate often resets for each run. On the other hand, UWB-based localization with fixed anchors can ensure a consistent coordinate reference across sessions; however, it requires an accurate assignment of the anchor nodes' coordinates. To this end, we propose a two-stage approach that calibrates and fuses UWB data and SLAM data to achieve coordinate-wise consistent and accurate localization in the same environment. In the first stage, we solve a continuous-time batch optimization problem by using the range and odometry data from one full run, incorporating height priors and anchor-to-anchor distance factors to recover the anchors' 3D positions. For the subsequent runs in the second stage, a sliding-window optimization scheme fuses the UWB and SLAM data, which facilitates accurate localization in the same coordinate system. Experiments are carried out on the NTU VIRAL dataset with six scenarios of UAV flight, and we show that calibration using data in one run is sufficient to enable accurate localization in the remaining runs. We release our source code to benefit the community at https://github.com/ntdathp/slam-uwb-calibration.
AI-Enabled Capabilities to Facilitate Next-Generation Rover Surface Operations
Current planetary rovers operate at traverse speeds of approximately 10 cm/s, fundamentally limiting exploration efficiency. This work presents integrated AI systems which significantly improve autonomy through three components: (i) the FASTNAV Far Obstacle Detector (FOD), capable of facilitating sustained 1.0 m/s speeds via computer vision-based obstacle detection; (ii) CISRU, a multi-robot coordination framework enabling human-robot collaboration for in-situ resource utilisation; and (iii) the ViBEKO and AIAXR deep learning-based terrain classification studies. Field validation in Mars analogue environments demonstrated these systems at Technology Readiness Level 4, providing measurable improvements in traverse speed, classification accuracy, and operational safety for next-generation planetary missions.
comment: Paper for 18th Symposium on Advanced Space Technologies in Robotics and Automation (ASTRA), presented on October 7th at Leiden, Netherlands
The DISTANT Design for Remote Transmission and Steering Systems for Planetary Robotics
Planetary exploration missions require robust locomotion systems capable of operating in extreme environments over extended periods. This paper presents the DISTANT (Distant Transmission and Steering Systems) design, a novel approach for relocating rover traction and steering actuators from wheel-mounted positions to a thermally protected warm box within the rover body. The design addresses critical challenges in long-distance traversal missions by protecting sensitive components from thermal cycling, dust contamination, and mechanical wear. A double wishbone suspension configuration with cardan joints and capstan drive steering has been selected as the optimal architecture following comprehensive trade-off analysis. The system enables independent wheel traction, steering control, and suspension management whilst maintaining all motorisation within the protected environment. The design meets a 50 km traverse requirement without performance degradation, with integrated dust protection mechanisms and thermal management solutions. Testing and validation activities are planned for Q1 2026 following breadboard manufacturing at 1:3 scale.
comment: Paper for 18th Symposium on Advanced Space Technologies in Robotics and Automation (ASTRA), presented on October 7th at Leiden, Netherlands
Learning to Crawl: Latent Model-Based Reinforcement Learning for Soft Robotic Adaptive Locomotion
Soft robotic crawlers are mobile robots that utilize soft body deformability and compliance to achieve locomotion through surface contact. Designing control strategies for such systems is challenging due to model inaccuracies, sensor noise, and the need to discover locomotor gaits. In this work, we present a model-based reinforcement learning (MB-RL) framework in which latent dynamics inferred from onboard sensors serve as a predictive model that guides an actor-critic algorithm to optimize locomotor policies. We evaluate the framework on a minimal crawler model in simulation using inertial measurement units and time-of-flight sensors as observations. The learned latent dynamics enable short-horizon motion prediction while the actor-critic discovers effective locomotor policies. This approach highlights the potential of latent-dynamics MB-RL for enabling embodied soft robotic adaptive locomotion based solely on noisy sensor feedback.
A Co-Design Framework for Energy-Aware Monoped Jumping with Detailed Actuator Modeling
A monoped's jump height and energy consumption depend on both, its mechanical design and control strategy. Existing co-design frameworks typically optimize for either maximum height or minimum energy, neglecting their trade-off. They also often omit gearbox parameter optimization and use oversimplified actuator mass models, producing designs difficult to replicate in practice. In this work, we introduce a novel three-stage co-design optimization framework that jointly maximizes jump height while minimizing mechanical energy consumption of a monoped. The proposed method explicitly incorporates realistic actuator mass models and optimizes mechanical design (including gearbox) and control parameters within a unified framework. The resulting design outputs are then used to automatically generate a parameterized CAD model suitable for direct fabrication, significantly reducing manual design iterations. Our experimental evaluations demonstrate a 50 percent reduction in mechanical energy consumption compared to the baseline design, while achieving a jump height of 0.8m. Video presentation is available at http://y2u.be/XW8IFRCcPgM
comment: 7 pages, 8 figures, 1 table, Accepted at IEEE-RAS 24th International Conference on Humanoid Robots (Humanoids) 2025, Aman Singh, Aastha Mishra - Authors contributed equally
The Safety Challenge of World Models for Embodied AI Agents: A Review
The rapid progress in embodied artificial intelligence has highlighted the necessity for more advanced and integrated models that can perceive, interpret, and predict environmental dynamics. In this context, World Models (WMs) have been introduced to provide embodied agents with the abilities to anticipate future environmental states and fill in knowledge gaps, thereby enhancing agents' ability to plan and execute actions. However, when dealing with embodied agents it is fundamental to ensure that predictions are safe for both the agent and the environment. In this article, we conduct a comprehensive literature review of World Models in the domains of autonomous driving and robotics, with a specific focus on the safety implications of scene and control generation tasks. Our review is complemented by an empirical analysis, wherein we collect and examine predictions from state-of-the-art models, identify and categorize common faults (herein referred to as pathologies), and provide a quantitative evaluation of the results.
VCoT-Grasp: Grasp Foundation Models with Visual Chain-of-Thought Reasoning for Language-driven Grasp Generation
Robotic grasping is one of the most fundamental tasks in robotic manipulation, and grasp detection/generation has long been the subject of extensive research. Recently, language-driven grasp generation has emerged as a promising direction due to its practical interaction capabilities. However, most existing approaches either lack sufficient reasoning and generalization capabilities or depend on complex modular pipelines. Moreover, current grasp foundation models tend to overemphasize dialog and object semantics, resulting in inferior performance and restriction to single-object grasping. To maintain strong reasoning ability and generalization in cluttered environments, we propose VCoT-Grasp, an end-to-end grasp foundation model that incorporates visual chain-of-thought reasoning to enhance visual understanding for grasp generation. VCoT-Grasp adopts a multi-turn processing paradigm that dynamically focuses on visual inputs while providing interpretable reasoning traces. For training, we refine and introduce a large-scale dataset, VCoT-GraspSet, comprising 167K synthetic images with over 1.36M grasps, as well as 400+ real-world images with more than 1.2K grasps, annotated with intermediate bounding boxes. Extensive experiments on both VCoT-GraspSet and real robot demonstrate that our method significantly improves grasp success rates and generalizes effectively to unseen objects, backgrounds, and distractors. More details can be found at https://zhanghr2001.github.io/VCoT-Grasp.github.io.
Human-in-the-loop Optimisation in Robot-assisted Gait Training
Wearable robots offer a promising solution for quantitatively monitoring gait and providing systematic, adaptive assistance to promote patient independence and improve gait. However, due to significant interpersonal and intrapersonal variability in walking patterns, it is important to design robot controllers that can adapt to the unique characteristics of each individual. This paper investigates the potential of human-in-the-loop optimisation (HILO) to deliver personalised assistance in gait training. The Covariance Matrix Adaptation Evolution Strategy (CMA-ES) was employed to continuously optimise an assist-as-needed controller of a lower-limb exoskeleton. Six healthy individuals participated over a two-day experiment. Our results suggest that while the CMA-ES appears to converge to a unique set of stiffnesses for each individual, no measurable impact on the subjects' performance was observed during the validation trials. These findings highlight the impact of human-robot co-adaptation and human behaviour variability, whose effect may be greater than potential benefits of personalising rule-based assistive controllers. Our work contributes to understanding the limitations of current personalisation approaches in exoskeleton-assisted gait rehabilitation and identifies key challenges for effective implementation of human-in-the-loop optimisation in this domain.
Precise and Efficient Collision Prediction under Uncertainty in Autonomous Driving ICRA 2026
This research introduces two efficient methods to estimate the collision risk of planned trajectories in autonomous driving under uncertain driving conditions. Deterministic collision checks of planned trajectories are often inaccurate or overly conservative, as noisy perception, localization errors, and uncertain predictions of other traffic participants introduce significant uncertainty into the planning process. This paper presents two semi-analytic methods to compute the collision probability of planned trajectories with arbitrary convex obstacles. The first approach evaluates the probability of spatial overlap between an autonomous vehicle and surrounding obstacles, while the second estimates the collision probability based on stochastic boundary crossings. Both formulations incorporate full state uncertainties, including position, orientation, and velocity, and achieve high accuracy at computational costs suitable for real-time planning. Simulation studies verify that the proposed methods closely match Monte Carlo results while providing significant runtime advantages, enabling their use in risk-aware trajectory planning. The collision estimation methods are available as open-source software: https://github.com/TUM-AVS/Collision-Probability-Estimation
comment: 8 pages, submitted to the IEEE ICRA 2026, Vienna, Austria
Federated Split Learning for Resource-Constrained Robots in Industrial IoT: Framework Comparison, Optimization Strategies, and Future Directions
Federated split learning (FedSL) has emerged as a promising paradigm for enabling collaborative intelligence in industrial Internet of Things (IoT) systems, particularly in smart factories where data privacy, communication efficiency, and device heterogeneity are critical concerns. In this article, we present a comprehensive study of FedSL frameworks tailored for resource-constrained robots in industrial scenarios. We compare synchronous, asynchronous, hierarchical, and heterogeneous FedSL frameworks in terms of workflow, scalability, adaptability, and limitations under dynamic industrial conditions. Furthermore, we systematically categorize token fusion strategies into three paradigms: input-level (pre-fusion), intermediate-level (intra-fusion), and output-level (post-fusion), and summarize their respective strengths in industrial applications. We also provide adaptive optimization techniques to enhance the efficiency and feasibility of FedSL implementation, including model compression, split layer selection, computing frequency allocation, and wireless resource management. Simulation results validate the performance of these frameworks under industrial detection scenarios. Finally, we outline open issues and research directions of FedSL in future smart manufacturing systems.
comment: 9 pages, 5 figures, submitted to the IEEE magazine
Stable Robot Motions on Manifolds: Learning Lyapunov-Constrained Neural Manifold ODEs
Learning stable dynamical systems from data is crucial for safe and reliable robot motion planning and control. However, extending stability guarantees to trajectories defined on Riemannian manifolds poses significant challenges due to the manifold's geometric constraints. To address this, we propose a general framework for learning stable dynamical systems on Riemannian manifolds using neural ordinary differential equations. Our method guarantees stability by projecting the neural vector field evolving on the manifold so that it strictly satisfies the Lyapunov stability criterion, ensuring stability at every system state. By leveraging a flexible neural parameterisation for both the base vector field and the Lyapunov function, our framework can accurately represent complex trajectories while respecting manifold constraints by evolving solutions directly on the manifold. We provide an efficient training strategy for applying our framework and demonstrate its utility by solving Riemannian LASA datasets on the unit quaternion (S^3) and symmetric positive-definite matrix manifolds, as well as robotic motions evolving on \mathbb{R}^3 \times S^3. We demonstrate the performance, scalability, and practical applicability of our approach through extensive simulations and by learning robot motions in a real-world experiment.
comment: 12 pages, 6 figures
Oracle-Guided Masked Contrastive Reinforcement Learning for Visuomotor Policies
A prevailing approach for learning visuomotor policies is to employ reinforcement learning to map high-dimensional visual observations directly to action commands. However, the combination of high-dimensional visual inputs and agile maneuver outputs leads to long-standing challenges, including low sample efficiency and significant sim-to-real gaps. To address these issues, we propose Oracle-Guided Masked Contrastive Reinforcement Learning (OMC-RL), a novel framework designed to improve the sample efficiency and asymptotic performance of visuomotor policy learning. OMC-RL explicitly decouples the learning process into two stages: an upstream representation learning stage and a downstream policy learning stage. In the upstream stage, a masked Transformer module is trained with temporal modeling and contrastive learning to extract temporally-aware and task-relevant representations from sequential visual inputs. After training, the learned encoder is frozen and used to extract visual representations from consecutive frames, while the Transformer module is discarded. In the downstream stage, an oracle teacher policy with privileged access to global state information supervises the agent during early training to provide informative guidance and accelerate early policy learning. This guidance is gradually reduced to allow independent exploration as training progresses. Extensive experiments in simulated and real-world environments demonstrate that OMC-RL achieves superior sample efficiency and asymptotic policy performance, while also improving generalization across diverse and perceptually complex scenarios.
D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI
Large language models leverage internet-scale text data, yet embodied AI remains constrained by the prohibitive costs of physical trajectory collection. Desktop environments -- particularly gaming -- offer a compelling alternative: they provide rich sensorimotor interactions at scale while maintaining the structured observation-action coupling essential for embodied learning. We present D2E (Desktop to Embodied AI), a framework that demonstrates desktop interactions can serve as an effective pretraining substrate for robotics embodied AI tasks. Unlike prior work that remained domain-specific (e.g., VPT for Minecraft) or kept data proprietary (e.g., SIMA), D2E establishes a complete pipeline from scalable desktop data collection to verified transfer in embodied domains. Our framework comprises three components: (1) the OWA Toolkit that unifies diverse desktop interactions into a standardized format with 152x compression, (2) the Generalist-IDM that achieves strong zero-shot generalization across unseen games through timestamp-based event prediction, enabling internet-scale pseudo-labeling, and (3) VAPT that transfers desktop-pretrained representations to physical manipulation and navigation. Using 1.3K+ hours of data (259 hours of human demonstrations, and 1K+ hours of pseudo-labeled gameplay), we achieve a total of 96.6% success rate on LIBERO manipulation and 83.3% on CANVAS navigation benchmarks. This validates that sensorimotor primitives in digital interactions exhibit sufficient invariance to transfer meaningfully to physical embodied tasks, establishing desktop pretraining as a practical paradigm for robotics. We will make all our work public, including the OWA toolkit, datasets of human-collected and pseudo-labeled, and VAPT-trained models available at https://worv-ai.github.io/d2e/
Verifier-free Test-Time Sampling for Vision Language Action Models
Vision-Language-Action models (VLAs) have demonstrated remarkable performance in robot control. However, they remain fundamentally limited in tasks that require high precision due to their single-inference paradigm. While test-time scaling approaches using external verifiers have shown promise, they require additional training and fail to generalize to unseen conditions. We propose Masking Distribution Guided Selection (MG-Select), a novel test-time scaling framework for VLAs that leverages the model's internal properties without requiring additional training or external modules. Our approach utilizes KL divergence from a reference action token distribution as a confidence metric for selecting the optimal action from multiple candidates. We introduce a reference distribution generated by the same VLA but with randomly masked states and language conditions as inputs, ensuring maximum uncertainty while remaining aligned with the target task distribution. Additionally, we propose a joint training strategy that enables the model to learn both conditional and unconditional distributions by applying dropout to state and language conditions, thereby further improving the quality of the reference distribution. Our experiments demonstrate that MG-Select achieves significant performance improvements, including a 28%/35% improvement in real-world in-distribution/out-of-distribution tasks, along with a 168% relative gain on RoboCasa pick-and-place tasks trained with 30 demonstrations.
comment: 14 pages; 3 figures
DeLTa: Demonstration and Language-Guided Novel Transparent Object Manipulation
Despite the prevalence of transparent object interactions in human everyday life, transparent robotic manipulation research remains limited to short-horizon tasks and basic grasping capabilities.Although some methods have partially addressed these issues, most of them have limitations in generalizability to novel objects and are insufficient for precise long-horizon robot manipulation. To address this limitation, we propose DeLTa (Demonstration and Language-Guided Novel Transparent Object Manipulation), a novel framework that integrates depth estimation, 6D pose estimation, and vision-language planning for precise long-horizon manipulation of transparent objects guided by natural task instructions. A key advantage of our method is its single-demonstration approach, which generalizes 6D trajectories to novel transparent objects without requiring category-level priors or additional training. Additionally, we present a task planner that refines the VLM-generated plan to account for the constraints of a single-arm, eye-in-hand robot for long-horizon object manipulation tasks. Through comprehensive evaluation, we demonstrate that our method significantly outperforms existing transparent object manipulation approaches, particularly in long-horizon scenarios requiring precise manipulation capabilities. Project page: https://sites.google.com/view/DeLTa25/
comment: Project page: https://sites.google.com/view/DeLTa25/
MetaVLA: Unified Meta Co-training For Efficient Embodied Adaption
Vision-Language-Action (VLA) models show promise in embodied reasoning, yet remain far from true generalists-they often require task-specific fine-tuning, and generalize poorly to unseen tasks. We propose MetaVLA, a unified, backbone-agnostic post-training framework for efficient and scalable alignment. MetaVLA introduces Context-Aware Meta Co-Training, which consolidates diverse target tasks into a single fine-tuning stage while leveraging structurally diverse auxiliary tasks to improve in-domain generalization. Unlike naive multi-task SFT, MetaVLA integrates a lightweight meta-learning mechanism-derived from Attentive Neural Processes-to enable rapid adaptation from diverse contexts with minimal architectural change or inference overhead. On the LIBERO benchmark, MetaVLA with six auxiliary tasks outperforms OpenVLA by up to 8.0% on long-horizon tasks, reduces training steps from 240K to 75K, and cuts GPU time by ~76%. These results show that scalable, low-resource post-training is achievable-paving the way toward general-purpose embodied agents. Code will be available.
GO-Flock: Goal-Oriented Flocking in 3D Unknown Environments with Depth Maps
Artificial Potential Field (APF) methods are widely used for reactive flocking control, but they often suffer from challenges such as deadlocks and local minima, especially in the presence of obstacles. Existing solutions to address these issues are typically passive, leading to slow and inefficient collective navigation. As a result, many APF approaches have only been validated in obstacle-free environments or simplified, pseudo 3D simulations. This paper presents GO-Flock, a hybrid flocking framework that integrates planning with reactive APF-based control. GO-Flock consists of an upstream Perception Module, which processes depth maps to extract waypoints and virtual agents for obstacle avoidance, and a downstream Collective Navigation Module, which applies a novel APF strategy to achieve effective flocking behavior in cluttered environments. We evaluate GO-Flock against passive APF-based approaches to demonstrate their respective merits, such as their flocking behavior and the ability to overcome local minima. Finally, we validate GO-Flock through obstacle-filled environment and also hardware-in-the-loop experiments where we successfully flocked a team of nine drones, six physical and three virtual, in a forest environment.
ARRC: Advanced Reasoning Robot Control - Knowledge-Driven Autonomous Manipulation Using Retrieval-Augmented Generation
We present ARRC (Advanced Reasoning Robot Control), a practical system that connects natural-language instructions to safe local robotic control by combining Retrieval-Augmented Generation (RAG) with RGB-D perception and guarded execution on an affordable robot arm. The system indexes curated robot knowledge (movement patterns, task templates, and safety heuristics) in a vector database, retrieves task-relevant context for each instruction, and conditions a large language model (LLM) to produce JSON-structured action plans. Plans are executed on a UFactory xArm 850 fitted with a Dynamixel-driven parallel gripper and an Intel RealSense D435 camera. Perception uses AprilTag detections fused with depth to produce object-centric metric poses. Execution is enforced via software safety gates: workspace bounds, speed and force caps, timeouts, and bounded retries. We describe the architecture, knowledge design, integration choices, and a reproducible evaluation protocol for tabletop scan, approach, and pick-place tasks. Experimental results demonstrate the efficacy of the proposed approach. Our design shows that RAG-based planning can substantially improve plan validity and adaptability while keeping perception and low-level control local to the robot.
Correlation-Aware Dual-View Pose and Velocity Estimation for Dynamic Robotic Manipulation
Accurate pose and velocity estimation is essential for effective spatial task planning in robotic manipulators. While centralized sensor fusion has traditionally been used to improve pose estimation accuracy, this paper presents a novel decentralized fusion approach to estimate both pose and velocity. We use dual-view measurements from an eye-in-hand and an eye-to-hand vision sensor configuration mounted on a manipulator to track a target object whose motion is modeled as random walk (stochastic acceleration model). The robot runs two independent adaptive extended Kalman filters formulated on a matrix Lie group, developed as part of this work. These filters predict poses and velocities on the manifold $\mathbb{SE}(3) \times \mathbb{R}^3 \times \mathbb{R}^3$ and update the state on the manifold $\mathbb{SE}(3)$. The final fused state comprising the fused pose and velocities of the target is obtained using a correlation-aware fusion rule on Lie groups. The proposed method is evaluated on a UFactory xArm 850 equipped with Intel RealSense cameras, tracking a moving target. Experimental results validate the effectiveness and robustness of the proposed decentralized dual-view estimation framework, showing consistent improvements over state-of-the-art methods.
Real-Time Glass Detection and Reprojection using Sensor Fusion Onboard Aerial Robots ICRA 2026
Autonomous aerial robots are increasingly being deployed in real-world scenarios, where transparent obstacles present significant challenges to reliable navigation and mapping. These materials pose a unique problem for traditional perception systems because they lack discernible features and can cause conventional depth sensors to fail, leading to inaccurate maps and potential collisions. To ensure safe navigation, robots must be able to accurately detect and map these transparent obstacles. Existing methods often rely on large, expensive sensors or algorithms that impose high computational burdens, making them unsuitable for low Size, Weight, and Power (SWaP) robots. In this work, we propose a novel and computationally efficient framework for detecting and mapping transparent obstacles onboard a sub-300g quadrotor. Our method fuses data from a Time-of-Flight (ToF) camera and an ultrasonic sensor with a custom, lightweight 2D convolution model. This specialized approach accurately detects specular reflections and propagates their depth into corresponding empty regions of the depth map, effectively rendering transparent obstacles visible. The entire pipeline operates in real-time, utilizing only a small fraction of a CPU core on an embedded processor. We validate our system through a series of experiments in both controlled and real-world environments, demonstrating the utility of our method through experiments where the robot maps indoor environments containing glass. Our work is, to our knowledge, the first of its kind to demonstrate a real-time, onboard transparent obstacle mapping system on a low-SWaP quadrotor using only the CPU.
comment: 8 pages, 8 figures, submitted to ICRA 2026
What You Don't Know Can Hurt You: How Well do Latent Safety Filters Understand Partially Observable Safety Constraints?
Safe control techniques, such as Hamilton-Jacobi reachability, provide principled methods for synthesizing safety-preserving robot policies but typically assume hand-designed state spaces and full observability. Recent work has relaxed these assumptions via latent-space safe control, where state representations and dynamics are learned jointly through world models that reconstruct future high-dimensional observations (e.g., RGB images) from current observations and actions. This enables safety constraints that are difficult to specify analytically (e.g., spilling) to be framed as classification problems in latent space, allowing controllers to operate directly from raw observations. However, these methods assume that safety-critical features are observable in the learned latent state. We ask: when are latent state spaces sufficient for safe control? To study this, we examine temperature-based failures, comparable to overheating in cooking or manufacturing tasks, and find that RGB-only observations can produce myopic safety behaviors, e.g., avoiding seeing failure states rather than preventing failure itself. To predict such behaviors, we introduce a mutual information-based measure that identifies when observations fail to capture safety-relevant features. Finally, we propose a multimodal-supervised training strategy that shapes the latent state with additional sensory inputs during training, but requires no extra modalities at deployment, and validate our approach in simulation and on hardware with a Franka Research 3 manipulator preventing a pot of wax from overheating.
comment: 8 tables 6 figures
Active Next-Best-View Optimization for Risk-Averse Path Planning
Safe navigation in uncertain environments requires planning methods that integrate risk aversion with active perception. In this work, we present a unified framework that refines a coarse reference path by constructing tail-sensitive risk maps from Average Value-at-Risk statistics on an online-updated 3D Gaussian-splat Radiance Field. These maps enable the generation of locally safe and feasible trajectories. In parallel, we formulate Next-Best-View (NBV) selection as an optimization problem on the SE(3) pose manifold, where Riemannian gradient descent maximizes an expected information gain objective to reduce uncertainty most critical for imminent motion. Our approach advances the state-of-the-art by coupling risk-averse path refinement with NBV planning, while introducing scalable gradient decompositions that support efficient online updates in complex environments. We demonstrate the effectiveness of the proposed framework through extensive computational studies.
Terrain-Aided Navigation Using a Point Cloud Measurement Sensor
We investigate the use of a point cloud measurement in terrain-aided navigation. Our goal is to aid an inertial navigation system, by exploring ways to generate a useful measurement innovation error for effective nonlinear state estimation. We compare two such measurement models that involve the scanning of a digital terrain elevation model: a) one that is based on typical ray-casting from a given pose, that returns the predicted point cloud measurement from that pose, and b) another computationally less intensive one that does not require raycasting and we refer to herein as a sliding grid. Besides requiring a pose, it requires the pattern of the point cloud measurement itself and returns a predicted point cloud measurement. We further investigate the observability properties of the altitude for both measurement models. As a baseline, we compare the use of a point cloud measurement performance to the use of a radar altimeter and show the gains in accuracy. We conclude by showing that a point cloud measurement outperforms the use of a radar altimeter, and the point cloud measurement model to use depends on the computational resources
Three-dimensional Integrated Guidance and Control for Leader-Follower Flexible Formation of Fixed Wing UAVs
This paper presents a nonlinear integrated guidance and control (IGC) approach for flexible leader-follower formation flight of fixed-wing unmanned aerial vehicles (UAVs) while accounting for high-fidelity aerodynamics and thrust dynamics. Unlike conventional leader-follower schemes that fix the follower's position relative to the leader, the follower is steered to maintain range and bearing angles (which is the angle between its velocity vector and its line-of-sight (LOS) with respect to the leader) arbitrarily close to the prescribed values, enabling the follower to maintain formation on a hemispherical region behind the leader. The proposed IGC framework directly maps leader-follower relative range dynamics to throttle commands, and the follower's velocity orientation relative to the LOS to aerodynamic control surface deflections. This enables synergism between guidance and control subsystems. The control design uses a dynamic surface control-based backstepping approach to achieve convergence to the desired formation set, where Lyapunov barrier functions are incorporated to ensure the follower's bearing angle is constrained within specified bounds. Rigorous stability analysis guarantees uniform ultimate boundedness of all error states and strict constraint satisfaction in the presence of aerodynamic nonlinearities. The proposed flexible formation scheme allows the follower to have an orientation mismatch relative to the leader to execute anticipatory reconfiguration by transitioning between the relative positions in the admissible formation set when the leader aggressively maneuvers. The proposed IGC law relies only on relative information and onboard sensors without the information about the leader's maneuver, making it suitable for GPS-denied or non-cooperative scenarios. Finally, we present simulation results to vindicate the effectiveness and robustness of our approach.
Constrained Natural Language Action Planning for Resilient Embodied Systems
Replicating human-level intelligence in the execution of embodied tasks remains challenging due to the unconstrained nature of real-world environments. Novel use of large language models (LLMs) for task planning seeks to address the previously intractable state/action space of complex planning tasks, but hallucinations limit their reliability, and thus, viability beyond a research context. Additionally, the prompt engineering required to achieve adequate system performance lacks transparency, and thus, repeatability. In contrast to LLM planning, symbolic planning methods offer strong reliability and repeatability guarantees, but struggle to scale to the complexity and ambiguity of real-world tasks. We introduce a new robotic planning method that augments LLM planners with symbolic planning oversight to improve reliability and repeatability, and provide a transparent approach to defining hard constraints with considerably stronger clarity than traditional prompt engineering. Importantly, these augmentations preserve the reasoning capabilities of LLMs and retain impressive generalization in open-world environments. We demonstrate our approach in simulated and real-world environments. On the ALFWorld planning benchmark, our approach outperforms current state-of-the-art methods, achieving a near-perfect 99% success rate. Deployment of our method to a real-world quadruped robot resulted in 100% task success compared to 50% and 30% for pure LLM and symbolic planners across embodied pick and place tasks. Our approach presents an effective strategy to enhance the reliability, repeatability and transparency of LLM-based robot planners while retaining their key strengths: flexibility and generalizability to complex real-world environments. We hope that this work will contribute to the broad goal of building resilient embodied intelligent systems.
A Formal gatekeeper Framework for Safe Dual Control with Active Exploration
Planning safe trajectories under model uncertainty is a fundamental challenge. Robust planning ensures safety by considering worst-case realizations, yet ignores uncertainty reduction and leads to overly conservative behavior. Actively reducing uncertainty on-the-fly during a nominal mission defines the dual control problem. Most approaches address this by adding a weighted exploration term to the cost, tuned to trade off the nominal objective and uncertainty reduction, but without formal consideration of when exploration is beneficial. Moreover, safety is enforced in some methods but not in others. We propose a framework that integrates robust planning with active exploration under formal guarantees as follows: The key innovation and contribution is that exploration is pursued only when it provides a verifiable improvement without compromising safety. To achieve this, we utilize our earlier work on gatekeeper as an architecture for safety verification, and extend it so that it generates both safe and informative trajectories that reduce uncertainty and the cost of the mission, or keep it within a user-defined budget. The methodology is evaluated via simulation case studies on the online dual control of a quadrotor under parametric uncertainty.
comment: Submitted to American Control Conference (ACC) 2026
Vi-TacMan: Articulated Object Manipulation via Vision and Touch
Autonomous manipulation of articulated objects remains a fundamental challenge for robots in human environments. Vision-based methods can infer hidden kinematics but can yield imprecise estimates on unfamiliar objects. Tactile approaches achieve robust control through contact feedback but require accurate initialization. This suggests a natural synergy: vision for global guidance, touch for local precision. Yet no framework systematically exploits this complementarity for generalized articulated manipulation. Here we present Vi-TacMan, which uses vision to propose grasps and coarse directions that seed a tactile controller for precise execution. By incorporating surface normals as geometric priors and modeling directions via von Mises-Fisher distributions, our approach achieves significant gains over baselines (all p<0.0001). Critically, manipulation succeeds without explicit kinematic models -- the tactile controller refines coarse visual estimates through real-time contact regulation. Tests on more than 50,000 simulated and diverse real-world objects confirm robust cross-category generalization. This work establishes that coarse visual cues suffice for reliable manipulation when coupled with tactile feedback, offering a scalable paradigm for autonomous systems in unstructured environments.
Bioinspired Tapered-Spring Turbulence Sensor for Underwater Flow Detection
This paper presents a bio-inspired underwater whisker sensor for robust hydrodynamic disturbance detection and efficient signal analysis based on Physical Reservoir Computing (PRC). The design uses a tapered nylon spring with embedded accelerometers to achieve spatially distributed vibration sensing and frequency separation along the whisker. Towing-tank experiments and computational fluid dynamics simulations confirmed that the whisker effectively distinguishes vortex regimes across different fin angles and maintains Strouhal scaling with flow velocity, where higher speeds increase vibration intensity without affecting the dominant frequencies. Frequency-domain analysis, Shannon entropy, and machine learning further validated the sensing performance: vortex shedding frequencies were identified with less than 10\% error, entropy captured the transition from coherent vortex streets to turbulence, and logistic regression achieved 86.0\% classification accuracy with millisecond-level inference. These results demonstrate that structurally encoded whisker sensing provides a scalable and real-time solution for underwater perception, wake tracking, and turbulence-aware navigation in autonomous marine robots.
comment: 9 pages, 9 figures
pRRTC: GPU-Parallel RRT-Connect for Fast, Consistent, and Low-Cost Motion Planning
Sampling-based motion planning algorithms, like the Rapidly-Exploring Random Tree (RRT) and its widely used variant, RRT-Connect, provide efficient solutions for high-dimensional planning problems faced by real-world robots. However, these methods remain computationally intensive, particularly in complex environments that require many collision checks. To improve performance, recent efforts have explored parallelizing specific components of RRT such as collision checking, or running multiple planners independently. However, little has been done to develop an integrated parallelism approach, co-designed for large-scale parallelism. In this work we present pRRTC, a RRT-Connect based planner co-designed for GPU acceleration across the entire algorithm through parallel expansion and SIMT-optimized collision checking. We evaluate the effectiveness of pRRTC on the MotionBenchMaker dataset using robots with 7, 8, and 14 degrees of freedom (DoF). Compared to the state-of-the-art, pRRTC achieves as much as a 10x speedup on constrained reaching tasks with a 5.4x reduction in standard deviation. pRRTC also achieves a 1.4x reduction in average initial path cost. Finally, we deploy pRRTC on a 14-DoF dual Franka Panda arm setup and demonstrate real-time, collision-free motion planning with dynamic obstacles. We open-source our planner to support the wider community.
comment: 7 pages, 7 figures, 1 table. Submitted to IEEE International Conference on Robotics and Automation 2026
BC-ADMM: An Efficient Non-convex Constrained Optimizer with Robotic Applications
Non-convex constrained optimizations are ubiquitous in robotic applications such as multi-agent navigation, UAV trajectory optimization, and soft robot simulation. For this problem class, conventional optimizers suffer from small step sizes and slow convergence. We propose BC-ADMM, a variant of Alternating Direction Method of Multiplier (ADMM), that can solve a class of non-convex constrained optimizations with biconvex constraint relaxation. Our algorithm allows larger step sizes by breaking the problem into small-scale sub-problems that can be easily solved in parallel. We show that our method has both theoretical convergence speed guarantees and practical convergence guarantees in the asymptotic sense. Through numerical experiments in a row of four robotic applications, we show that BC-ADMM has faster convergence than conventional gradient descent and Newton's method in terms of wall clock time.
Toward Dynamic Control of Tendon-driven Continuum Robots using Clarke Transform IROS 2025
In this paper, we propose a dynamic model and control framework for tendon-driven continuum robots (TDCRs) with multiple segments and an arbitrary number of tendons per segment. Our approach leverages the Clarke transform, the Euler-Lagrange formalism, and the piecewise constant curvature assumption to formulate a dynamic model on a two-dimensional manifold embedded in the joint space that inherently satisfies tendon constraints. We present linear and constraint-informed controllers that operate directly on this manifold, along with practical methods for preventing negative tendon forces without compromising control fidelity. This opens up new design possibilities for overactuated TDCRs with improved force distribution and stiffness without increasing controller complexity. We validate these approaches in simulation and on a physical prototype with one segment and five tendons, demonstrating accurate dynamic behavior and robust trajectory tracking under real-time conditions.
comment: Accepted for publication at IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2025), 8 pages, and 8 figures
CottonSim: A vision-guided autonomous robotic system for cotton harvesting in Gazebo simulation
Cotton is a major cash crop in the United States, with the country being a leading global producer and exporter. Nearly all U.S. cotton is grown in the Cotton Belt, spanning 17 states in the southern region. Harvesting remains a critical yet challenging stage, impacted by the use of costly, environmentally harmful defoliants and heavy, expensive cotton pickers. These factors contribute to yield loss, reduced fiber quality, and soil compaction, which collectively threaten long-term sustainability. To address these issues, this study proposes a lightweight, small-scale, vision-guided autonomous robotic cotton picker as an alternative. An autonomous system, built on Clearpath's Husky platform and integrated with the CottonEye perception system, was developed and tested in the Gazebo simulation environment. A virtual cotton field was designed to facilitate autonomous navigation testing. The navigation system used Global Positioning System (GPS) and map-based guidance, assisted by an RGBdepth camera and a YOLOv8nseg instance segmentation model. The model achieved a mean Average Precision (mAP) of 85.2%, a recall of 88.9%, and a precision of 93.0%. The GPS-based approach reached a 100% completion rate (CR) within a $(5e-6)^{\circ}$ threshold, while the map-based method achieved a 96.7% CR within a 0.25 m threshold. The developed Robot Operating System (ROS) packages enable robust simulation of autonomous cotton picking, offering a scalable baseline for future agricultural robotics. CottonSim code and datasets are publicly available on GitHub: https://github.com/imtheva/CottonSim
comment: 16 pages, 15 figures, 4 tables
Capturing a Moving Target by Two Robots in the F2F Model
We study a search problem on capturing a moving target on an infinite real line. Two autonomous mobile robots (which can move with a maximum speed of 1) are initially placed at the origin, while an oblivious moving target is initially placed at a distance $d$ away from the origin. The robots can move along the line in any direction, but the target is oblivious, cannot change direction, and moves either away from or toward the origin at a constant speed $v$. Our aim is to design efficient algorithms for the two robots to capture the target. The target is captured only when both robots are co-located with it. The robots communicate with each other only face-to-face (F2F), meaning they can exchange information only when co-located, while the target remains oblivious and has no communication capabilities. We design algorithms under various knowledge scenarios, which take into account the prior knowledge the robots have about the starting distance $d$, the direction of movement (either toward or away from the origin), and the speed $v$ of the target. As a measure of the efficiency of the algorithms, we use the competitive ratio, which is the ratio of the capture time of an algorithm with limited knowledge to the capture time in the full-knowledge model. In our analysis, we are mindful of the cost of changing direction of movement, and show how to accomplish the capture of the target with at most three direction changes (turns).
Emergent interactions lead to collective frustration in robotic matter
Current artificial intelligence systems show near-human-level capabilities when deployed in isolation. Systems of a few collaborating intelligent agents are being engineered to perform tasks collectively. This raises the question of whether robotic matter, where many learning and intelligent agents interact, shows emergence of collective behaviour. And if so, which kind of phenomena would such systems exhibit? Here, we study a paradigmatic model for robotic matter: a stochastic many-particle system in which each particle is endowed with a deep neural network that predicts its transitions based on the particles' environments. For a one-dimensional model, we show that robotic matter exhibits complex emergent phenomena, including transitions between long-lived learning regimes, the emergence of particle species, and frustration. We also find a density-dependent phase transition with signatures of criticality. Using active matter theory, we show that this phase transition is a consequence of self-organisation mediated by emergent inter-particle interactions. Our simple model captures key features of more complex forms of robotic systems.
mindmap: Spatial Memory in Deep Feature Maps for 3D Action Policies
End-to-end learning of robot control policies, structured as neural networks, has emerged as a promising approach to robotic manipulation. To complete many common tasks, relevant objects are required to pass in and out of a robot's field of view. In these settings, spatial memory - the ability to remember the spatial composition of the scene - is an important competency. However, building such mechanisms into robot learning systems remains an open research problem. We introduce mindmap (Spatial Memory in Deep Feature Maps for 3D Action Policies), a 3D diffusion policy that generates robot trajectories based on a semantic 3D reconstruction of the environment. We show in simulation experiments that our approach is effective at solving tasks where state-of-the-art approaches without memory mechanisms struggle. We release our reconstruction system, training code, and evaluation tasks to spur research in this direction.
comment: Accepted to CoRL 2025 Workshop RemembeRL
FlowVLA: Visual Chain of Thought-based Motion Reasoning for Vision-Language-Action Models
Many Vision-Language-Action (VLA) models are built upon an internal world model trained via next-frame prediction ``$v_t \rightarrow v_{t+1}$''. However, this paradigm attempts to predict the future frame's appearance directly, without explicitly reasoning about the underlying dynamics. \textbf{This lack of an explicit motion reasoning step} often leads to physically implausible visual forecasts and inefficient policy learning. To address this limitation, we introduce the \textbf{Visual Chain of Thought (Visual CoT)}, a paradigm that compels the model to first reason about \textbf{motion dynamics} before generating the future frame. We instantiate this paradigm by proposing \textbf{FlowVLA}, an autoregressive Transformer that explicitly materializes this reasoning process as ``$v_t \rightarrow f_t \rightarrow v_{t+1}$'', where $f_t$ is an intermediate optical flow prediction that inherently encodes motion. By forcing the model to first follow the motion plan encoded by $f_t$, this process inherently \textbf{aligns the pre-training objective of dynamics prediction with the downstream task of action generation.} We conduct experiments on challenging robotics manipulation benchmarks, as well as real-robot evaluations. Our FlowVLA not only generates \textbf{more coherent and physically plausible visual predictions}, but also achieves state-of-the-art policy performance with \textbf{substantially improved sample efficiency}, pointing toward a more principled foundation for world modeling in VLAs. Project page: https://irpn-lab.github.io/FlowVLA/
Equivariant Filter for Relative Attitude and Target's Angular Velocity Estimation
Accurate estimation of the relative attitude and angular velocity between two rigid bodies is fundamental in aerospace applications such as spacecraft rendezvous and docking. In these scenarios, a chaser vehicle must determine the orientation and angular velocity of a target object using onboard sensors. This work addresses the challenge of designing an Equivariant Filter (EqF) that can reliably estimate both the relative attitude and the target angular velocity using noisy observations of two known, non-collinear vectors fixed in the target frame. To derive the EqF, a symmetry for the system is proposed and an equivariant lift onto the symmetry group is calculated. Observability and convergence properties are analyzed. Simulations demonstrate the filter's performance, with Monte Carlo runs yielding statistically significant results. The impact of low-rate measurements is also examined and a strategy to mitigate this effect is proposed. Experimental results, using fiducial markers and both conventional and event cameras for measurement acquisition, further validate the approach, confirming its effectiveness in a realistic setting.
comment: This work has been submitted to the IEEE for possible publication
Identifying Uncertainty in Self-Adaptive Robotics with Large Language Models
Future self-adaptive robots are expected to operate in highly dynamic environments while effectively managing uncertainties. However, identifying the sources and impacts of uncertainties in such robotic systems and defining appropriate mitigation strategies is challenging due to the inherent complexity of self-adaptive robots and the lack of comprehensive knowledge about the various factors influencing uncertainty. Hence, practitioners often rely on intuition and past experiences from similar systems to address uncertainties. In this article, we evaluate the potential of large language models (LLMs) in enabling a systematic and automated approach to identify uncertainties in self-adaptive robotics throughout the software engineering lifecycle. For this evaluation, we analyzed 10 advanced LLMs with varying capabilities across four industrial-sized robotics case studies, gathering the practitioners' perspectives on the LLM-generated responses related to uncertainties. Results showed that practitioners agreed with 63-88% of the LLM responses and expressed strong interest in the practicality of LLMs for this purpose.
Image-Based Visual Servoing for Enhanced Cooperation of Dual-Arm Manipulation
The cooperation of a pair of robot manipulators is required to manipulate a target object without any fixtures. The conventional control methods coordinate the end-effector pose of each manipulator with that of the other using their kinematics and joint coordinate measurements. Yet, the manipulators' inaccurate kinematics and joint coordinate measurements can cause significant pose synchronization errors in practice. This paper thus proposes an image-based visual servoing approach for enhancing the cooperation of a dual-arm manipulation system. On top of the classical control, the visual servoing controller lets each manipulator use its carried camera to measure the image features of the other's marker and adapt its end-effector pose with the counterpart on the move. Because visual measurements are robust to kinematic errors, the proposed control can reduce the end-effector pose synchronization errors and the fluctuations of the interaction forces of the pair of manipulators on the move. Theoretical analyses have rigorously proven the stability of the closed-loop system. Comparative experiments on real robots have substantiated the effectiveness of the proposed control.
comment: 8 pages, 7 figures. Project website: https://zizhe.io/ral-ibvs-enhanced/. This work has been accepted to the IEEE Robotics and Automation Letters in Feb 2025
Interpreting Behaviors and Geometric Constraints as Knowledge Graphs for Robot Manipulation Control
In this paper, we investigate the feasibility of using knowledge graphs to interpret actions and behaviors for robot manipulation control. Equipped with an uncalibrated visual servoing controller, we propose to use robot knowledge graphs to unify behavior trees and geometric constraints, conceptualizing robot manipulation control as semantic events. The robot knowledge graphs not only preserve the advantages of behavior trees in scripting actions and behaviors, but also offer additional benefits of mapping natural interactions between concepts and events, which enable knowledgeable explanations of the manipulation contexts. Through real-world evaluations, we demonstrate the flexibility of the robot knowledge graphs to support explainable robot manipulation control.
RoboMemory: A Brain-inspired Multi-memory Agentic Framework for Interactive Environmental Learning in Physical Embodied Systems
Embodied agents face persistent challenges in real-world environments, including partial observability, limited spatial reasoning, and high-latency multi-memory integration. We present RoboMemory, a brain-inspired framework that unifies Spatial, Temporal, Episodic, and Semantic memory under a parallelized architecture for efficient long-horizon planning and interactive environmental learning. A dynamic spatial knowledge graph (KG) ensures scalable and consistent memory updates, while a closed-loop planner with a critic module supports adaptive decision-making in dynamic settings. Experiments on EmbodiedBench show that RoboMemory, built on Qwen2.5-VL-72B-Ins, improves average success rates by 25% over its baseline and exceeds the closed-source state-of-the-art (SOTA) Gemini-1.5-Pro by 3%. Real-world trials further confirm its capacity for cumulative learning, with performance improving across repeated tasks. These results highlight RoboMemory as a scalable foundation for memory-augmented embodied intelligence, bridging the gap between cognitive neuroscience and robotic autonomy.
Self-Supervised Representation Learning with Joint Embedding Predictive Architecture for Automotive LiDAR Object Detection
Recently, self-supervised representation learning relying on vast amounts of unlabeled data has been explored as a pre-training method for autonomous driving. However, directly applying popular contrastive or generative methods to this problem is insufficient and may even lead to negative transfer. In this paper, we present AD-L-JEPA, a novel self-supervised pre-training framework with a joint embedding predictive architecture (JEPA) for automotive LiDAR object detection. Unlike existing methods, AD-L-JEPA is neither generative nor contrastive. Instead of explicitly generating masked regions, our method predicts Bird's-Eye-View embeddings to capture the diverse nature of driving scenes. Furthermore, our approach eliminates the need to manually form contrastive pairs by employing explicit variance regularization to avoid representation collapse. Experimental results demonstrate consistent improvements on the LiDAR 3D object detection downstream task across the KITTI3D, Waymo, and ONCE datasets, while reducing GPU hours by 1.9x-2.7x and GPU memory by 2.8x-4x compared with the state-of-the-art method Occupancy-MAE. Notably, on the largest ONCE dataset, pre-training on 100K frames yields a 1.61 mAP gain, better than all other methods pre-trained on either 100K or 500K frames, and pre-training on 500K frames yields a 2.98 mAP gain, better than all other methods pre-trained on either 500K or 1M frames. AD-L-JEPA constitutes the first JEPA-based pre-training method for autonomous driving. It offers better quality, faster, and more GPU-memory-efficient self-supervised representation learning. The source code of AD-L-JEPA is ready to be released.
Multiagent Systems
Improved High-probability Convergence Guarantees of Decentralized SGD
Convergence in high-probability (HP) has been receiving increasing interest, due to its attractive properties, such as exponentially decaying tail bounds and strong guarantees for each individual run of an algorithm. While HP guarantees are extensively studied in centralized settings, much less is understood in the decentralized, networked setup. Existing HP studies in decentralized settings impose strong assumptions, like uniformly bounded gradients, or asymptotically vanishing noise, resulting in a significant gap between assumptions used to establish convergence in the HP and the mean-squared error (MSE) sense, even for vanilla Decentralized Stochastic Gradient Descent ($\mathtt{DSGD}$) algorithm. This is contrary to centralized settings, where it is known that $\mathtt{SGD}$ converges in HP under the same conditions on the cost function as needed to guarantee MSE convergence. Motivated by this observation, we revisit HP guarantees for $\mathtt{DSGD}$ in the presence of light-tailed noise. We show that $\mathtt{DSGD}$ converges in HP under the same conditions on the cost as in the MSE sense, removing uniformly bounded gradients and other restrictive assumptions, while simultaneously achieving order-optimal rates for both non-convex and strongly convex costs. Moreover, our improved analysis yields linear speed-up in the number of users, demonstrating that $\mathtt{DSGD}$ maintains strong performance in the HP sense and matches existing MSE guarantees. Our improved results stem from a careful analysis of the MGF of quantities of interest (norm-squared of gradient or optimality gap) and the MGF of the consensus gap between users' models. To achieve linear speed-up, we provide a novel result on the variance-reduction effect of decentralized methods in the HP sense and more fine-grained bounds on the MGF for strongly convex costs, which are both of independent interest.
comment: 39 pages
A Timed Obstruction Logic for Dynamic Game Models
Real-time cybersecurity and privacy applications require reliable verification methods and system design tools to ensure their correctness. Many of these reactive real-time applications embedded in various infrastructures, such as airports, hospitals, and oil pipelines, are potentially vulnerable to malicious cyber-attacks. Recently, a growing literature has recognized Timed Game Theory as a sound theoretical foundation for modeling strategic interactions between attackers and defenders. This paper proposes Timed Obstruction Logic (TOL), an extension of Obstruction Logic (OL), a formalism for verifying specific timed games with real-time objectives unfolding in dynamic models. These timed games involve players whose discrete and continuous actions can impact the underlying timed game model. We show that TOL can be used to describe important timed properties of real-time cybersecurity games. Finally, in addition to introducing our new logic and adapting it to specify properties in the context of cybersecurity, we provide a verification procedure for TOL and show that its complexity is PSPACE-complete, meaning that it is not higher than that of classical timed temporal logics like TCTL. Thus, we increase the expressiveness of properties without incurring any cost in terms of complexity.
Agent+P: Guiding UI Agents via Symbolic Planning
Large Language Model (LLM)-based UI agents show great promise for UI automation but often hallucinate in long-horizon tasks due to their lack of understanding of the global UI transition structure. To address this, we introduce AGENT+P, a novel framework that leverages symbolic planning to guide LLM-based UI agents. Specifically, we model an app's UI transition structure as a UI Transition Graph (UTG), which allows us to reformulate the UI automation task as a pathfinding problem on the UTG. This further enables an off-the-shelf symbolic planner to generate a provably correct and optimal high-level plan, preventing the agent from redundant exploration and guiding the agent to achieve the automation goals. AGENT+P is designed as a plug-and-play framework to enhance existing UI agents. Evaluation on the AndroidWorld benchmark demonstrates that AGENT+P improves the success rates of state-of-the-art UI agents by up to 14% and reduces the action steps by 37.7%.
Emergent Directedness in Social Contagion
An enduring challenge in contagion theory is that the pathways contagions follow through social networks exhibit emergent complexities that are difficult to predict using network structure. Here, we address this challenge by developing a causal modeling framework that (i) simulates the possible network pathways that emerge as contagions spread and (ii) identifies which edges and nodes are most impactful on diffusion across these possible pathways. This yields a surprising discovery. If people require exposure to multiple peers to adopt a contagion (a.k.a., 'complex contagions'), the pathways that emerge often only work in one direction. In fact, the more complex a contagion is, the more asymmetric its paths become. This emergent directedness problematizes canonical theories of how networks mediate contagion. Weak ties spanning network regions - widely thought to facilitate mutual influence and integration - prove to privilege the spread contagions from one community to the other. Emergent directedness also disproportionately channels complex contagions from the network periphery to the core, inverting standard centrality models. We demonstrate two practical applications. We show that emergent directedness accounts for unexplained nonlinearity in the effects of tie strength in a recent study of job diffusion over LinkedIn. Lastly, we show that network evolution is biased toward growing directed paths, but that cultural factors (e.g., triadic closure) can curtail this bias, with strategic implications for network building and behavioral interventions.
comment: 36 pages, 6 figures, plus 30-page appendix with 15 figures
RareAgent: Self-Evolving Reasoning for Drug Repurposing in Rare Diseases
Computational drug repurposing for rare diseases is especially challenging when no prior associations exist between drugs and target diseases. Therefore, knowledge graph completion and message-passing GNNs have little reliable signal to learn and propagate, resulting in poor performance. We present RareAgent, a self-evolving multi-agent system that reframes this task from passive pattern recognition to active evidence-seeking reasoning. RareAgent organizes task-specific adversarial debates in which agents dynamically construct evidence graphs from diverse perspectives to support, refute, or entail hypotheses. The reasoning strategies are analyzed post hoc in a self-evolutionary loop, producing textual feedback that refines agent policies, while successful reasoning paths are distilled into transferable heuristics to accelerate future investigations. Comprehensive evaluations reveal that RareAgent improves the indication AUPRC by 18.1% over reasoning baselines and provides a transparent reasoning chain consistent with clinical evidence.
Federated Split Learning for Resource-Constrained Robots in Industrial IoT: Framework Comparison, Optimization Strategies, and Future Directions
Federated split learning (FedSL) has emerged as a promising paradigm for enabling collaborative intelligence in industrial Internet of Things (IoT) systems, particularly in smart factories where data privacy, communication efficiency, and device heterogeneity are critical concerns. In this article, we present a comprehensive study of FedSL frameworks tailored for resource-constrained robots in industrial scenarios. We compare synchronous, asynchronous, hierarchical, and heterogeneous FedSL frameworks in terms of workflow, scalability, adaptability, and limitations under dynamic industrial conditions. Furthermore, we systematically categorize token fusion strategies into three paradigms: input-level (pre-fusion), intermediate-level (intra-fusion), and output-level (post-fusion), and summarize their respective strengths in industrial applications. We also provide adaptive optimization techniques to enhance the efficiency and feasibility of FedSL implementation, including model compression, split layer selection, computing frequency allocation, and wireless resource management. Simulation results validate the performance of these frameworks under industrial detection scenarios. Finally, we outline open issues and research directions of FedSL in future smart manufacturing systems.
comment: 9 pages, 5 figures, submitted to the IEEE magazine
Generative AI-Driven Hierarchical Multi-Agent Framework for Zero-Touch Optical Networks
The rapid development of Generative Artificial Intelligence (GenAI) has catalyzed a transformative technological revolution across all walks of life. As the backbone of wideband communication, optical networks are expecting high-level autonomous operation and zero-touch management to accommodate their expanding network scales and escalating transmission bandwidth. The integration of GenAI is deemed as the pivotal solution for realizing zero-touch optical networks. However, the lifecycle management of optical networks involves a multitude of tasks and necessitates seamless collaboration across multiple layers, which poses significant challenges to the existing single-agent GenAI systems. In this paper, we propose a GenAI-driven hierarchical multi-agent framework designed to streamline multi-task autonomous execution for zero-touch optical networks. We present the architecture, implementation, and applications of this framework. A field-deployed mesh network is utilized to demonstrate three typical scenarios throughout the lifecycle of optical network: quality of transmission estimation in the planning stage, dynamic channel adding/dropping in the operation stage, and system capacity increase in the upgrade stage. The case studies, illustrate the capabilities of multi-agent framework in multi-task allocation, coordination, execution, evaluation, and summarization. This work provides a promising approach for the future development of intelligent, efficient, and collaborative network management solutions, paving the way for more specialized and adaptive zero-touch optical networks.
comment: 7 pages,6 figures, Accepted by lEEE Communications Magazine, Open call
Decoupling Correctness from Policy: A Deterministic Causal Structure for Multi-Agent Systems
In distributed multi-agent systems, correctness is often entangled with operational policies such as scheduling, batching, or routing, which makes systems brittle since performance-driven policy evolution may break integrity guarantees. This paper introduces the Deterministic Causal Structure (DCS), a formal foundation that decouples correctness from policy. We develop a minimal axiomatic theory and prove four results: existence and uniqueness, policy-agnostic invariance, observational equivalence, and axiom minimality. These results show that DCS resolves causal ambiguities that value-centric convergence models such as CRDTs cannot address, and that removing any axiom collapses determinism into ambiguity. DCS thus emerges as a boundary principle of asynchronous computation, analogous to CAP and FLP: correctness is preserved only within the expressive power of a join-semilattice. All guarantees are established by axioms and proofs, with only minimal illustrative constructions included to aid intuition. This work establishes correctness as a fixed, policy-agnostic substrate, a Correctness-as-a-Chassis paradigm, on which distributed intelligent systems can be built modularly, safely, and evolvably.
In-the-Flow Agentic System Optimization for Effective Planning and Tool Use
Outcome-driven reinforcement learning has advanced reasoning in large language models (LLMs), but prevailing tool-augmented approaches train a single, monolithic policy that interleaves thoughts and tool calls under full context; this scales poorly with long horizons and diverse tools and generalizes weakly to new scenarios. Agentic systems offer a promising alternative by decomposing work across specialized modules, yet most remain training-free or rely on offline training decoupled from the live dynamics of multi-turn interaction. We introduce AgentFlow, a trainable, in-the-flow agentic framework that coordinates four modules (planner, executor, verifier, generator) through an evolving memory and directly optimizes its planner inside the multi-turn loop. To train on-policy in live environments, we propose Flow-based Group Refined Policy Optimization (Flow-GRPO), which tackles long-horizon, sparse-reward credit assignment by converting multi-turn optimization into a sequence of tractable single-turn policy updates. It broadcasts a single, verifiable trajectory-level outcome to every turn to align local planner decisions with global success and stabilizes learning with group-normalized advantages. Across ten benchmarks, AgentFlow with a 7B-scale backbone outperforms top-performing baselines with average accuracy gains of 14.9% on search, 14.0% on agentic, 14.5% on mathematical, and 4.1% on scientific tasks, even surpassing larger proprietary models like GPT-4o. Further analyses confirm the benefits of in-the-flow optimization, showing improved planning, enhanced tool-calling reliability, and positive scaling with model size and reasoning turns.
comment: 45 pages, 12 figures. Project website: https://agentflow.stanford.edu/
R3R: Decentralized Multi-Agent Collision Avoidance with Infinite-Horizon Safety
Existing decentralized methods for multi-agent motion planning lack formal, infinite-horizon safety guarantees, especially for communication-constrained systems. We present R3R, to our knowledge the first decentralized and asynchronous framework for multi-agent motion planning under distance-based communication constraints with infinite-horizon safety guarantees for systems of nonlinear agents. R3R's novelty lies in combining our gatekeeper safety framework with a geometric constraint called R-Boundedness, which together establish a formal link between an agent's communication radius and its ability to plan safely. We constrain trajectories to within a fixed planning radius that is a function of the agent's communication radius, which enables trajectories to be shown provably safe for all time, using only local information. Our algorithm is fully asynchronous, and ensures the forward invariance of these guarantees even in time-varying networks where agents asynchronously join, leave, and replan. We validate our approach in simulations of up to 128 Dubins vehicles, demonstrating 100% safety in dense, obstacle rich scenarios. Our results demonstrate that R3R's performance scales with agent density rather than problem size, providing a practical solution for scalable and provably safe multi-agent systems.
comment: 8 pages, LaTeX; submitted to the American Control Conference (ACC) 2026
Three-dimensional Integrated Guidance and Control for Leader-Follower Flexible Formation of Fixed Wing UAVs
This paper presents a nonlinear integrated guidance and control (IGC) approach for flexible leader-follower formation flight of fixed-wing unmanned aerial vehicles (UAVs) while accounting for high-fidelity aerodynamics and thrust dynamics. Unlike conventional leader-follower schemes that fix the follower's position relative to the leader, the follower is steered to maintain range and bearing angles (which is the angle between its velocity vector and its line-of-sight (LOS) with respect to the leader) arbitrarily close to the prescribed values, enabling the follower to maintain formation on a hemispherical region behind the leader. The proposed IGC framework directly maps leader-follower relative range dynamics to throttle commands, and the follower's velocity orientation relative to the LOS to aerodynamic control surface deflections. This enables synergism between guidance and control subsystems. The control design uses a dynamic surface control-based backstepping approach to achieve convergence to the desired formation set, where Lyapunov barrier functions are incorporated to ensure the follower's bearing angle is constrained within specified bounds. Rigorous stability analysis guarantees uniform ultimate boundedness of all error states and strict constraint satisfaction in the presence of aerodynamic nonlinearities. The proposed flexible formation scheme allows the follower to have an orientation mismatch relative to the leader to execute anticipatory reconfiguration by transitioning between the relative positions in the admissible formation set when the leader aggressively maneuvers. The proposed IGC law relies only on relative information and onboard sensors without the information about the leader's maneuver, making it suitable for GPS-denied or non-cooperative scenarios. Finally, we present simulation results to vindicate the effectiveness and robustness of our approach.
Flexible Swarm Learning May Outpace Foundation Models in Essential Tasks
Foundation models have rapidly advanced AI, raising the question of whether their decisions will ultimately surpass human strategies in real-world domains. The exponential, and possibly super-exponential, pace of AI development makes such analysis elusive. Nevertheless, many application areas that matter for daily life and society show only modest gains so far; a prominent case is diagnosing and treating dynamically evolving disease in intensive care. The common challenge is adapting complex systems to dynamic environments. Effective strategies must optimize outcomes in systems composed of strongly interacting functions while avoiding shared side effects; this requires reliable, self-adaptive modeling. These tasks align with building digital twins of highly complex systems whose mechanisms are not fully or quantitatively understood. It is therefore essential to develop methods for self-adapting AI models with minimal data and limited mechanistic knowledge. As this challenge extends beyond medicine, AI should demonstrate clear superiority in these settings before assuming broader decision-making roles. We identify the curse of dimensionality as a fundamental barrier to efficient self-adaptation and argue that monolithic foundation models face conceptual limits in overcoming it. As an alternative, we propose a decentralized architecture of interacting small agent networks (SANs). We focus on agents representing the specialized substructure of the system, where each agent covers only a subset of the full system functions. Drawing on mathematical results on the learning behavior of SANs and evidence from existing applications, we argue that swarm-learning in diverse swarms can enable self-adaptive SANs to deliver superior decision-making in dynamic environments compared with monolithic foundation models, though at the cost of reduced reproducibility in detail.
GRPO-GCC: Enhancing Cooperation in Spatial Public Goods Games via Group Relative Policy Optimization with Global Cooperation Constraint
Inspired by the principle of self-regulating cooperation in collective institutions, we propose the Group Relative Policy Optimization with Global Cooperation Constraint (GRPO-GCC) framework. This work is the first to introduce GRPO into spatial public goods games, establishing a new deep reinforcement learning baseline for structured populations. GRPO-GCC integrates group relative policy optimization with a global cooperation constraint that strengthens incentives at intermediate cooperation levels while weakening them at extremes. This mechanism aligns local decision making with sustainable collective outcomes and prevents collapse into either universal defection or unconditional cooperation. The framework advances beyond existing approaches by combining group-normalized advantage estimation, a reference-anchored KL penalty, and a global incentive term that dynamically adjusts cooperative payoffs. As a result, it achieves accelerated cooperation onset, stabilized policy adaptation, and long-term sustainability. GRPO-GCC demonstrates how a simple yet global signal can reshape incentives toward resilient cooperation, and provides a new paradigm for multi-agent reinforcement learning in socio-technical systems.
Mnemosyne: An Unsupervised, Human-Inspired Long-Term Memory Architecture for Edge-Based LLMs
Long-term memory is essential for natural, realistic dialogue. However, current large language model (LLM) memory systems rely on either brute-force context expansion or static retrieval pipelines that fail on edge-constrained devices. We introduce Mnemosyne, an unsupervised, human-inspired long-term memory architecture designed for edge-based LLMs. Our approach uses graph-structured storage, modular substance and redundancy filters, memory committing and pruning mechanisms, and probabilistic recall with temporal decay and refresh processes modeled after human memory. Mnemosyne also introduces a concentrated "core summary" efficiently derived from a fixed-length subset of the memory graph to capture the user's personality and other domain-specific long-term details such as, using healthcare application as an example, post-recovery ambitions and attitude towards care. Unlike existing retrieval-augmented methods, Mnemosyne is designed for use in longitudinal healthcare assistants, where repetitive and semantically similar but temporally distinct conversations are limited by naive retrieval. In experiments with longitudinal healthcare dialogues, Mnemosyne demonstrates the highest win rate of 65.8% in blind human evaluations of realism and long-term memory capability compared to a baseline RAG win rate of 31.1%. Mnemosyne also achieves current highest LoCoMo benchmark scores in temporal reasoning and single-hop retrieval compared to other same-backboned techniques. Further, the average overall score of 54.6% was second highest across all methods, beating commonly used Mem0 and OpenAI baselines among others. This demonstrates that improved factual recall, enhanced temporal reasoning, and much more natural user-facing responses can be feasible with an edge-compatible and easily transferable unsupervised memory architecture.
comment: 12 pages, 4 figures
QLLM: Do We Really Need a Mixing Network for Credit Assignment in Multi-Agent Reinforcement Learning?
Credit assignment has remained a fundamental challenge in multi-agent reinforcement learning (MARL). Previous studies have primarily addressed this issue through value decomposition methods under the centralized training with decentralized execution paradigm, where neural networks are utilized to approximate the nonlinear relationship between individual Q-values and the global Q-value. Although these approaches have achieved considerable success in various benchmark tasks, they still suffer from several limitations, including imprecise attribution of contributions, limited interpretability, and poor scalability in high-dimensional state spaces. To address these challenges, we propose a novel algorithm, \textbf{QLLM}, which facilitates the automatic construction of credit assignment functions using large language models (LLMs). Specifically, the concept of \textbf{TFCAF} is introduced, wherein the credit allocation process is represented as a direct and expressive nonlinear functional formulation. A custom-designed \textit{coder-evaluator} framework is further employed to guide the generation, verification, and refinement of executable code by LLMs, significantly mitigating issues such as hallucination and shallow reasoning during inference. Extensive experiments conducted on several standard MARL benchmarks demonstrate that the proposed method consistently outperforms existing state-of-the-art baselines. Moreover, QLLM exhibits strong generalization capability and maintains compatibility with a wide range of MARL algorithms that utilize mixing networks, positioning it as a promising and versatile solution for complex multi-agent scenarios.
comment: We are withdrawing this manuscript due to experimental errors and mistakes in data preprocessing. These issues materially affect the results and could mislead subsequent studies
Decentralized Collective World Model for Emergent Communication and Coordination
We propose a fully decentralized multi-agent world model that enables both symbol emergence for communication and coordinated behavior through temporal extension of collective predictive coding. Unlike previous research that focuses on either communication or coordination separately, our approach achieves both simultaneously. Our method integrates world models with communication channels, enabling agents to predict environmental dynamics, estimate states from partial observations, and share critical information through bidirectional message exchange with contrastive learning for message alignment. Using a two-agent trajectory drawing task, we demonstrate that our communication-based approach outperforms non-communicative models when agents have divergent perceptual capabilities, achieving the second-best coordination after centralized models. Importantly, our decentralized approach with constraints preventing direct access to other agents' internal states facilitates the emergence of more meaningful symbol systems that accurately reflect environmental states. These findings demonstrate the effectiveness of decentralized communication for supporting coordination while developing shared representations of the environment.
comment: Accepted at IEEE ICDL 2025
Dynamic Strategy Adaptation in Multi-Agent Environments with Large Language Models
Large language models (LLMs) demonstrate strong reasoning abilities across mathematical, strategic, and linguistic tasks, yet little is known about how well they reason in dynamic, real-time, multi-agent scenarios, such as collaborative environments in which agents continuously adapt to each other's behavior, as in cooperative gameplay settings. In this paper, we bridge this gap by combining LLM-driven agents with strategic reasoning and real-time adaptation in cooperative, multi-agent environments grounded in game-theoretic principles such as belief consistency and Nash equilibrium. The proposed framework applies broadly to dynamic scenarios in which agents coordinate, communicate, and make decisions in response to continuously changing conditions. We provide real-time strategy refinement and adaptive feedback mechanisms that enable agents to dynamically adjust policies based on immediate contextual interactions, in contrast to previous efforts that evaluate LLM capabilities in static or turn-based settings. Empirical results show that our method achieves up to a 26\% improvement in return over PPO baselines in high-noise environments, while maintaining real-time latency under 1.05 milliseconds. Our approach improves collaboration efficiency, task completion rates, and flexibility, illustrating that game-theoretic guidance integrated with real-time feedback enhances LLM performance, ultimately fostering more resilient and flexible strategic multi-agent systems.
Generalizing Liquid Democracy to multi-agent delegation: A Voting Weight Measure and Equilibrium Analysis
In this study, we propose a generalization of the classic model of liquid democracy that allows fractional delegation of voting weight, while simultaneously allowing for the existence of equilibrium states. Our approach empowers agents to partition and delegate their votes to multiple representatives, all while retaining a fraction of the voting power for themselves. We introduce a penalty mechanism for the length of delegation chains. We discuss the desirable properties of a reasonable generalization of the classic model, and prove that smaller penalty factors bring the model closer to satisfying these properties. In the subsequent section, we explore the presence of equilibrium states in a general delegation game utilizing the proposed voting measure. In contrast to the classical model, we demonstrate that this game exhibits pure strategy Nash equilibria, contingent upon the imposition of a penalty on the length of delegation chains.
KramaBench: A Benchmark for AI Systems on Data-to-Insight Pipelines over Data Lakes
Constructing real-world data-to-insight pipelines often involves data extraction from data lakes, data integration across heterogeneous data sources, and diverse operations from data cleaning to analysis. The design and implementation of data science pipelines require domain knowledge, technical expertise, and even project-specific insights. AI systems have shown remarkable reasoning, coding, and understanding capabilities. However, it remains unclear to what extent these capabilities translate into successful design and execution of such complex pipelines. We introduce KRAMABENCH: a benchmark composed of 104 manually-curated real-world data science pipelines spanning 1700 data files from 24 data sources in 6 different domains. We show that these pipelines test the end-to-end capabilities of AI systems on data processing, requiring data discovery, wrangling and cleaning, efficient processing, statistical reasoning, and orchestrating data processing steps given a high-level task. Our evaluation tests 5 general models and 3 code generation models using our reference framework, DS-GURU, which instructs the AI model to decompose a question into a sequence of subtasks, reason through each step, and synthesize Python code that implements the proposed design. Our results on KRAMABENCH show that, although the models are sufficiently capable of solving well-specified data science code generation tasks, when extensive data processing and domain knowledge are required to construct real-world data science pipelines, existing out-of-box models fall short. Progress on KramaBench represents crucial steps towards developing autonomous data science agents for real-world applications. Our code, reference framework, and data are available at https://github.com/mitdbg/KramaBench.
Robotics
ResMimic: From General Motion Tracking to Humanoid Whole-body Loco-Manipulation via Residual Learning
Humanoid whole-body loco-manipulation promises transformative capabilities for daily service and warehouse tasks. While recent advances in general motion tracking (GMT) have enabled humanoids to reproduce diverse human motions, these policies lack the precision and object awareness required for loco-manipulation. To this end, we introduce ResMimic, a two-stage residual learning framework for precise and expressive humanoid control from human motion data. First, a GMT policy, trained on large-scale human-only motion, serves as a task-agnostic base for generating human-like whole-body movements. An efficient but precise residual policy is then learned to refine the GMT outputs to improve locomotion and incorporate object interaction. To further facilitate efficient training, we design (i) a point-cloud-based object tracking reward for smoother optimization, (ii) a contact reward that encourages accurate humanoid body-object interactions, and (iii) a curriculum-based virtual object controller to stabilize early training. We evaluate ResMimic in both simulation and on a real Unitree G1 humanoid. Results show substantial gains in task success, training efficiency, and robustness over strong baselines. Videos are available at https://resmimic.github.io/ .
comment: 9 pages, 8 figures
Automaton Constrained Q-Learning NeurIPS 2025
Real-world robotic tasks often require agents to achieve sequences of goals while respecting time-varying safety constraints. However, standard Reinforcement Learning (RL) paradigms are fundamentally limited in these settings. A natural approach to these problems is to combine RL with Linear-time Temporal Logic (LTL), a formal language for specifying complex, temporally extended tasks and safety constraints. Yet, existing RL methods for LTL objectives exhibit poor empirical performance in complex and continuous environments. As a result, no scalable methods support both temporally ordered goals and safety simultaneously, making them ill-suited for realistic robotics scenarios. We propose Automaton Constrained Q-Learning (ACQL), an algorithm that addresses this gap by combining goal-conditioned value learning with automaton-guided reinforcement. ACQL supports most LTL task specifications and leverages their automaton representation to explicitly encode stage-wise goal progression and both stationary and non-stationary safety constraints. We show that ACQL outperforms existing methods across a range of continuous control tasks, including cases where prior methods fail to satisfy either goal-reaching or safety constraints. We further validate its real-world applicability by deploying ACQL on a 6-DOF robotic arm performing a goal-reaching task in a cluttered, cabinet-like space with safety constraints. Our results demonstrate that ACQL is a robust and scalable solution for learning robotic behaviors according to rich temporal specifications.
comment: 9 pages, 4 figures, 39th Conference on Neural Information Processing Systems (NeurIPS 2025)
StaMo: Unsupervised Learning of Generalizable Robot Motion from Compact State Representation
A fundamental challenge in embodied intelligence is developing expressive and compact state representations for efficient world modeling and decision making. However, existing methods often fail to achieve this balance, yielding representations that are either overly redundant or lacking in task-critical information. We propose an unsupervised approach that learns a highly compressed two-token state representation using a lightweight encoder and a pre-trained Diffusion Transformer (DiT) decoder, capitalizing on its strong generative prior. Our representation is efficient, interpretable, and integrates seamlessly into existing VLA-based models, improving performance by 14.3% on LIBERO and 30% in real-world task success with minimal inference overhead. More importantly, we find that the difference between these tokens, obtained via latent interpolation, naturally serves as a highly effective latent action, which can be further decoded into executable robot actions. This emergent capability reveals that our representation captures structured dynamics without explicit supervision. We name our method StaMo for its ability to learn generalizable robotic Motion from compact State representation, which is encoded from static images, challenging the prevalent dependence to learning latent action on complex architectures and video data. The resulting latent actions also enhance policy co-training, outperforming prior methods by 10.4% with improved interpretability. Moreover, our approach scales effectively across diverse data sources, including real-world robot data, simulation, and human egocentric video.
Walking, Rolling, and Beyond: First-Principles and RL Locomotion on a TARS-Inspired Robot
Robotic locomotion research typically draws from biologically inspired leg designs, yet many human-engineered settings can benefit from non-anthropomorphic forms. TARS3D translates the block-shaped 'TARS' robot from Interstellar into a 0.25 m, 0.99 kg research platform with seven actuated degrees of freedom. The film shows two primary gaits: a bipedal-like walk and a high-speed rolling mode. For TARS3D, we build reduced-order models for each, derive closed-form limit-cycle conditions, and validate the predictions on hardware. Experiments confirm that the robot respects its +/-150 degree hip limits, alternates left-right contacts without interference, and maintains an eight-step hybrid limit cycle in rolling mode. Because each telescopic leg provides four contact corners, the rolling gait is modeled as an eight-spoke double rimless wheel. The robot's telescopic leg redundancy implies a far richer gait repertoire than the two limit cycles treated analytically. So, we used deep reinforcement learning (DRL) in simulation to search the unexplored space. We observed that the learned policy can recover the analytic gaits under the right priors and discover novel behaviors as well. Our findings show that TARS3D's fiction-inspired bio-transcending morphology can realize multiple previously unexplored locomotion modes and that further learning-driven search is likely to reveal more. This combination of analytic synthesis and reinforcement learning opens a promising pathway for multimodal robotics.
comment: 6 pages, 10 figures. Presented at IEEE-RAS International Conference on Humanoid Robots (Humanoids) 2025
Efficient Navigation in Unknown Indoor Environments with Vision-Language Models
We present a novel high-level planning framework that leverages vision-language models (VLMs) to improve autonomous navigation in unknown indoor environments with many dead ends. Traditional exploration methods often take inefficient routes due to limited global reasoning and reliance on local heuristics. In contrast, our approach enables a VLM to reason directly about an occupancy map in a zero-shot manner, selecting subgoals that are likely to lead to more efficient paths. At each planning step, we convert a 3D occupancy grid into a partial 2D map of the environment, and generate candidate subgoals. Each subgoal is then evaluated and ranked against other candidates by the model. We integrate this planning scheme into DYNUS \cite{kondo2025dynus}, a state-of-the-art trajectory planner, and demonstrate improved navigation efficiency in simulation. The VLM infers structural patterns (e.g., rooms, corridors) from incomplete maps and balances the need to make progress toward a goal against the risk of entering unknown space. This reduces common greedy failures (e.g., detouring into small rooms) and achieves about 10\% shorter paths on average.
comment: 8 pages, 4 figures
HyperVLA: Efficient Inference in Vision-Language-Action Models via Hypernetworks
Built upon language and vision foundation models with strong generalization ability and trained on large-scale robotic data, Vision-Language-Action (VLA) models have recently emerged as a promising approach to learning generalist robotic policies. However, a key drawback of existing VLAs is their extremely high inference costs. In this paper, we propose HyperVLA to address this problem. Unlike existing monolithic VLAs that activate the whole model during both training and inference, HyperVLA uses a novel hypernetwork (HN)-based architecture that activates only a small task-specific policy during inference, while still retaining the high model capacity needed to accommodate diverse multi-task behaviors during training. Successfully training an HN-based VLA is nontrivial so HyperVLA contains several key algorithm design features that improve its performance, including properly utilizing the prior knowledge from existing vision foundation models, HN normalization, and an action generation strategy. Compared to monolithic VLAs, HyperVLA achieves a similar or even higher success rate for both zero-shot generalization and few-shot adaptation, while significantly reducing inference costs. Compared to OpenVLA, a state-of-the-art VLA model, HyperVLA reduces the number of activated parameters at test time by $90\times$, and accelerates inference speed by $120\times$. Code is publicly available at https://github.com/MasterXiong/HyperVLA
CLEAR-IR: Clarity-Enhanced Active Reconstruction of Infrared Imagery
This paper presents a novel approach for enabling robust robotic perception in dark environments using infrared (IR) stream. IR stream is less susceptible to noise than RGB in low-light conditions. However, it is dominated by active emitter patterns that hinder high-level tasks such as object detection, tracking and localisation. To address this, a U-Net-based architecture is proposed that reconstructs clean IR images from emitter-populated input, improving both image quality and downstream robotic performance. This approach outperforms existing enhancement techniques and enables reliable operation of vision-driven robotic systems across illumination conditions from well-lit to extreme low-light scenes.
comment: 8 pages, 8 figures
TAG-K: Tail-Averaged Greedy Kaczmarz for Computationally Efficient and Performant Online Inertial Parameter Estimation
Accurate online inertial parameter estimation is essential for adaptive robotic control, enabling real-time adjustment to payload changes, environmental interactions, and system wear. Traditional methods such as Recursive Least Squares (RLS) and the Kalman Filter (KF) often struggle to track abrupt parameter shifts or incur high computational costs, limiting their effectiveness in dynamic environments and for computationally constrained robotic systems. As such, we introduce TAG-K, a lightweight extension of the Kaczmarz method that combines greedy randomized row selection for rapid convergence with tail averaging for robustness under noise and inconsistency. This design enables fast, stable parameter adaptation while retaining the low per-iteration complexity inherent to the Kaczmarz framework. We evaluate TAG-K in synthetic benchmarks and quadrotor tracking tasks against RLS, KF, and other Kaczmarz variants. TAG-K achieves 1.5x-1.9x faster solve times on laptop-class CPUs and 4.8x-20.7x faster solve times on embedded microcontrollers. More importantly, these speedups are paired with improved resilience to measurement noise and a 25% reduction in estimation error, leading to nearly 2x better end-to-end tracking performance.
Efficient Probabilistic Planning with Maximum-Coverage Distributionally Robust Backward Reachable Trees
This paper presents a new multi-query motion planning algorithm for linear Gaussian systems with the goal of reaching a Euclidean ball with high probability. We develop a new formulation for ball-shaped ambiguity sets of Gaussian distributions and leverage it to develop a distributionally robust belief roadmap construction algorithm. This algorithm synthe- sizes robust controllers which are certified to be safe for maximal size ball-shaped ambiguity sets of Gaussian distributions. Our algorithm achieves better coverage than the maximal coverage algorithm for planning over Gaussian distributions [1], and we identify mild conditions under which our algorithm achieves strictly better coverage. For the special case of no process noise or state constraints, we formally prove that our algorithm achieves maximal coverage. In addition, we present a second multi-query motion planning algorithm for linear Gaussian systems with the goal of reaching a region parameterized by the Minkowski sum of an ellipsoid and a Euclidean ball with high probability. This algorithm plans over ellipsoidal sets of maximal size ball-shaped ambiguity sets of Gaussian distributions, and provably achieves equal or better coverage than the best-known algorithm for planning over ellipsoidal ambiguity sets of Gaussian distributions [2]. We demonstrate the efficacy of both methods in a wide range of conditions via extensive simulation experiments.
Online automatic code generation for robot swarms: LLMs and self-organizing hierarchy
Our recently introduced self-organizing nervous system (SoNS) provides robot swarms with 1) ease of behavior design and 2) global estimation of the swarm configuration and its collective environment, facilitating the implementation of online automatic code generation for robot swarms. In a demonstration with 6 real robots and simulation trials with >30 robots, we show that when a SoNS-enhanced robot swarm gets stuck, it can automatically solicit and run code generated by an external LLM on the fly, completing its mission with an 85% success rate.
Performance-guided Task-specific Optimization for Multirotor Design
This paper introduces a methodology for task-specific design optimization of multirotor Micro Aerial Vehicles. By leveraging reinforcement learning, Bayesian optimization, and covariance matrix adaptation evolution strategy, we optimize aerial robot designs guided exclusively by their closed-loop performance in a considered task. Our approach systematically explores the design space of motor pose configurations while ensuring manufacturability constraints and minimal aerodynamic interference. Results demonstrate that optimized designs achieve superior performance compared to conventional multirotor configurations in agile waypoint navigation tasks, including against fully actuated designs from the literature. We build and test one of the optimized designs in the real world to validate the sim2real transferability of our approach.
Building Gradient by Gradient: Decentralised Energy Functions for Bimanual Robot Assembly
There are many challenges in bimanual assembly, including high-level sequencing, multi-robot coordination, and low-level, contact-rich operations such as component mating. Task and motion planning (TAMP) methods, while effective in this domain, may be prohibitively slow to converge when adapting to disturbances that require new task sequencing and optimisation. These events are common during tight-tolerance assembly, where difficult-to-model dynamics such as friction or deformation require rapid replanning and reattempts. Moreover, defining explicit task sequences for assembly can be cumbersome, limiting flexibility when task replanning is required. To simplify this planning, we introduce a decentralised gradient-based framework that uses a piecewise continuous energy function through the automatic composition of adaptive potential functions. This approach generates sub-goals using only myopic optimisation, rather than long-horizon planning. It demonstrates effectiveness at solving long-horizon tasks due to the structure and adaptivity of the energy function. We show that our approach scales to physical bimanual assembly tasks for constructing tight-tolerance assemblies. In these experiments, we discover that our gradient-based rapid replanning framework generates automatic retries, coordinated motions and autonomous handovers in an emergent fashion.
comment: 8 pages, 6 figures, 1 table
Bio-Inspired Robotic Houbara: From Development to Field Deployment for Behavioral Studies
Biomimetic intelligence and robotics are transforming field ecology by enabling lifelike robotic surrogates that interact naturally with animals under real world conditions. Studying avian behavior in the wild remains challenging due to the need for highly realistic morphology, durable outdoor operation, and intelligent perception that can adapt to uncontrolled environments. We present a next generation bio inspired robotic platform that replicates the morphology and visual appearance of the female Houbara bustard to support controlled ethological studies and conservation oriented field research. The system introduces a fully digitally replicable fabrication workflow that combines high resolution structured light 3D scanning, parametric CAD modelling, articulated 3D printing, and photorealistic UV textured vinyl finishing to achieve anatomically accurate and durable robotic surrogates. A six wheeled rocker bogie chassis ensures stable mobility on sand and irregular terrain, while an embedded NVIDIA Jetson module enables real time RGB and thermal perception, lightweight YOLO based detection, and an autonomous visual servoing loop that aligns the robot's head toward detected targets without human intervention. A lightweight thermal visible fusion module enhances perception in low light conditions. Field trials in desert aviaries demonstrated reliable real time operation at 15 to 22 FPS with latency under 100 ms and confirmed that the platform elicits natural recognition and interactive responses from live Houbara bustards under harsh outdoor conditions. This integrated framework advances biomimetic field robotics by uniting reproducible digital fabrication, embodied visual intelligence, and ecological validation, providing a transferable blueprint for animal robot interaction research, conservation robotics, and public engagement.
Learning a Shape-adaptive Assist-as-needed Rehabilitation Policy from Therapist-informed Input
Therapist-in-the-loop robotic rehabilitation has shown great promise in enhancing rehabilitation outcomes by integrating the strengths of therapists and robotic systems. However, its broader adoption remains limited due to insufficient safe interaction and limited adaptation capability. This article proposes a novel telerobotics-mediated framework that enables therapists to intuitively and safely deliver assist-as-needed~(AAN) therapy based on two primary contributions. First, our framework encodes the therapist-informed corrective force into via-points in a latent space, allowing the therapist to provide only minimal assistance while encouraging patient maintaining own motion preferences. Second, a shape-adaptive ANN rehabilitation policy is learned to partially and progressively deform the reference trajectory for movement therapy based on encoded patient motion preferences and therapist-informed via-points. The effectiveness of the proposed shape-adaptive AAN strategy was validated on a telerobotic rehabilitation system using two representative tasks. The results demonstrate its practicality for remote AAN therapy and its superiority over two state-of-the-art methods in reducing corrective force and improving movement smoothness.
OKVIS2-X: Open Keyframe-based Visual-Inertial SLAM Configurable with Dense Depth or LiDAR, and GNSS
To empower mobile robots with usable maps as well as highest state estimation accuracy and robustness, we present OKVIS2-X: a state-of-the-art multi-sensor Simultaneous Localization and Mapping (SLAM) system building dense volumetric occupancy maps, while scalable to large environments and operating in realtime. Our unified SLAM framework seamlessly integrates different sensor modalities: visual, inertial, measured or learned depth, LiDAR and Global Navigation Satellite System (GNSS) measurements. Unlike most state-of-the-art SLAM systems, we advocate using dense volumetric map representations when leveraging depth or range-sensing capabilities. We employ an efficient submapping strategy that allows our system to scale to large environments, showcased in sequences of up to 9 kilometers. OKVIS2-X enhances its accuracy and robustness by tightly-coupling the estimator and submaps through map alignment factors. Our system provides globally consistent maps, directly usable for autonomous navigation. To further improve the accuracy of OKVIS2-X, we also incorporate the option of performing online calibration of camera extrinsics. Our system achieves the highest trajectory accuracy in EuRoC against state-of-the-art alternatives, outperforms all competitors in the Hilti22 VI-only benchmark, while also proving competitive in the LiDAR version, and showcases state of the art accuracy in the diverse and large-scale sequences from the VBR dataset.
comment: IEEE Transactions on Robotics (T-RO) - Special Issue: Visual SLAM
MobRT: A Digital Twin-Based Framework for Scalable Learning in Mobile Manipulation
Recent advances in robotics have been largely driven by imitation learning, which depends critically on large-scale, high-quality demonstration data. However, collecting such data remains a significant challenge-particularly for mobile manipulators, which must coordinate base locomotion and arm manipulation in high-dimensional, dynamic, and partially observable environments. Consequently, most existing research remains focused on simpler tabletop scenarios, leaving mobile manipulation relatively underexplored. To bridge this gap, we present \textit{MobRT}, a digital twin-based framework designed to simulate two primary categories of complex, whole-body tasks: interaction with articulated objects (e.g., opening doors and drawers) and mobile-base pick-and-place operations. \textit{MobRT} autonomously generates diverse and realistic demonstrations through the integration of virtual kinematic control and whole-body motion planning, enabling coherent and physically consistent execution. We evaluate the quality of \textit{MobRT}-generated data across multiple baseline algorithms, establishing a comprehensive benchmark and demonstrating a strong correlation between task success and the number of generated trajectories. Experiments integrating both simulated and real-world demonstrations confirm that our approach markedly improves policy generalization and performance, achieving robust results in both simulated and real-world environments.
Everything-Grasping (EG) Gripper: A Universal Gripper with Synergistic Suction-Grasping Capabilities for Cross-Scale and Cross-State Manipulation
Grasping objects across vastly different sizes and physical states-including both solids and liquids-with a single robotic gripper remains a fundamental challenge in soft robotics. We present the Everything-Grasping (EG) Gripper, a soft end-effector that synergistically integrates distributed surface suction with internal granular jamming, enabling cross-scale and cross-state manipulation without requiring airtight sealing at the contact interface with target objects. The EG Gripper can handle objects with surface areas ranging from sub-millimeter scale 0.2 mm2 (glass bead) to over 62,000 mm2 (A4 sized paper and woven bag), enabling manipulation of objects nearly 3,500X smaller and 88X larger than its own contact area (approximated at 707 mm2 for a 30 mm-diameter base). We further introduce a tactile sensing framework that combines liquid detection and pressure-based suction feedback, enabling real-time differentiation between solid and liquid targets. Guided by the actile-Inferred Grasping Mode Selection (TIGMS) algorithm, the gripper autonomously selects grasping modes based on distributed pressure and voltage signals. Experiments across diverse tasks-including underwater grasping, fragile object handling, and liquid capture-demonstrate robust and repeatable performance. To our knowledge, this is the first soft gripper to reliably grasp both solid and liquid objects across scales using a unified compliant architecture.
comment: 19 pages, 10 figures, journal
More Than Meets the Eye? Uncovering the Reasoning-Planning Disconnect in Training Vision-Language Driving Models
Vision-Language Model (VLM) driving agents promise explainable end-to-end autonomy by first producing natural-language reasoning and then predicting trajectory planning. However, whether planning is causally driven by this reasoning remains a critical but unverified assumption. To investigate this, we build DriveMind, a large-scale driving Visual Question Answering (VQA) corpus with plan-aligned Chain-of-Thought (CoT), automatically generated from nuPlan. Our data generation process converts sensors and annotations into structured inputs and, crucially, separates priors from to-be-reasoned signals, enabling clean information ablations. Using DriveMind, we train representative VLM agents with Supervised Fine-Tuning (SFT) and Group Relative Policy Optimization (GRPO) and evaluate them with nuPlan's metrics. Our results, unfortunately, indicate a consistent causal disconnect in reasoning-planning: removing ego/navigation priors causes large drops in planning scores, whereas removing CoT produces only minor changes. Attention analysis further shows that planning primarily focuses on priors rather than the CoT. Based on this evidence, we propose the Reasoning-Planning Decoupling Hypothesis, positing that the training-yielded reasoning is an ancillary byproduct rather than a causal mediator. To enable efficient diagnosis, we also introduce a novel, training-free probe that measures an agent's reliance on priors by evaluating its planning robustness against minor input perturbations. In summary, we provide the community with a new dataset and a diagnostic tool to evaluate the causal fidelity of future models.
comment: The dataset will be released publicly once the paper is accepted for publication
Velocity-Form Data-Enabled Predictive Control of Soft Robots under Unknown External Payloads
Data-driven control methods such as data-enabled predictive control (DeePC) have shown strong potential in efficient control of soft robots without explicit parametric models. However, in object manipulation tasks, unknown external payloads and disturbances can significantly alter the system dynamics and behavior, leading to offset error and degraded control performance. In this paper, we present a novel velocity-form DeePC framework that achieves robust and optimal control of soft robots under unknown payloads. The proposed framework leverages input-output data in an incremental representation to mitigate performance degradation induced by unknown payloads, eliminating the need for weighted datasets or disturbance estimators. We validate the method experimentally on a planar soft robot and demonstrate its superior performance compared to standard DeePC in scenarios involving unknown payloads.
PAD-TRO: Projection-Augmented Diffusion for Direct Trajectory Optimization
Recently, diffusion models have gained popularity and attention in trajectory optimization due to their capability of modeling multi-modal probability distributions. However, addressing nonlinear equality constraints, i.e, dynamic feasi- bility, remains a great challenge in diffusion-based trajectory optimization. Recent diffusion-based trajectory optimization frameworks rely on a single-shooting style approach where the denoised control sequence is applied to forward propagate the dynamical system, which cannot explicitly enforce constraints on the states and frequently leads to sub-optimal solutions. In this work, we propose a novel direct trajectory optimization approach via model-based diffusion, which directly generates a sequence of states. To ensure dynamic feasibility, we propose a gradient-free projection mechanism that is incorporated into the reverse diffusion process. Our results show that, compared to a recent state-of-the-art baseline, our approach leads to zero dynamic feasibility error and approximately 4x higher success rate in a quadrotor waypoint navigation scenario involving dense static obstacles.
AD-NODE: Adaptive Dynamics Learning with Neural ODEs for Mobile Robots Control
Mobile robots, such as ground vehicles and quadrotors, are becoming increasingly important in various fields, from logistics to agriculture, where they automate processes in environments that are difficult to access for humans. However, to perform effectively in uncertain environments using model-based controllers, these systems require dynamics models capable of responding to environmental variations, especially when direct access to environmental information is limited. To enable such adaptivity and facilitate integration with model predictive control, we propose an adaptive dynamics model which bypasses the need for direct environmental knowledge by inferring operational environments from state-action history. The dynamics model is based on neural ordinary equations, and a two-phase training procedure is used to learn latent environment representations. We demonstrate the effectiveness of our approach through goal-reaching and path-tracking tasks on three robotic platforms of increasing complexity: a 2D differential wheeled robot with changing wheel contact conditions, a 3D quadrotor in variational wind fields, and the Sphero BOLT robot under two contact conditions for real-world deployment. Empirical results corroborate that our method can handle temporally and spatially varying environmental changes in both simulation and real-world systems.
Safety-Critical Control with Bounded Inputs: A Closed-Form Solution for Backup Control Barrier Functions
Verifying the safety of controllers is critical for many applications, but is especially challenging for systems with bounded inputs. Backup control barrier functions (bCBFs) offer a structured approach to synthesizing safe controllers that are guaranteed to satisfy input bounds by leveraging the knowledge of a backup controller. While powerful, bCBFs require solving a high-dimensional quadratic program at run-time, which may be too costly for computationally-constrained systems such as aerospace vehicles. We propose an approach that optimally interpolates between a nominal controller and the backup controller, and we derive the solution to this optimization problem in closed form. We prove that this closed-form controller is guaranteed to be safe while obeying input bounds. We demonstrate the effectiveness of the approach on a double integrator and a nonlinear fixed-wing aircraft example.
comment: 8 pages, 6 figures. Code available at https://github.com/davidvwijk/OI-CBF
Active Semantic Perception
We develop an approach for active semantic perception which refers to using the semantics of the scene for tasks such as exploration. We build a compact, hierarchical multi-layer scene graph that can represent large, complex indoor environments at various levels of abstraction, e.g., nodes corresponding to rooms, objects, walls, windows etc. as well as fine-grained details of their geometry. We develop a procedure based on large language models (LLMs) to sample plausible scene graphs of unobserved regions that are consistent with partial observations of the scene. These samples are used to compute an information gain of a potential waypoint for sophisticated spatial reasoning, e.g., the two doors in the living room can lead to either a kitchen or a bedroom. We evaluate this approach in complex, realistic 3D indoor environments in simulation. We show using qualitative and quantitative experiments that our approach can pin down the semantics of the environment quicker and more accurately than baseline approaches.
Towards Online Robot Interaction Adaptation to Human Upper-limb Mobility Impairments in Return-to-Work Scenarios
Work environments are often inadequate and lack inclusivity for individuals with upper-body disabilities. This paper presents a novel online framework for adaptive human-robot interaction (HRI) that accommodates users' arm mobility impairments, ultimately aiming to promote active work participation. Unlike traditional human-robot collaboration approaches that assume able-bodied users, our method integrates a mobility model for specific joint limitations into a hierarchical optimal controller. This allows the robot to generate reactive, mobility-aware behaviour online and guides the user's impaired limb to exploit residual functional mobility. The framework was tested in handover tasks involving different upper-limb mobility impairments (i.e., emulated elbow and shoulder arthritis, and wrist blockage), under both standing and seated configurations with task constraints using a mobile manipulator, and complemented by quantitative and qualitative comparisons with state-of-the-art ergonomic HRI approaches. Preliminary results indicated that the framework can personalise the interaction to fit within the user's impaired range of motion and encourage joint usage based on the severity of their functional limitations.
A multi-modal tactile fingertip design for robotic hands to enhance dexterous manipulation
Tactile sensing holds great promise for enhancing manipulation precision and versatility, but its adoption in robotic hands remains limited due to high sensor costs, manufacturing and integration challenges, and difficulties in extracting expressive and reliable information from signals. In this work, we present a low-cost, easy-to-make, adaptable, and compact fingertip design for robotic hands that integrates multi-modal tactile sensors. We use strain gauge sensors to capture static forces and a contact microphone sensor to measure high-frequency vibrations during contact. These tactile sensors are integrated into a compact design with a minimal sensor footprint, and all sensors are internal to the fingertip and therefore not susceptible to direct wear and tear from interactions. From sensor characterization, we show that strain gauge sensors provide repeatable 2D planar force measurements in the 0-5 N range and the contact microphone sensor has the capability to distinguish contact material properties. We apply our design to three dexterous manipulation tasks that range from zero to full visual occlusion. Given the expressiveness and reliability of tactile sensor readings, we show that different tactile sensing modalities can be used flexibly in different stages of manipulation, solely or together with visual observations to achieve improved task performance. For instance, we can precisely count and unstack a desired number of paper cups from a stack with 100\% success rate which is hard to achieve with vision only.
Adaptive Dynamics Planning for Robot Navigation
Autonomous robot navigation systems often rely on hierarchical planning, where global planners compute collision-free paths without considering dynamics, and local planners enforce dynamics constraints to produce executable commands. This discontinuity in dynamics often leads to trajectory tracking failure in highly constrained environments. Recent approaches integrate dynamics within the entire planning process by gradually decreasing its fidelity, e.g., increasing integration steps and reducing collision checking resolution, for real-time planning efficiency. However, they assume that the fidelity of the dynamics should decrease according to a manually designed scheme. Such static settings fail to adapt to environmental complexity variations, resulting in computational overhead in simple environments or insufficient dynamics consideration in obstacle-rich scenarios. To overcome this limitation, we propose Adaptive Dynamics Planning (ADP), a learning-augmented paradigm that uses reinforcement learning to dynamically adjust robot dynamics properties, enabling planners to adapt across diverse environments. We integrate ADP into three different planners and further design a standalone ADP-based navigation system, benchmarking them against other baselines. Experiments in both simulation and real-world tests show that ADP consistently improves navigation success, safety, and efficiency.
comment: 8 pages, 4 figures
VER: Vision Expert Transformer for Robot Learning via Foundation Distillation and Dynamic Routing
Pretrained vision foundation models (VFMs) advance robotic learning via rich visual representations, yet individual VFMs typically excel only in specific domains, limiting generality across tasks. Distilling multiple VFMs into a unified representation for policy can mitigate this limitation but often yields inflexible task-specific feature selection and requires costly full re-training to incorporate robot-domain knowledge. We propose VER, a Vision Expert transformer for Robot learning. During pretraining, VER distills multiple VFMs into a vision expert library. It then fine-tunes only a lightweight routing network (fewer than 0.4% of parameters) to dynamically select task-relevant experts from the pretrained library for downstream robot tasks. We further introduce Patchwise Expert Routing with Curriculum Top-K Annealing to improve both flexibility and precision of dynamic expert selection. Moreover, VER supports parameter-efficient finetuning for scalable expert utilization and adaptive robot-domain knowledge integration. Across 17 diverse robotic tasks and multiple policy heads, VER achieves state-of-the-art performance. We find that VER reduces large-norm outliers in task-irrelevant regions (e.g., background) and concentrates on task-critical regions. Visualizations and codes can be found in https://yixiaowang7.github.io/ver_page/.
RowDetr: End-to-End Crop Row Detection Using Polynomials
Crop row detection enables autonomous robots to navigate in gps denied environments. Vision based strategies often struggle in the environments due to gaps, curved crop rows and require post-processing steps. Furthermore, labeling crop rows in under the canopy environments accurately is very difficult due to occlusions. This study introduces RowDetr, an efficient end-to-end transformer-based neural network for crop row detection in precision agriculture. RowDetr leverages a lightweight backbone and a hybrid encoder to model straight, curved, or occluded crop rows with high precision. Central to the architecture is a novel polynomial representation that enables direct parameterization of crop rows, eliminating computationally expensive post-processing. Key innovations include a PolySampler module and multi-scale deformable attention, which work together with PolyOptLoss, an energy-based loss function designed to optimize geometric alignment between predicted and the annotated crop rows, while also enhancing robustness against labeling noise. RowDetr was evaluated against other state-of-the-art end-to-end crop row detection methods like AgroNav and RolColAttention on a diverse dataset of 6,962 high-resolution images, used for training, validation, and testing across multiple crop types with annotated crop rows. The system demonstrated superior performance, achieved an F1 score up to 0.74 and a lane position deviation as low as 0.405. Furthermore, RowDetr achieves a real-time inference latency of 6.7ms, which was optimized to 3.5ms with INT8 quantization on an NVIDIA Jetson Orin AGX. This work highlighted the critical efficiency of polynomial parameterization, making RowDetr particularly suitable for deployment on edge computing devices in agricultural robotics and autonomous farming equipment. Index terms > Crop Row Detection, Under Canopy Navigation, Transformers, RT-DETR, RT-DETRv2
comment: Code will be open sourced upon publication
ReLI: A Language-Agnostic Approach to Human-Robot Interaction
Adapting autonomous agents for real-world industrial, domestic, and other daily tasks is currently gaining momentum. However, in global or cross-lingual application contexts, ensuring effective interaction with the environment and executing unrestricted human-specified tasks regardless of the language remains an unsolved problem. To address this, we propose ReLI, a language-agnostic approach that enables autonomous agents to converse naturally, semantically reason about their environment, and perform downstream tasks, regardless of the task instruction's modality or linguistic origin. First, we ground large-scale pre-trained foundation models and transform them into language-to-action models that can directly provide common-sense reasoning and high-level robot control through natural, free-flow conversational interactions. Further, we perform cross-lingual adaptation of the models to ensure that ReLI generalises across the global languages. To demonstrate ReLI's robustness, we conducted extensive experiments on various short- and long-horizon tasks, including zero- and few-shot spatial navigation, scene information retrieval, and query-oriented tasks. We benchmarked the performance on $140$ languages involving $70K+$ multi-turn conversations. On average, ReLI achieved over $90\%\pm0.2$ accuracy in cross-lingual instruction parsing and task execution success. These results demonstrate its potential to advance natural human-agent interaction in the real world while championing inclusive and linguistic diversity. Demos and resources will be public at: https://linusnep.github.io/ReLI/.
Pragmatic Embodied Spoken Instruction Following in Human-Robot Collaboration with Theory of Mind
Spoken language instructions are ubiquitous in agent collaboration. However, in real-world human-robot collaboration, following human spoken instructions can be challenging due to various speaker and environmental factors, such as background noise or mispronunciation. When faced with noisy auditory inputs, humans can leverage the collaborative context in the embodied environment to interpret noisy spoken instructions and take pragmatic assistive actions. In this paper, we present a cognitively inspired neurosymbolic model, Spoken Instruction Following through Theory of Mind (SIFToM), which leverages a Vision-Language Model with model-based mental inference to enable robots to pragmatically follow human instructions under diverse speech conditions. We test SIFToM in both simulated environments (VirtualHome) and real-world human-robot collaborative settings with human evaluations. Results show that SIFToM can significantly improve the performance of a lightweight base VLM (Gemini 2.5 Flash), outperforming state-of-the-art VLMs (Gemini 2.5 Pro) and approaching human-level accuracy on challenging spoken instruction following tasks.
comment: 8 pages, 7 figures
HeLoM: Hierarchical Learning for Whole-Body Loco-Manipulation in Hexapod Robot
Robots in real-world environments are often required to move/manipulate objects comparable in weight to their own bodies. Compared to grasping and carrying, pushing provides a more straightforward and efficient non-prehensile manipulation strategy, avoiding complex grasp design while leveraging direct contact to regulate an object's pose. Achieving effective pushing, however, demands both sufficient manipulation forces and the ability to maintain stability, which is particularly challenging when dealing with heavy or irregular objects. To address these challenges, we propose HeLoM, a learning-based hierarchical whole-body manipulation framework for a hexapod robot that exploits coordinated multi-limb control. Inspired by the cooperative strategies of multi-legged insects, our framework leverages redundant contact points and high degrees of freedom to enable dynamic redistribution of contact forces. HeLoM's high-level planner plans pushing behaviors and target object poses, while its low-level controller maintains locomotion stability and generates dynamically consistent joint actions. Our policies trained in simulation are directly deployed on real robots without additional fine-tuning. This design allows the robot to maintain balance while exerting continuous and controllable pushing forces through coordinated foreleg interaction and supportive hind-leg propulsion. We validate the effectiveness of HeLoM through both simulation and real-world experiments. Results show that our framework can stably push boxes of varying sizes and unknown physical properties to designated goal poses in the real world.
KiVi: Kinesthetic-Visuospatial Integration for Dynamic and Safe Egocentric Legged Locomotion
Vision-based locomotion has shown great promise in enabling legged robots to perceive and adapt to complex environments. However, visual information is inherently fragile, being vulnerable to occlusions, reflections, and lighting changes, which often cause instability in locomotion. Inspired by animal sensorimotor integration, we propose KiVi, a Kinesthetic-Visuospatial integration framework, where kinesthetics encodes proprioceptive sensing of body motion and visuospatial reasoning captures visual perception of surrounding terrain. Specifically, KiVi separates these pathways, leveraging proprioception as a stable backbone while selectively incorporating vision for terrain awareness and obstacle avoidance. This modality-balanced, yet integrative design, combined with memory-enhanced attention, allows the robot to robustly interpret visual cues while maintaining fallback stability through proprioception. Extensive experiments show that our method enables quadruped robots to stably traverse diverse terrains and operate reliably in unstructured outdoor environments, remaining robust to out-of-distribution (OOD) visual noise and occlusion unseen during training, thereby highlighting its effectiveness and applicability to real-world legged locomotion.
Novel Object 6D Pose Estimation with a Single Reference View
Existing novel object 6D pose estimation methods typically rely on CAD models or dense reference views, which are both difficult to acquire. Using only a single reference view is more scalable, but challenging due to large pose discrepancies and limited geometric and spatial information. To address these issues, we propose a Single-Reference-based novel object 6D (SinRef-6D) pose estimation method. Our key idea is to iteratively establish point-wise alignment in a common coordinate system based on state space models (SSMs). Specifically, iterative object-space point-wise alignment can effectively handle large pose discrepancies, while our proposed RGB and Points SSMs can capture long-range dependencies and spatial information from a single view, offering linear complexity and superior spatial modeling capability. Once pre-trained on synthetic data, SinRef-6D can estimate the 6D pose of a novel object using only a single reference view, without requiring retraining or a CAD model. Extensive experiments on six popular datasets and real-world robotic scenes demonstrate that we achieve on-par performance with CAD-based and dense reference view-based methods, despite operating in the more challenging single reference setting. Code will be released at https://github.com/CNJianLiu/SinRef-6D.
comment: 17 pages, 12 figures (including supplementary material)
LIAM: Multimodal Transformer for Language Instructions, Images, Actions and Semantic Maps
The availability of large language models and open-vocabulary object perception methods enables more flexibility for domestic service robots. The large variability of domestic tasks can be addressed without implementing each task individually by providing the robot with a task description along with appropriate environment information. In this work, we propose LIAM - an end-to-end model that predicts action transcripts based on language, image, action, and map inputs. Language and image inputs are encoded with a CLIP backbone, for which we designed two pre-training tasks to fine-tune its weights and pre-align the latent spaces. We evaluate our method on the ALFRED dataset, a simulator-generated benchmark for domestic tasks. Our results demonstrate the importance of pre-aligning embedding spaces from different modalities and the efficacy of incorporating semantic maps.
comment: 12 pages, 4 figures, 2 tables, 19th International Conference on Intelligent Autonomous Systems (IAS), Genoa, Italy, June 2025
Tele-rehabilitation with online skill transfer and adaptation in $\mathbb{R}^3 \times \mathit{S}^3$
This paper proposes a tele-teaching framework for the domain of robot-assisted tele-rehabilitation. The system connects two robotic manipulators on therapist and patient side via bilateral teleoperation, enabling a therapist to remotely demonstrate rehabilitation exercises that are executed by the patient-side robot. A 6-DoF Dynamical Movement Primitives formulation is employed to jointly encode translational and rotational motions in $\mathbb{R}^3 \times \mathit{S}^3$ space, ensuring accurate trajectory reproduction. The framework supports smooth transitions between therapist-led guidance and patient passive training, while allowing adaptive adjustment of motion. Experiments with 7-DoF manipulators demonstrate the feasibility of the approach, highlighting its potential for personalized and remotely supervised rehabilitation.
Digital-physical testbed for ship autonomy studies in the Marine Cybernetics Laboratory basin
The algorithms developed for Maritime Autonomous Surface Ships (MASS) are often challenging to test on actual vessels due to high operational costs and safety considerations. Simulations offer a cost-effective alternative and eliminate risks, but they may not accurately represent real-world dynamics for the given tasks. Utilizing small-scale model ships and robotic vessels in conjunction with a laboratory basin provides an accessible testing environment for the early stages of validation processes. However, designing and developing a model vessel for a single test can be costly and cumbersome, and researchers often lack access to such infrastructure. To address these challenges and enable streamlined testing, we have developed an in-house testbed that facilitates the development, testing, verification, and validation of MASS algorithms in a digital-physical laboratory. This infrastructure includes a set of small-scale model vessels, a simulation environment for each vessel, a comprehensive testbed environment, and a digital twin in Unity. With this, we aim to establish a full design and verification pipeline that starts with high-fidelity simulation models of each model vessel, to the model-scale testing in the laboratory basin, allowing possibilities for moving towards semi-fullscale validation with R/V milliAmpere1 and full-scale validation with R/V Gunnerus. In this work, we present our progress on the development of this testbed environment and its components, demonstrating its effectiveness in enabling ship guidance, navigation, and control (GNC), including autonomy.
Neural Brain: A Neuroscience-inspired Framework for Embodied Agents
The rapid evolution of artificial intelligence (AI) has shifted from static, data-driven models to dynamic systems capable of perceiving and interacting with real-world environments. Despite advancements in pattern recognition and symbolic reasoning, current AI systems, such as large language models, remain disembodied, unable to physically engage with the world. This limitation has driven the rise of embodied AI, where autonomous agents, such as humanoid robots, must navigate and manipulate unstructured environments with human-like adaptability. At the core of this challenge lies the concept of Neural Brain, a central intelligence system designed to drive embodied agents with human-like adaptability. A Neural Brain must seamlessly integrate multimodal sensing and perception with cognitive capabilities. Achieving this also requires an adaptive memory system and energy-efficient hardware-software co-design, enabling real-time action in dynamic environments. This paper introduces a unified framework for the Neural Brain of embodied agents, addressing two fundamental challenges: (1) defining the core components of Neural Brain and (2) bridging the gap between static AI models and the dynamic adaptability required for real-world deployment. To this end, we propose a biologically inspired architecture that integrates multimodal active sensing, perception-cognition-action function, neuroplasticity-based memory storage and updating, and neuromorphic hardware/software optimization. Furthermore, we also review the latest research on embodied agents across these four aspects and analyze the gap between current AI systems and human intelligence. By synthesizing insights from neuroscience, we outline a roadmap towards the development of generalizable, autonomous agents capable of human-level intelligence in real-world scenarios.
comment: 51 pages, 17 figures, 9 tables
Learning to Play Piano in the Real World
Towards the grand challenge of achieving human-level manipulation in robots, playing piano is a compelling testbed that requires strategic, precise, and flowing movements. Over the years, several works demonstrated hand-designed controllers on real world piano playing, while other works evaluated robot learning approaches on simulated piano scenarios. In this paper, we develop the first piano playing robotic system that makes use of learning approaches while also being deployed on a real world dexterous robot. Specifically, we make use of Sim2Real to train a policy in simulation using reinforcement learning before deploying the learned policy on a real world dexterous robot. In our experiments, we thoroughly evaluate the interplay between domain randomization and the accuracy of the dynamics model used in simulation. Moreover, we evaluate the robot's performance across multiple songs with varying complexity to study the generalization of our learned policy. By providing a proof-of-concept of learning to play piano in the real world, we want to encourage the community to adopt piano playing as a compelling benchmark towards human-level manipulation. We open-source our code and show additional videos at https://lasr.org/research/learning-to-play-piano .
A Hierarchical Control Architecture for Space Robots in On-Orbit Servicing Operations
In-Orbit Servicing and Active Debris Removal require advanced robotic capabilities for capturing and detumbling uncooperative targets. This work presents a hierarchical control framework for autonomous robotic capture of tumbling objects in space. A simulation environment is developed, incorporating sloshing dynamics of the chaser, a rarely studied effect in space robotics. The proposed controller combines an inner Lyapunov-based robust control loop for multi-body dynamics with an outer loop addressing an extended inverse kinematics problem. Simulation results show improved robustness and adaptability compared to existing control schemes.
NDOB-Based Control of a UAV with Delta-Arm Considering Manipulator Dynamics
Aerial Manipulators (AMs) provide a versatile platform for various applications, including 3D printing, architecture, and aerial grasping missions. However, their operational speed is often sacrificed to uphold precision. Existing control strategies for AMs often regard the manipulator as a disturbance and employ robust control methods to mitigate its influence. This research focuses on elevating the precision of the end-effector and enhancing the agility of aerial manipulator movements. We present a composite control scheme to address these challenges. Initially, a Nonlinear Disturbance Observer (NDOB) is utilized to compensate for internal coupling effects and external disturbances. Subsequently, manipulator dynamics are processed through a high pass filter to facilitate agile movements. By integrating the proposed control method into a fully autonomous delta-arm-based AM system, we substantiate the controller's efficacy through extensive real-world experiments. The outcomes illustrate that the end-effector can achieve accuracy at the millimeter level.
comment: 7 pages,6 figures
BiDexHand: Design and Evaluation of an Open-Source 16-DoF Biomimetic Dexterous Hand ICRA 2025
Achieving human-level dexterity in robotic hands remains a fundamental challenge for enabling versatile manipulation across diverse applications. This extended abstract presents BiDexHand, a cable-driven biomimetic robotic hand that combines human-like dexterity with accessible and efficient mechanical design. The robotic hand features 16 independently actuated degrees of freedom and 5 mechanically coupled joints through novel phalange designs that replicate natural finger motion. Performance validation demonstrated success across all 33 grasp types in the GRASP Taxonomy, 9 of 11 positions in the Kapandji thumb opposition test, a measured fingertip force of 2.14\,N, and the capability to lift a 10\,lb weight. As an open-source platform supporting multiple control modes including vision-based teleoperation, BiDexHand aims to democratize access to advanced manipulation capabilities for the broader robotics research community.
comment: ICRA 2025 Dexterity Workshop, Spotlight Presentation
Humanoid Agent via Embodied Chain-of-Action Reasoning with Multimodal Foundation Models for Zero-Shot Loco-Manipulation
Humanoid loco-manipulation, which integrates whole-body locomotion with dexterous manipulation, remains a fundamental challenge in robotics. Beyond whole-body coordination and balance, a central difficulty lies in understanding human instructions and translating them into coherent sequences of embodied actions. Recent advances in foundation models provide transferable multimodal representations and reasoning capabilities, yet existing efforts remain largely restricted to either locomotion or manipulation in isolation, with limited applicability to humanoid settings. In this paper, we propose Humanoid-COA, the first humanoid agent framework that integrates foundation model reasoning with an Embodied Chain-of-Action (CoA) mechanism for zero-shot loco-manipulation. Within the perception--reasoning--action paradigm, our key contribution lies in the reasoning stage, where the proposed CoA mechanism decomposes high-level human instructions into structured sequences of locomotion and manipulation primitives through affordance analysis, spatial inference, and whole-body action reasoning. Extensive experiments on two humanoid robots, Unitree H1-2 and G1, in both an open test area and an apartment environment, demonstrate that our framework substantially outperforms prior baselines across manipulation, locomotion, and loco-manipulation tasks, achieving robust generalization to long-horizon and unstructured scenarios. Project page: https://humanoid-coa.github.io/
comment: website link: https://humanoid-coa.github.io/
ToddlerBot: Open-Source ML-Compatible Humanoid Platform for Loco-Manipulation
Learning-based robotics research driven by data demands a new approach to robot hardware design-one that serves as both a platform for policy execution and a tool for embodied data collection to train policies. We introduce ToddlerBot, a low-cost, open-source humanoid robot platform designed for scalable policy learning and research in robotics and AI. ToddlerBot enables seamless acquisition of high-quality simulation and real-world data. The plug-and-play zero-point calibration and transferable motor system identification ensure a high-fidelity digital twin, enabling zero-shot policy transfer from simulation to the real world. A user-friendly teleoperation interface facilitates streamlined real-world data collection for learning motor skills from human demonstrations. Utilizing its data collection ability and anthropomorphic design, ToddlerBot is an ideal platform to perform whole-body loco-manipulation. Additionally, ToddlerBot's compact size (0.56m, 3.4kg) ensures safe operation in real-world environments. Reproducibility is achieved with an entirely 3D-printed, open-source design and commercially available components, keeping the total cost under 6,000 USD. Comprehensive documentation allows assembly and maintenance with basic technical expertise, as validated by a successful independent replication of the system. We demonstrate ToddlerBot's capabilities through arm span, payload, endurance tests, loco-manipulation tasks, and a collaborative long-horizon scenario where two robots tidy a toy session together. By advancing ML-compatibility, capability, and reproducibility, ToddlerBot provides a robust platform for scalable learning and dynamic policy execution in robotics research.
comment: Project website: https://toddlerbot.github.io/
SafeFlowMatcher: Safe and Fast Planning using Flow Matching with Control Barrier Functions
Generative planners based on flow matching (FM) can produce high-quality paths in one or a few ODE steps, but their sampling dynamics offer no formal safety guarantees and can yield incomplete paths near constraints. We present SafeFlowMatcher, a planning framework that couples FM with control barrier functions (CBFs) to achieve both real-time efficiency and certified safety. SafeFlowMatcher uses a two-phase prediction-correction (PC) integrator: (i) a prediction phase integrates the learned FM once (or a few steps) to obtain a candidate path without intervention; (ii) a correction phase refines this path with a vanishing time-scaled vector field and a CBF-based quadratic program that minimally perturbs the vector field. We prove a barrier certificate for the resulting flow system, establishing forward invariance of a robust safe set and finite-time convergence to the safe set. By enforcing safety only on the executed path (rather than on all intermediate latent paths), SafeFlowMatcher avoids distributional drift and mitigates local trap problems. Across maze navigation and locomotion benchmarks, SafeFlowMatcher attains faster, smoother, and safer paths than diffusion- and FM-based baselines. Extensive ablations corroborate the contributions of the PC integrator and the barrier certificate.
comment: 10 pages, 7 figures, 4 tables
In-Hand Manipulation of Articulated Tools with Dexterous Robot Hands with Sim-to-Real Transfer
Reinforcement learning (RL) and sim-to-real transfer have advanced robotic manipulation of rigid objects. Yet, policies remain brittle when applied to articulated mechanisms due to contact-rich dynamics and under-modeled joint phenomena such as friction, stiction, backlash, and clearances. We address this challenge through dexterous in-hand manipulation of articulated tools using a robotic hand with reduced articulation and kinematic redundancy relative to the human hand. Our controller augments a simulation-trained base policy with a sensor-driven refinement learned from hardware demonstrations, conditioning on proprioception and target articulation states while fusing whole-hand tactile and force feedback with the policy's internal action intent via cross-attention-based integration. This design enables online adaptation to instance-specific articulation properties, stabilizes contact interactions, regulates internal forces, and coordinates coupled-link motion under perturbations. We validate our approach across a diversity of real-world examples, including scissors, pliers, minimally invasive surgical tools, and staplers. We achieve robust transfer from simulation to hardware, improved disturbance resilience, and generalization to previously unseen articulated tools, thereby reducing reliance on precise physical modeling in contact-rich settings.
PolySim: Bridging the Sim-to-Real Gap for Humanoid Control via Multi-Simulator Dynamics Randomization
Humanoid whole-body control (WBC) policies trained in simulation often suffer from the sim-to-real gap, which fundamentally arises from simulator inductive bias, the inherent assumptions and limitations of any single simulator. These biases lead to nontrivial discrepancies both across simulators and between simulation and the real world. To mitigate the effect of simulator inductive bias, the key idea is to train policies jointly across multiple simulators, encouraging the learned controller to capture dynamics that generalize beyond any single simulator's assumptions. We thus introduce PolySim, a WBC training platform that integrates multiple heterogeneous simulators. PolySim can launch parallel environments from different engines simultaneously within a single training run, thereby realizing dynamics-level domain randomization. Theoretically, we show that PolySim yields a tighter upper bound on simulator inductive bias than single-simulator training. In experiments, PolySim substantially reduces motion-tracking error in sim-to-sim evaluations; for example, on MuJoCo, it improves execution success by 52.8 over an IsaacSim baseline. PolySim further enables zero-shot deployment on a real Unitree G1 without additional fine-tuning, showing effective transfer from simulation to the real world. We will release the PolySim code upon acceptance of this work.
comment: 8 pages, 5 figures
Code Generation and Conic Constraints for Model-Predictive Control on Microcontrollers with Conic-TinyMPC
Model-predictive control (MPC) is a powerful framework for controlling dynamic systems under constraints, but it remains challenging to deploy on resource-constrained platforms, especially for problems involving conic constraints. To address this, we extend recent work developing fast, structure-exploiting, cached ADMM solvers for embedded applications, to provide support for second-order cones, as well as C++ code generation from Python, MATLAB, and Julia for easy deployment. Microcontroller benchmarks show that our solver provides up to a two-order-of-magnitude speedup, ranging from 10.6x to 142.7x, over state-of-the-art embedded solvers on QP and SOCP problems, and enables us to fit order-of-magnitude larger problems in memory. We validate our solver's deployed performance through simulation and hardware experiments, including conically-constrained trajectory tracking on a 27g Crazyflie quadrotor. To get started with Conic-TinyMPC, visit our documentation, examples, and the open-source codebase at https://tinympc.org.
comment: First three authors contributed equally
Learning Closed-Loop Parametric Nash Equilibria of Multi-Agent Collaborative Field Coverage
Multi-agent reinforcement learning is a challenging and active field of research due to the inherent nonstationary property and coupling between agents. A popular approach to modeling the multi-agent interactions underlying the multi-agent RL problem is the Markov Game. There is a special type of Markov Game, termed Markov Potential Game, which allows us to reduce the Markov Game to a single-objective optimal control problem where the objective function is a potential function. In this work, we prove that a multi-agent collaborative field coverage problem, which is found in many engineering applications, can be formulated as a Markov Potential Game, and we can learn a parameterized closed-loop Nash Equilibrium by solving an equivalent single-objective optimal control problem. As a result, our algorithm is 10x faster during training compared to a game-theoretic baseline and converges faster during policy execution.
comment: Updated license
A Knowledge-Driven Diffusion Policy for End-to-End Autonomous Driving Based on Expert Routing
End-to-end autonomous driving remains constrained by the difficulty of producing adaptive, robust, and interpretable decision-making across diverse scenarios. Existing methods often collapse diverse driving behaviors, lack long-horizon consistency, or require task-specific engineering that limits generalization. This paper presents KDP, a knowledge-driven diffusion policy that integrates generative diffusion modeling with a sparse mixture-of-experts routing mechanism. The diffusion component generates temporally coherent action sequences, while the expert routing mechanism activates specialized and reusable experts according to context, enabling modular knowledge composition. Extensive experiments across representative driving scenarios demonstrate that KDP achieves consistently higher success rates, reduced collision risk, and smoother control compared to prevailing paradigms. Ablation studies highlight the effectiveness of sparse expert activation and the Transformer backbone, and activation analyses reveal structured specialization and cross-scenario reuse of experts. These results establish diffusion with expert routing as a scalable and interpretable paradigm for knowledge-driven end-to-end autonomous driving.
comment: https://perfectxu88.github.io/KDP-AD/
Neural-Rendezvous: Provably Robust Guidance and Control to Encounter Interstellar Objects
Interstellar objects (ISOs) are likely representatives of primitive materials invaluable in understanding exoplanetary star systems. Due to their poorly constrained orbits with generally high inclinations and relative velocities, however, exploring ISOs with conventional human-in-the-loop approaches is significantly challenging. This paper presents Neural-Rendezvous -- a deep learning-based guidance and control framework for encountering fast-moving objects, including ISOs, robustly, accurately, and autonomously in real time. It uses pointwise minimum norm tracking control on top of a guidance policy modeled by a spectrally-normalized deep neural network, where its hyperparameters are tuned with a loss function directly penalizing the MPC state trajectory tracking error. We show that Neural-Rendezvous provides a high probability exponential bound on the expected spacecraft delivery error, the proof of which leverages stochastic incremental stability analysis. In particular, it is used to construct a non-negative function with a supermartingale property, explicitly accounting for the ISO state uncertainty and the local nature of nonlinear state estimation guarantees. In numerical simulations, Neural-Rendezvous is demonstrated to satisfy the expected error bound for 100 ISO candidates. This performance is also empirically validated using our spacecraft simulator and in high-conflict and distributed UAV swarm reconfiguration with up to 20 UAVs.
comment: Preprint Version, Accepted: October, 2024 (One-minute YouTube summary: https://youtu.be/q3e0LYS2IYQ, DOI: https://doi.org/10.2514/1.G007671)
Contraction Theory for Nonlinear Stability Analysis and Learning-based Control: A Tutorial Overview
Contraction theory is an analytical tool to study differential dynamics of a non-autonomous (i.e., time-varying) nonlinear system under a contraction metric defined with a uniformly positive definite matrix, the existence of which results in a necessary and sufficient characterization of incremental exponential stability of multiple solution trajectories with respect to each other. By using a squared differential length as a Lyapunov-like function, its nonlinear stability analysis boils down to finding a suitable contraction metric that satisfies a stability condition expressed as a linear matrix inequality, indicating that many parallels can be drawn between well-known linear systems theory and contraction theory for nonlinear systems. Furthermore, contraction theory takes advantage of a superior robustness property of exponential stability used in conjunction with the comparison lemma. This yields much-needed safety and stability guarantees for neural network-based control and estimation schemes, without resorting to a more involved method of using uniform asymptotic stability for input-to-state stability. Such distinctive features permit the systematic construction of a contraction metric via convex optimization, thereby obtaining an explicit exponential bound on the distance between a time-varying target trajectory and solution trajectories perturbed externally due to disturbances and learning errors. The objective of this paper is, therefore, to present a tutorial overview of contraction theory and its advantages in nonlinear stability analysis of deterministic and stochastic systems, with an emphasis on deriving formal robustness and stability guarantees for various learning-based and data-driven automatic control methods. In particular, we provide a detailed review of techniques for finding contraction metrics and associated control and estimation laws using deep neural networks.
comment: Annual Reviews in Control, Preprint Version, Accepted, Oct. 1st
Distilling On-device Language Models for Robot Planning with Minimal Human Intervention
Large language models (LLMs) provide robots with powerful contextual reasoning abilities and a natural human interface. Yet, current LLM-enabled robots typically depend on cloud-hosted models, limiting their usability in environments with unreliable communication infrastructure, such as outdoor or industrial settings. We present PRISM, a framework for distilling small language model (SLM)-enabled robot planners that run on-device with minimal human supervision. Starting from an existing LLM-enabled planner, PRISM automatically synthesizes diverse tasks and environments, elicits plans from the LLM, and uses this synthetic dataset to distill a compact SLM as a drop-in replacement of the source model. We apply PRISM to three LLM-enabled planners for mapping and exploration, manipulation, and household assistance, and we demonstrate that PRISM improves the performance of Llama-3.2-3B from 10-20% of GPT-4o's performance to over 93% - using only synthetic data. We further demonstrate that the distilled planners generalize across heterogeneous robotic platforms (ground and aerial) and diverse environments (indoor and outdoor). We release all software, trained models, and datasets at https://zacravichandran.github.io/PRISM.
comment: Accepted to the Conference on Robot Learning (CoRL) 2025
Decremental Dynamics Planning for Robot Navigation IROS 2025
Most, if not all, robot navigation systems employ a decomposed planning framework that includes global and local planning. To trade-off onboard computation and plan quality, current systems have to limit all robot dynamics considerations only within the local planner, while leveraging an extremely simplified robot representation (e.g., a point-mass holonomic model without dynamics) in the global level. However, such an artificial decomposition based on either full or zero consideration of robot dynamics can lead to gaps between the two levels, e.g., a global path based on a holonomic point-mass model may not be realizable by a non-holonomic robot, especially in highly constrained obstacle environments. Motivated by such a limitation, we propose a novel paradigm, Decremental Dynamics Planning that integrates dynamic constraints into the entire planning process, with a focus on high-fidelity dynamics modeling at the beginning and a gradual fidelity reduction as the planning progresses. To validate the effectiveness of this paradigm, we augment three different planners with DDP and show overall improved planning performance. We also develop a new DDP-based navigation system, which achieves first place in the simulation phase of the 2025 BARN Challenge. Both simulated and physical experiments validate DDP's hypothesized benefits.
comment: 7 pages. Accepted by IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2025)
Ego-to-Exo: Interfacing Third Person Visuals from Egocentric Views in Real-time for Improved ROV Teleoperation
Underwater ROVs (Remotely Operated Vehicles) are unmanned submersible vehicles designed for exploring and operating in the depths of the ocean. Despite using high-end cameras, typical teleoperation engines based on first-person (egocentric) views limit a surface operator's ability to maneuver the ROV in complex deep-water missions. In this paper, we present an interactive teleoperation interface that enhances the operational capabilities via increased situational awareness. This is accomplished by (i) offering on-demand "third"-person (exocentric) visuals from past egocentric views, and (ii) facilitating enhanced peripheral information with augmented ROV pose information in real-time. We achieve this by integrating a 3D geometry-based Ego-to-Exo view synthesis algorithm into a monocular SLAM system for accurate trajectory estimation. The proposed closed-form solution only uses past egocentric views from the ROV and a SLAM backbone for pose estimation, which makes it portable to existing ROV platforms. Unlike data-driven solutions, it is invariant to applications and waterbody-specific scenes. We validate the geometric accuracy of the proposed framework through extensive experiments of 2-DOF indoor navigation and 6-DOF underwater cave exploration in challenging low-light conditions. A subjective evaluation on 15 human teleoperators further confirms the effectiveness of the integrated features for improved teleoperation. We demonstrate the benefits of dynamic Ego-to-Exo view generation and real-time pose rendering for remote ROV teleoperation by following navigation guides such as cavelines inside underwater caves. This new way of interactive ROV teleoperation opens up promising opportunities for future research in subsea telerobotics.
comment: EgoExo++ (Journal extension), V4, 12 pages
CoTaP: Compliant Task Pipeline and Reinforcement Learning of Its Controller with Compliance Modulation
Humanoid whole-body locomotion control is a critical approach for humanoid robots to leverage their inherent advantages. Learning-based control methods derived from retargeted human motion data provide an effective means of addressing this issue. However, because most current human datasets lack measured force data, and learning-based robot control is largely position-based, achieving appropriate compliance during interaction with real environments remains challenging. This paper presents Compliant Task Pipeline (CoTaP): a pipeline that leverages compliance information in the learning-based structure of humanoid robots. A two-stage dual-agent reinforcement learning framework combined with model-based compliance control for humanoid robots is proposed. In the training process, first a base policy with a position-based controller is trained; then in the distillation, the upper-body policy is combined with model-based compliance control, and the lower-body agent is guided by the base policy. In the upper-body control, adjustable task-space compliance can be specified and integrated with other controllers through compliance modulation on the symmetric positive definite (SPD) manifold, ensuring system stability. We validated the feasibility of the proposed strategy in simulation, primarily comparing the responses to external disturbances under different compliance settings.
comment: Submitted to IEEE for possible publication, under review
IMPACT: Intelligent Motion Planning with Acceptable Contact Trajectories via Vision-Language Models
Motion planning involves determining a sequence of robot configurations to reach a desired pose, subject to movement and safety constraints. Traditional motion planning finds collision-free paths, but this is overly restrictive in clutter, where it may not be possible for a robot to accomplish a task without contact. In addition, contacts range from relatively benign (e.g. brushing a soft pillow) to more dangerous (e.g. toppling a glass vase), making it difficult to characterize which may be acceptable. In this paper, we propose IMPACT, a novel motion planning framework that uses Vision-Language Models (VLMs) to infer environment semantics, identifying which parts of the environment can best tolerate contact based on object properties and locations. Our approach generates an anisotropic cost map that encodes directional push safety. We pair this map with a contact-aware A* planner to find stable contact-rich paths. We perform experiments using 20 simulation and 10 real-world scenes and assess using task success rate, object displacements, and feedback from human evaluators. Our results over 3200 simulation and 200 real-world trials suggest that IMPACT enables efficient contact-rich motion planning in cluttered settings while outperforming alternative methods and ablations. Our project website is available at https://impact-planning.github.io/.
TreeIRL: Safe Urban Driving with Tree Search and Inverse Reinforcement Learning
We present TreeIRL, a novel planner for autonomous driving that combines Monte Carlo tree search (MCTS) and inverse reinforcement learning (IRL) to achieve state-of-the-art performance in simulation and in real-world driving. The core idea is to use MCTS to find a promising set of safe candidate trajectories and a deep IRL scoring function to select the most human-like among them. We evaluate TreeIRL against both classical and state-of-the-art planners in large-scale simulations and on 500+ miles of real-world autonomous driving in the Las Vegas metropolitan area. Test scenarios include dense urban traffic, adaptive cruise control, cut-ins, and traffic lights. TreeIRL achieves the best overall performance, striking a balance between safety, progress, comfort, and human-likeness. To our knowledge, our work is the first demonstration of MCTS-based planning on public roads and underscores the importance of evaluating planners across a diverse set of metrics and in real-world environments. TreeIRL is highly extensible and could be further improved with reinforcement learning and imitation learning, providing a framework for exploring different combinations of classical and learning-based approaches to solve the planning bottleneck in autonomous driving.
Multiagent Systems
Paper2Video: Automatic Video Generation from Scientific Papers
Academic presentation videos have become an essential medium for research communication, yet producing them remains highly labor-intensive, often requiring hours of slide design, recording, and editing for a short 2 to 10 minutes video. Unlike natural video, presentation video generation involves distinctive challenges: inputs from research papers, dense multi-modal information (text, figures, tables), and the need to coordinate multiple aligned channels such as slides, subtitles, speech, and human talker. To address these challenges, we introduce PaperTalker, the first benchmark of 101 research papers paired with author-created presentation videos, slides, and speaker metadata. We further design four tailored evaluation metrics--Meta Similarity, PresentArena, PresentQuiz, and IP Memory--to measure how videos convey the paper's information to the audience. Building on this foundation, we propose PaperTalker, the first multi-agent framework for academic presentation video generation. It integrates slide generation with effective layout refinement by a novel effective tree search visual choice, cursor grounding, subtitling, speech synthesis, and talking-head rendering, while parallelizing slide-wise generation for efficiency. Experiments on Paper2Video demonstrate that the presentation videos produced by our approach are more faithful and informative than existing baselines, establishing a practical step toward automated and ready-to-use academic video generation. Our dataset, agent, and code are available at https://github.com/showlab/Paper2Video.
comment: 20 pages, 8 figures
A Fixed Point Framework for the Existence of EFX Allocations
We consider the problem of the existence of an envy-free allocation up to any good (EFX) for linear valuations and establish new results by connecting this problem to a fixed point framework. Specifically, we first use randomized rounding to extend the discrete EFX constraints into a continuous space and show that an EFX allocation exists if and only if the optimal value of the continuously extended objective function is nonpositive. In particular, we demonstrate that this optimization problem can be formulated as an unconstrained difference of convex (DC) program, which can be further simplified to the minimization of a piecewise linear concave function over a polytope. Leveraging this connection, we show that the proposed DC program has a nonpositive optimal objective value if and only if a well-defined continuous vector map admits a fixed point. Crucially, we prove that the reformulated fixed point problem satisfies all the conditions of Brouwer's fixed point theorem, except that self-containedness is violated by an arbitrarily small positive constant. To address this, we propose a slightly perturbed continuous map that always admits a fixed point. This fixed point serves as a proxy for the fixed point (if it exists) of the original map, and hence for an EFX allocation through an appropriate transformation. Our results offer a new approach to establishing the existence of EFX allocations through fixed point theorems. Moreover, the equivalence with DC programming enables a more efficient and systematic method for computing such allocations (if one exists) using tools from nonlinear optimization. Our findings bridge the discrete problem of finding an EFX allocation with two continuous frameworks: solving an unconstrained DC program and identifying a fixed point of a continuous vector map.
Where Did It All Go Wrong? A Hierarchical Look into Multi-Agent Error Attribution
Error attribution in Large Language Model (LLM) multi-agent systems presents a significant challenge in debugging and improving collaborative AI systems. Current approaches to pinpointing agent and step level failures in interaction traces - whether using all-at-once evaluation, step-by-step analysis, or binary search - fall short when analyzing complex patterns, struggling with both accuracy and consistency. We present ECHO (Error attribution through Contextual Hierarchy and Objective consensus analysis), a novel algorithm that combines hierarchical context representation, objective analysis-based evaluation, and consensus voting to improve error attribution accuracy. Our approach leverages a positional-based leveling of contextual understanding while maintaining objective evaluation criteria, ultimately reaching conclusions through a consensus mechanism. Experimental results demonstrate that ECHO outperforms existing methods across various multi-agent interaction scenarios, showing particular strength in cases involving subtle reasoning errors and complex interdependencies. Our findings suggest that leveraging these concepts of structured, hierarchical context representation combined with consensus-based objective decision-making, provides a more robust framework for error attribution in multi-agent systems.
Video Game Level Design as a Multi-Agent Reinforcement Learning Problem AAAI
Procedural Content Generation via Reinforcement Learning (PCGRL) offers a method for training controllable level designer agents without the need for human datasets, using metrics that serve as proxies for level quality as rewards. Existing PCGRL research focuses on single generator agents, but are bottlenecked by the need to frequently recalculate heuristics of level quality and the agent's need to navigate around potentially large maps. By framing level generation as a multi-agent problem, we mitigate the efficiency bottleneck of single-agent PCGRL by reducing the number of reward calculations relative to the number of agent actions. We also find that multi-agent level generators are better able to generalize to out-of-distribution map shapes, which we argue is due to the generators' learning more local, modular design policies. We conclude that treating content generation as a distributed, multi-agent task is beneficial for generating functional artifacts at scale.
comment: 11 pages, 7 tables, 5 figures, published as full technical paper at the AAAI conference on Artificial Intelligence and Interactive Digital Entertainment 2025
LEGOMem: Modular Procedural Memory for Multi-agent LLM Systems for Workflow Automation
We introduce LEGOMem, a modular procedural memory framework for multi-agent large language model (LLM) systems in workflow automation. LEGOMem decomposes past task trajectories into reusable memory units and flexibly allocates them across orchestrators and task agents to support planning and execution. To explore the design space of memory in multi-agent systems, we use LEGOMem as a lens and conduct a systematic study of procedural memory in multi-agent systems, examining where memory should be placed, how it should be retrieved, and which agents benefit most. Experiments on the OfficeBench benchmark show that orchestrator memory is critical for effective task decomposition and delegation, while fine-grained agent memory improves execution accuracy. We find that even teams composed of smaller language models can benefit substantially from procedural memory, narrowing the performance gap with stronger agents by leveraging prior execution traces for more accurate planning and tool use. These results position LEGOMem as both a practical framework for memory-augmented agent systems and a research tool for understanding memory design in multi-agent workflow automation.
Trade in Minutes! Rationality-Driven Agentic System for Quantitative Financial Trading
Recent advancements in large language models (LLMs) and agentic systems have shown exceptional decision-making capabilities, revealing significant potential for autonomic finance. Current financial trading agents predominantly simulate anthropomorphic roles that inadvertently introduce emotional biases and rely on peripheral information, while being constrained by the necessity for continuous inference during deployment. In this paper, we pioneer the harmonization of strategic depth in agents with the mechanical rationality essential for quantitative trading. Consequently, we present TiMi (Trade in Minutes), a rationality-driven multi-agent system that architecturally decouples strategy development from minute-level deployment. TiMi leverages specialized LLM capabilities of semantic analysis, code programming, and mathematical reasoning within a comprehensive policy-optimization-deployment chain. Specifically, we propose a two-tier analytical paradigm from macro patterns to micro customization, layered programming design for trading bot implementation, and closed-loop optimization driven by mathematical reflection. Extensive evaluations across 200+ trading pairs in stock and cryptocurrency markets empirically validate the efficacy of TiMi in stable profitability, action efficiency, and risk control under volatile market dynamics.
comment: 16 pages, 6 figures
Online automatic code generation for robot swarms: LLMs and self-organizing hierarchy
Our recently introduced self-organizing nervous system (SoNS) provides robot swarms with 1) ease of behavior design and 2) global estimation of the swarm configuration and its collective environment, facilitating the implementation of online automatic code generation for robot swarms. In a demonstration with 6 real robots and simulation trials with >30 robots, we show that when a SoNS-enhanced robot swarm gets stuck, it can automatically solicit and run code generated by an external LLM on the fly, completing its mission with an 85% success rate.
Fairness in Repeated Matching: A Maximin Perspective
We study a sequential decision-making model where a set of items is repeatedly matched to the same set of agents over multiple rounds. The objective is to determine a sequence of matchings that either maximizes the utility of the least advantaged agent at the end of all rounds (optimal) or at the end of every individual round (anytime optimal). We investigate the computational challenges associated with finding (anytime) optimal outcomes and demonstrate that these problems are generally computationally intractable. However, we provide approximation algorithms, fixed-parameter tractable algorithms, and identify several special cases whereby the problem(s) can be solved efficiently. Along the way, we also establish characterizations of Pareto-optimal/maximum matchings, which may be of independent interest to works in matching theory and house allocation.
Multi-Turn Human-LLM Interaction Through the Lens of a Two-Way Intelligibility Protocol NeurIPS 2025
Our interest is in the design of software systems involving a human-expert interacting -- using natural language -- with a large language model (LLM) on data analysis tasks. For complex problems, it is possible that LLMs can harness human expertise and creativity to find solutions that were otherwise elusive. On one level, this interaction takes place through multiple turns of prompts from the human and responses from the LLM. Here we investigate a more structured approach based on an abstract protocol described in [3] for interaction between agents. The protocol is motivated by a notion of "two-way intelligibility" and is modelled by a pair of communicating finite-state machines. We provide an implementation of the protocol, and provide empirical evidence of using the implementation to mediate interactions between an LLM and a human-agent in two areas of scientific interest (radiology and drug design). We conduct controlled experiments with a human proxy (a database), and uncontrolled experiments with human subjects. The results provide evidence in support of the protocol's capability of capturing one- and two-way intelligibility in human-LLM interaction; and for the utility of two-way intelligibility in the design of human-machine systems. Our code is available at https://github.com/karannb/interact.
comment: Multi-Turn Interactions in Large Language Models (MTI-LLM) Workshop at NeurIPS 2025
Pragmatic Embodied Spoken Instruction Following in Human-Robot Collaboration with Theory of Mind
Spoken language instructions are ubiquitous in agent collaboration. However, in real-world human-robot collaboration, following human spoken instructions can be challenging due to various speaker and environmental factors, such as background noise or mispronunciation. When faced with noisy auditory inputs, humans can leverage the collaborative context in the embodied environment to interpret noisy spoken instructions and take pragmatic assistive actions. In this paper, we present a cognitively inspired neurosymbolic model, Spoken Instruction Following through Theory of Mind (SIFToM), which leverages a Vision-Language Model with model-based mental inference to enable robots to pragmatically follow human instructions under diverse speech conditions. We test SIFToM in both simulated environments (VirtualHome) and real-world human-robot collaborative settings with human evaluations. Results show that SIFToM can significantly improve the performance of a lightweight base VLM (Gemini 2.5 Flash), outperforming state-of-the-art VLMs (Gemini 2.5 Pro) and approaching human-level accuracy on challenging spoken instruction following tasks.
comment: 8 pages, 7 figures
Who's the Mole? Modeling and Detecting Intention-Hiding Malicious Agents in LLM-Based Multi-Agent Systems
Multi-agent systems powered by Large Language Models (LLM-MAS) have demonstrated remarkable capabilities in collaborative problem-solving. However, their deployment also introduces new security risks. Existing research on LLM-based agents has primarily examined single-agent scenarios, while the security of multi-agent systems remains largely unexplored. To address this gap, we present a systematic study of intention-hiding threats in LLM-MAS. We design four representative attack paradigms that subtly disrupt task completion while maintaining a high degree of stealth, and evaluate them under centralized, decentralized, and layered communication structures. Experimental results show that these attacks are highly disruptive and can easily evade existing defense mechanisms. To counter these threats, we propose AgentXposed, a psychology-inspired detection framework. AgentXposed draws on the HEXACO personality model, which characterizes agents through psychological trait dimensions, and the Reid interrogation technique, a structured method for eliciting concealed intentions. By combining progressive questionnaire probing with behavior-based inter-agent monitoring, the framework enables the proactive identification of malicious agents before harmful actions are carried out. Extensive experiments across six datasets against both our proposed attacks and two baseline threats demonstrate that AgentXposed effectively detects diverse forms of malicious behavior, achieving strong robustness across multiple communication settings.
Chasing Moving Targets with Online Self-Play Reinforcement Learning for Safer Language Models
Conventional language model (LM) safety alignment relies on a reactive, disjoint procedure: attackers exploit a static model, followed by defensive fine-tuning to patch exposed vulnerabilities. This sequential approach creates a mismatch -- attackers overfit to obsolete defenses, while defenders perpetually lag behind emerging threats. To address this, we propose Self-RedTeam, an online self-play reinforcement learning algorithm where an attacker and defender agent co-evolve through continuous interaction. We cast safety alignment as a two-player zero-sum game, where a single model alternates between attacker and defender roles -- generating adversarial prompts and safeguarding against them -- while a reward LM adjudicates outcomes. This enables dynamic co-adaptation. Grounded in the game-theoretic framework of zero-sum games, we establish a theoretical safety guarantee which motivates the design of our method: if self-play converges to a Nash Equilibrium, the defender will reliably produce safe responses to any adversarial input. Empirically, Self-RedTeam uncovers more diverse attacks (+21.8% SBERT) compared to attackers trained against static defenders and achieves higher robustness on safety benchmarks (e.g., +65.5% on WildJailBreak) than defenders trained against static attackers. We further propose hidden Chain-of-Thought, allowing agents to plan privately, which boosts adversarial diversity and reduces over-refusals. Our results motivate a shift from reactive patching to proactive co-evolution in LM safety training, enabling scalable, autonomous, and robust self-improvement of LMs via multi-agent reinforcement learning (MARL).
The Hive Mind is a Single Reinforcement Learning Agent
Decision-making is an essential attribute of any intelligent agent or group. Natural systems are known to converge to optimal strategies through at least two distinct mechanisms: collective decision-making via imitation of others, and individual trial-and-error. This paper establishes an equivalence between these two paradigms by drawing from the well-established collective decision-making model of nest-hunting in swarms of honey bees. We show that the emergent distributed cognition (sometimes referred to as the $\textit{hive mind}$) arising from individual bees following simple, local imitation-based rules is that of a single online reinforcement learning (RL) agent interacting with many parallel environments. The update rule through which this macro-agent learns is a bandit algorithm that we coin $\textit{Maynard-Cross Learning}$. Our analysis implies that a group of cognition-limited organisms can be equivalent to a more complex, reinforcement-enabled entity, substantiating the idea that group-level intelligence may explain how seemingly simple and blind individual behaviors are selected in nature. From a biological perspective, this analysis suggests how such imitation strategies evolved: they constitute a scalable form of reinforcement learning at the group level, aligning with theories of kin and group selection. Beyond biology, the framework offers new tools for analyzing economic and social systems where individuals imitate successful strategies, effectively participating in a collective learning process. In swarm intelligence, our findings will inform the design of scalable collective systems in artificial domains, enabling RL-inspired mechanisms for coordination and adaptability at scale.
Learning Closed-Loop Parametric Nash Equilibria of Multi-Agent Collaborative Field Coverage
Multi-agent reinforcement learning is a challenging and active field of research due to the inherent nonstationary property and coupling between agents. A popular approach to modeling the multi-agent interactions underlying the multi-agent RL problem is the Markov Game. There is a special type of Markov Game, termed Markov Potential Game, which allows us to reduce the Markov Game to a single-objective optimal control problem where the objective function is a potential function. In this work, we prove that a multi-agent collaborative field coverage problem, which is found in many engineering applications, can be formulated as a Markov Potential Game, and we can learn a parameterized closed-loop Nash Equilibrium by solving an equivalent single-objective optimal control problem. As a result, our algorithm is 10x faster during training compared to a game-theoretic baseline and converges faster during policy execution.
comment: Updated license
Systems and Control (CS)
PowerPlots: An Open Source Power Grid Visualization and Data Analysis Framework for Academic Research
Data visualization is important for developing an understanding of a complex system. PowerPlots.jl is a data visualization tool for power grids, one of the most complex systems in the world. The design of PowerPlots.jl is intended to facilitate exploration of power grid data while performing research and to facilitate communication of research findings to an audience. Several tools created to support this software also facilitate analysis of power grid data by transforming the data into graph topology or data-frame data formats that are more compatible for some applications. The high level of flexibility in PowerPlots.jl enables researchers who are developing and analyzing methods for solving novel power grid problems to better understand and communicate the complexities of their research.
Multi-Loop Design of Virtual Synchronous Machine Control for DFIG-Based Wind Farms
The displacement of synchronous generators by converter-interfaced renewable energy sources obliges wind farms to provide inertia, damping, and voltage support, above all in increasingly weak grid conditions. This paper presents a co-ordinated frequency-domain methodology for tuning all control layers of doubly-fed induction generators (DFIGs) within a wind farm operated as a Virtual Synchronous Machine (VSM). Starting from a full small-signal linearisation that preserves loop-to-loop and machine-to-machine couplings, the procedure reshapes every local open loop to explicit phase-margin targets through a single, prioritised iteration. The resulting controllers provide a step response and stability margins close to those programmed at the design stage, in spite of the cross coupling between control loops. Since controller synthesis relies exclusively on classical loop-shaping tools available in commercial simulation suites, it is readily applicable to industrial-scale projects.
comment: Submitted for evaluation to Journal of Modern Power Systems and Clean Energy
Optimal participation of energy communities in electricity markets under uncertainty. A multi-stage stochastic programming approach
We propose a multi-stage stochastic programming model for the optimal participation of energy communities in electricity markets. The multi-stage aspect captures the different times at which variable renewable generation and electricity prices are observed. This results in large-scale optimization problem instances containing large scenario trees with 34 stages, to which scenario reduction techniques are applied. Case studies with real data are discussed to analyse proposed regulatory frameworks in Europe. The added value of considering stochasticity is also analysed.
Robust Cislunar Navigation via LFT-Based $\mathcal{H}_\infty$ Filtering with Bearing-Only Measurements
This paper develops a robust estimation framework for cislunar navigation that embeds the Circular Restricted Three-Body Problem (CR3BP) dynamics and bearing-only optical measurements within a Linear Fractional Transformation (LFT) representation. A full-order $\mathcal{H}_\infty$ observer is synthesized with explicit $\mathcal{L}_2$ performance bounds. The formulation yields a nonlinear estimator that operates directly on the governing equations and avoids reliance on local linearizations. Dominant nonlinearities are expressed as structured real uncertainties, while measurement fidelity is represented through range-dependent weighting with Earth-Moon distances reconstructed from line-of-sight geometry. The sensing architecture assumes passive star-tracker-class optical instruments, eliminating the need for time-of-flight ranging or precision clocks. Simulations demonstrate bounded estimation errors and smooth position tracking over multiple orbital periods, with the largest deviations observed in the out-of-plane states, consistent with the stiffness of the vertical dynamics and the limitations of angle-only observability. Application to a Near Rectilinear Halo Orbit (NRHO) illustrates that the framework can achieve robust onboard navigation with bounded estimation errors with flight-representative sensors.
Steady-State Spread Bounds for Graph Diffusion via Laplacian Regularisation
We study how far a diffusion process on a graph can drift from a designed starting pattern when that pattern is produced using Laplacian regularisation. Under standard stability conditions for undirected, entrywise nonnegative graphs, we give a closed-form, instance-specific upper bound on the steady-state spread, measured as the relative change between the final and initial profiles. The bound separates two effects: (i) an irreducible term determined by the graph's maximum node degree, and (ii) a design-controlled term that shrinks as the regularisation strength increases (following an inverse square-root law). This leads to a simple design rule: given any target limit on spread, one can choose a sufficient regularisation strength in closed form. Although one motivating application is array beamforming, where the initial pattern is the squared magnitude of the beamformer weights, the result applies to any scenario that first enforces Laplacian smoothness and then evolves by linear diffusion on a graph. Overall, the guarantee is non-asymptotic, easy to compute, and certifies how much steady-state deviation can occur.
A Fixed Point Framework for the Existence of EFX Allocations
We consider the problem of the existence of an envy-free allocation up to any good (EFX) for linear valuations and establish new results by connecting this problem to a fixed point framework. Specifically, we first use randomized rounding to extend the discrete EFX constraints into a continuous space and show that an EFX allocation exists if and only if the optimal value of the continuously extended objective function is nonpositive. In particular, we demonstrate that this optimization problem can be formulated as an unconstrained difference of convex (DC) program, which can be further simplified to the minimization of a piecewise linear concave function over a polytope. Leveraging this connection, we show that the proposed DC program has a nonpositive optimal objective value if and only if a well-defined continuous vector map admits a fixed point. Crucially, we prove that the reformulated fixed point problem satisfies all the conditions of Brouwer's fixed point theorem, except that self-containedness is violated by an arbitrarily small positive constant. To address this, we propose a slightly perturbed continuous map that always admits a fixed point. This fixed point serves as a proxy for the fixed point (if it exists) of the original map, and hence for an EFX allocation through an appropriate transformation. Our results offer a new approach to establishing the existence of EFX allocations through fixed point theorems. Moreover, the equivalence with DC programming enables a more efficient and systematic method for computing such allocations (if one exists) using tools from nonlinear optimization. Our findings bridge the discrete problem of finding an EFX allocation with two continuous frameworks: solving an unconstrained DC program and identifying a fixed point of a continuous vector map.
Benchmarking M-LTSF: Frequency and Noise-Based Evaluation of Multivariate Long Time Series Forecasting Models
Understanding the robustness of deep learning models for multivariate long-term time series forecasting (M-LTSF) remains challenging, as evaluations typically rely on real-world datasets with unknown noise properties. We propose a simulation-based evaluation framework that generates parameterizable synthetic datasets, where each dataset instance corresponds to a different configuration of signal components, noise types, signal-to-noise ratios, and frequency characteristics. These configurable components aim to model real-world multivariate time series data without the ambiguity of unknown noise. This framework enables fine-grained, systematic evaluation of M-LTSF models under controlled and diverse scenarios. We benchmark four representative architectures S-Mamba (state-space), iTransformer (transformer-based), R-Linear (linear), and Autoformer (decomposition-based). Our analysis reveals that all models degrade severely when lookback windows cannot capture complete periods of seasonal patters in the data. S-Mamba and Autoformer perform best on sawtooth patterns, while R-Linear and iTransformer favor sinusoidal signals. White and Brownian noise universally degrade performance with lower signal-to-noise ratio while S-Mamba shows specific trend-noise and iTransformer shows seasonal-noise vulnerability. Further spectral analysis shows that S-Mamba and iTransformer achieve superior frequency reconstruction. This controlled approach, based on our synthetic and principle-driven testbed, offers deeper insights into model-specific strengths and limitations through the aggregation of MSE scores and provides concrete guidance for model selection based on signal characteristics and noise conditions.
comment: Number of pages: 13 Number of figures: 16 Number of Tables: 1 Submitted to: IEEE Transactions on Signal Processing
Rapid stabilization for a wave equation with boundary disturbance
In this paper, we study the rapid stabilization of an unstable wave equation, in which an unknown disturbance is located at the boundary condition. We address two different boundary conditions: Dirichlet- Dirichlet and Dirichlet-Neumann. In both cases, we design a feedback law, located at the same place as the unknown disturbance, that forces the exponential decay of the energy for any desired decay rate while suppressing the effects of the unknown disturbance. For the feedback design, we employ the backstepping method, Lyapunov techniques and the sign multivalued operator. The well-posedness of the closed-loop system, which is a differential inclusion, is shown with the maximal monotone operator theory.
Model Predictive Control-Guided Reinforcement Learning for Implicit Balancing
In Europe, profit-seeking balance responsible parties can deviate in real time from their day-ahead nominations to assist transmission system operators in maintaining the supply-demand balance. Model predictive control (MPC) strategies to exploit these implicit balancing strategies capture arbitrage opportunities, but fail to accurately capture the price-formation process in the European imbalance markets and face high computational costs. Model-free reinforcement learning (RL) methods are fast to execute, but require data-intensive training and usually rely on real-time and historical data for decision-making. This paper proposes an MPC-guided RL method that combines the complementary strengths of both MPC and RL. The proposed method can effectively incorporate forecasts into the decision-making process (as in MPC), while maintaining the fast inference capability of RL. The performance of the proposed method is evaluated on the implicit balancing battery control problem using Belgian balancing data from 2023. First, we analyze the performance of the standalone state-of-the-art RL and MPC methods from various angles, to highlight their individual strengths and limitations. Next, we show an arbitrage profit benefit of the proposed MPC-guided RL method of 16.15% and 54.36%, compared to standalone RL and MPC.
An Active Fault-Tolerant Online Control Allocation Scheme for a Dual-System UAV in Transition Flight
A novel active fault-tolerant control (AFTC) scheme for a dual-system vertical takeoff and landing (VTOL) unmanned aerial vehicle (UAV) during transition flight is proposed in this paper. The AFTC scheme is composed of a baseline control law and an online control reallocation module. First, the structured $H_{\infty}$ baseline control law is able to guarantee the stability of closed-loop systems without being reconfigured under simultaneous actuator fault conditions. Second, compared to the existing mainstream method of sliding mode control that is a discontinuous control strategy, the AFTC scheme can effectively avoid control chattering problem by adopting the structured $H_{\infty}$ baseline control law. Third, an online control allocation (CA) module is implemented to carry out a unified CA for all the available actuators. When actuator faults/failures occur, the CA matrix is updated according to fault information and real-time airspeed, which is able to redistribute the virtual control signals to the remaining healthy actuators, avoiding significant performance degradation. Based on the developed AFTC scheme, symmetric and non-symmetric actuator fault scenarios are simulated on a nonlinear six-degree-of-freedom simulator, where the cases of merely structured $H_{\infty}$ control and structured $H_{\infty}$ based AFTC are compared and analyzed. The results show that the proposed structured $H_{\infty}$ based AFTC system is capable of handling more complicated fault scenarios and model uncertainties with no need to reconfigure the baseline control law. The proposed AFTC scheme significantly improves the safety and reliability of the transition flight of dual-system VTOL UAVs.
comment: 40 pages
Power Reserve Capacity from Virtual Power Plants with Reliability and Cost Guarantees
The growing penetration of renewable energy sources is expected to drive higher demand for power reserve ancillary services (AS). One solution is to increase the supply by integrating distributed energy resources (DERs) into the AS market through virtual power plants (VPPs). Several methods have been developed to assess the potential of VPPs to provide services. However, the existing approaches fail to account for AS products' requirements (reliability and technical specifications) and to provide accurate cost estimations. Here, we propose a new method to assess VPPs' potential to deliver power reserve capacity products under forecasting uncertainty. First, the maximum feasible reserve quantity is determined using a novel formulation of subset simulation for efficient uncertainty quantification. Second, the supply curve is characterized by considering explicit and opportunity costs. The method is applied to a VPP based on a representative Swiss low-voltage network with a diversified DER portfolio. We find that VPPs can reliably offer reserve products and that opportunity costs drive product pricing. Additionally, we show that the product's requirements strongly impact the reserve capacity provision capability. This approach aims to support VPP managers in developing market strategies and policymakers in designing DER-focused AS products.
comment: Submitted to IEEE Transactions on Power Systems
Robust stability of event-triggered nonlinear moving horizon estimation
In this work, we propose an event-triggered moving horizon estimation (ET-MHE) scheme for the remote state estimation of general nonlinear systems. In the presented method, whenever an event is triggered, a single measurement is transmitted and the nonlinear MHE optimization problem is subsequently solved. If no event is triggered, the current state estimate is updated using an open-loop prediction based on the system dynamics. Moreover, we introduce a novel event-triggering rule under which we demonstrate robust global exponential stability of the ET-MHE scheme, assuming a suitable detectability condition is met. In addition, we show that with the adoption of a varying horizon length, a tighter bound on the estimation error can be achieved. Finally, we validate the effectiveness of the proposed method through two illustrative examples.
Efficient Probabilistic Planning with Maximum-Coverage Distributionally Robust Backward Reachable Trees
This paper presents a new multi-query motion planning algorithm for linear Gaussian systems with the goal of reaching a Euclidean ball with high probability. We develop a new formulation for ball-shaped ambiguity sets of Gaussian distributions and leverage it to develop a distributionally robust belief roadmap construction algorithm. This algorithm synthe- sizes robust controllers which are certified to be safe for maximal size ball-shaped ambiguity sets of Gaussian distributions. Our algorithm achieves better coverage than the maximal coverage algorithm for planning over Gaussian distributions [1], and we identify mild conditions under which our algorithm achieves strictly better coverage. For the special case of no process noise or state constraints, we formally prove that our algorithm achieves maximal coverage. In addition, we present a second multi-query motion planning algorithm for linear Gaussian systems with the goal of reaching a region parameterized by the Minkowski sum of an ellipsoid and a Euclidean ball with high probability. This algorithm plans over ellipsoidal sets of maximal size ball-shaped ambiguity sets of Gaussian distributions, and provably achieves equal or better coverage than the best-known algorithm for planning over ellipsoidal ambiguity sets of Gaussian distributions [2]. We demonstrate the efficacy of both methods in a wide range of conditions via extensive simulation experiments.
MPC strategies for density profile control with pellet fueling in nuclear fusion tokamaks under uncertainty
Control of the density profile based on pellet fueling for the ITER nuclear fusion tokamak involves a multi-rate nonlinear system with safety-critical constraints, input delays, and discrete actuators with parametric uncertainty. To address this challenging problem, we propose a multi-stage MPC (msMPC) approach to handle uncertainty in the presence of mixed-integer inputs. While the scenario tree of msMPC accounts for uncertainty, it also adds complexity to an already computationally intensive mixed-integer MPC (MI-MPC) problem. To achieve real-time density profile controller with discrete pellets and uncertainty handling, we systematically reduce the problem complexity by (1) reducing the identified prediction model size through dynamic mode decomposition with control, (2) applying principal component analysis to reduce the number of scenarios needed to capture the parametric uncertainty in msMPC, and (3) utilizing the penalty term homotopy for MPC (PTH-MPC) algorithm to reduce the computational burden caused by the presence of mixed-integer inputs. We compare the performance and safety of the msMPC strategy against a nominal MI-MPC in plant simulations, demonstrating the first predictive density control strategy with uncertainty handling, viable for real-time pellet fueling in ITER.
comment: IEEE CDC 2025
Learning a Shape-adaptive Assist-as-needed Rehabilitation Policy from Therapist-informed Input
Therapist-in-the-loop robotic rehabilitation has shown great promise in enhancing rehabilitation outcomes by integrating the strengths of therapists and robotic systems. However, its broader adoption remains limited due to insufficient safe interaction and limited adaptation capability. This article proposes a novel telerobotics-mediated framework that enables therapists to intuitively and safely deliver assist-as-needed~(AAN) therapy based on two primary contributions. First, our framework encodes the therapist-informed corrective force into via-points in a latent space, allowing the therapist to provide only minimal assistance while encouraging patient maintaining own motion preferences. Second, a shape-adaptive ANN rehabilitation policy is learned to partially and progressively deform the reference trajectory for movement therapy based on encoded patient motion preferences and therapist-informed via-points. The effectiveness of the proposed shape-adaptive AAN strategy was validated on a telerobotic rehabilitation system using two representative tasks. The results demonstrate its practicality for remote AAN therapy and its superiority over two state-of-the-art methods in reducing corrective force and improving movement smoothness.
Modeling and Managing Temporal Obligations in GUCON Using SPARQL-star and RDF-star
In the digital age, data frequently crosses organizational and jurisdictional boundaries, making effective governance essential. Usage control policies have emerged as a key paradigm for regulating data usage, safeguarding privacy, protecting intellectual property, and ensuring compliance with regulations. A central mechanism for usage control is the handling of obligations, which arise as a side effect of using and sharing data. Effective monitoring of obligations requires capturing usage traces and accounting for temporal aspects such as start times and deadlines, as obligations may evolve over times into different states, such as fulfilled, violated, or expired. While several solutions have been proposed for obligation monitoring, they often lack formal semantics or provide limited support for reasoning over obligation states. To address these limitations, we extend GUCON, a policy framework grounded in the formal semantics of SPAQRL graph patterns, to explicitly model the temporal aspects of an obligation. This extension enables the expressing of temporal obligations and supports continuous monitoring of their evolving states based on usage traces stored in temporal knowledge graphs. We demonstrate how this extended model can be represented using RDF-star and SPARQL-star and propose an Obligation State Manager that monitors obligation states and assess their compliance with respect to usage traces. Finally, we evaluate both the extended model and its prototype implementation.
On Prediction-Based Properties of Discrete-Event Systems: Notions, Applications and Supervisor Synthesis
In this work, we investigate the problem of synthesizing property-enforcing supervisors for partially-observed discrete-event systems (DES). Unlike most existing approaches, where the enforced property depends solely on the executed behavior of the system, here we consider a more challenging scenario in which the property relies on predicted future behaviors that have not yet occurred. This problem arises naturally in applications involving future information, such as active prediction or intention protection. To formalize the problem, we introduce the notion of prediction-based properties, a new class of observational properties tied to the system's future information. We demonstrate that this notion is very generic and can model various practical properties, including predictability in fault prognosis and pre-opacity in intention security. We then present an effective approach for synthesizing supervisors that enforce prediction-based properties. Our method relies on a novel information structure that addresses the fundamental challenge arising from the dependency between current predictions and the control policy. The key idea is to first borrow information from future instants and then ensure information consistency. This reduces the supervisor synthesis problem to a safety game in the information space. We prove that the proposed algorithm is both sound and complete, and the resulting supervisor is maximally permissive.
Design Process of a Self Adaptive Smart Serious Games Ecosystem
This paper outlines the design vision and planned evolution of Blexer v3, a modular and AI-driven rehabilitation ecosystem based on serious games. Building on insights from previous versions of the system, we propose a new architecture that aims to integrate multimodal sensing, real-time reasoning, and intelligent control. The envisioned system will include distinct modules for data collection, user state inference, and gameplay adaptation. Key features such as dynamic difficulty adjustment (DDA) and procedural content generation (PCG) are also considered to support personalized interventions. We present the complete conceptual framework of Blexer v3, which defines the modular structure and data flow of the system. This serves as the foundation for the next phase: the development of a functional prototype and its integration into clinical rehabilitation scenarios.
Data-Driven Adaptive PID Control Based on Physics-Informed Neural Networks
This article proposes a data-driven PID controller design based on the principle of adaptive gain optimization, leveraging Physics-Informed Neural Networks (PINNs) generated for predictive modeling purposes. The proposed control design method utilizes gradients of the PID gain optimization, achieved through the automatic differentiation of PINNs, to apply model predictive control using a cost function based on tracking error and control inputs. By optimizing PINNs-based PID gains, the method achieves adaptive gain tuning that ensures stability while accounting for system nonlinearities. The proposed method features a systematic framework for integrating PINNs-based models of dynamical control systems into closed-loop control systems, enabling direct application to PID control design. A series of numerical experiments is conducted to demonstrate the effectiveness of the proposed method from the control perspectives based on both time and frequency domains.
comment: This work has been submitted to the IEEE Transactions on Control Systems Technology for possible publication
On properties of hydraulic equilibria in district heating networks
District heating networks are an integral part of the energy system in many countries. In future smart energy systems, they are expected to enhance energy flexibility and support the integration of renewable and waste energy sources. An important aspect of these networks is the control of flow rates, which dictates the heat delivered to consumers. This paper concerns the properties of flow rates in tree-structured district heating networks. We show that under mild assumptions of monotonicity in the hydraulic network components, statements regarding the stationary flow rate distribution can be made. In particular, when all consumers in a network incrementally open their valves, an increase in total flow rate throughput is guaranteed, while if one consumer does not open their valve when others do, they will receive a reduced flow rate. These properties are illustrated numerically on a small 2-consumer network as well as on a larger 22-consumer network. Previous works have shown that these properties allow the design and use of efficient control strategies for optimal heat distribution.
comment: Accepted for presentation at the 64th IEEE Conference on Decision and Control (CDC), 2025. 6 pages, 5 figures
Velocity-Form Data-Enabled Predictive Control of Soft Robots under Unknown External Payloads
Data-driven control methods such as data-enabled predictive control (DeePC) have shown strong potential in efficient control of soft robots without explicit parametric models. However, in object manipulation tasks, unknown external payloads and disturbances can significantly alter the system dynamics and behavior, leading to offset error and degraded control performance. In this paper, we present a novel velocity-form DeePC framework that achieves robust and optimal control of soft robots under unknown payloads. The proposed framework leverages input-output data in an incremental representation to mitigate performance degradation induced by unknown payloads, eliminating the need for weighted datasets or disturbance estimators. We validate the method experimentally on a planar soft robot and demonstrate its superior performance compared to standard DeePC in scenarios involving unknown payloads.
A Diffusion-based Generative Machine Learning Paradigm for Contingency Screening
Contingency screening is a crucial part of electric power systems all the time. Power systems frequently encounter multiple challenging operational dilemmas that could lead to the instability of power systems. Contingency analysis is effort-consuming by utilizing traditional numerical analysis methods. It is commonly addressed by generating a whopping number of possible contingencies or manipulating network parameters to determine the worst scenarios. This paper proposes a novel approach that diverts the nature of contingency analysis from pre-defined scenario screening to proactive-unsupervised screening. The potentially risky scenarios of power systems are generated from learning how the previous ones occurred. In other words, the internal perturbation that initiates contingencies is learned prior to being self-replicated for rendering the worst scenarios. By leveraging the perturbation diffusion technique, a proposed model is built to point out the worst scenarios instead of repeatedly simulating one-by-one scenarios to define the highest-risk ones. Empirical experiments are implemented on the IEEE systems to test and validate the proposed solution.
PAD-TRO: Projection-Augmented Diffusion for Direct Trajectory Optimization
Recently, diffusion models have gained popularity and attention in trajectory optimization due to their capability of modeling multi-modal probability distributions. However, addressing nonlinear equality constraints, i.e, dynamic feasi- bility, remains a great challenge in diffusion-based trajectory optimization. Recent diffusion-based trajectory optimization frameworks rely on a single-shooting style approach where the denoised control sequence is applied to forward propagate the dynamical system, which cannot explicitly enforce constraints on the states and frequently leads to sub-optimal solutions. In this work, we propose a novel direct trajectory optimization approach via model-based diffusion, which directly generates a sequence of states. To ensure dynamic feasibility, we propose a gradient-free projection mechanism that is incorporated into the reverse diffusion process. Our results show that, compared to a recent state-of-the-art baseline, our approach leads to zero dynamic feasibility error and approximately 4x higher success rate in a quadrotor waypoint navigation scenario involving dense static obstacles.
A Predictive and Sampled-Data Barrier Method for Safe and Efficient Quadrotor Control
This paper proposes a cascaded control framework for quadrotor trajectory tracking with formal safety guarantees. First, we design a controller consisting of an outer-loop position model predictive control (MPC) and an inner-loop nonlinear attitude control, enabling decoupling of position safety and yaw orientation. Second, since quadrotor safety constraints often involve high relative degree, we adopt high order control barrier functions (HOCBFs) to guarantee safety. To employ HOCBFs in the MPC formulation that has formal guarantees, we extend HOCBFs to sampled-data HOCBF (SdHOCBFs) by introducing compensation terms, ensuring safety over the entire sampling interval. We show that embedding SdHOCBFs as control-affine constraints into the MPC formulation guarantees both safety and optimality while preserving convexity for real-time implementations. Finally, comprehensive simulations are conducted to demonstrate the safety guarantee and high efficiency of the proposed method compared to existing methods.
comment: 6 pages, 3 figures
Optimization via a Control-Centric Framework
Optimization plays a central role in intelligent systems and cyber-physical technologies, where the speed and reliability of convergence directly impact performance. In control theory, optimization-centric methods are standard: controllers are designed by repeatedly solving optimization problems, as in linear quadratic regulation, $H_\infty$ control, and model predictive control. In contrast, this paper develops a control-centric framework for optimization itself, where algorithms are constructed directly from Lyapunov stability principles rather than being proposed first and analyzed afterward. A key element is the stationarity vector, which encodes first-order optimality conditions and enables Lyapunov-based convergence analysis. By pairing a Lyapunov function with a selectable decay law, we obtain continuous-time dynamics with guaranteed exponential, finite-time, fixed-time, or prescribed-time convergence. Within this framework, we introduce three feedback realizations of increasing restrictiveness: the Hessian-gradient, Newton, and gradient dynamics. Each shapes the decay of the stationarity vector to achieve the desired rate. These constructions unify unconstrained optimization, extend naturally to constrained problems via Lyapunov-consistent primal-dual dynamics, and broaden the results for minimax and generalized Nash equilibrium seeking problems beyond exponential stability. The framework provides systematic design tools for optimization algorithms in control and game-theoretic problems.
comment: This work has been submitted to the IEEE for possible publication. 12 pages, 3 figures
AD-NODE: Adaptive Dynamics Learning with Neural ODEs for Mobile Robots Control
Mobile robots, such as ground vehicles and quadrotors, are becoming increasingly important in various fields, from logistics to agriculture, where they automate processes in environments that are difficult to access for humans. However, to perform effectively in uncertain environments using model-based controllers, these systems require dynamics models capable of responding to environmental variations, especially when direct access to environmental information is limited. To enable such adaptivity and facilitate integration with model predictive control, we propose an adaptive dynamics model which bypasses the need for direct environmental knowledge by inferring operational environments from state-action history. The dynamics model is based on neural ordinary equations, and a two-phase training procedure is used to learn latent environment representations. We demonstrate the effectiveness of our approach through goal-reaching and path-tracking tasks on three robotic platforms of increasing complexity: a 2D differential wheeled robot with changing wheel contact conditions, a 3D quadrotor in variational wind fields, and the Sphero BOLT robot under two contact conditions for real-world deployment. Empirical results corroborate that our method can handle temporally and spatially varying environmental changes in both simulation and real-world systems.
Operational Risks in Grid Integration of Large Data Center Loads: Characteristics, Stability Assessments, and Sensitivity Studies
This paper investigates the dynamic interactions between large-scale data centers and the power grid, focusing on reliability challenges arising from sudden fluctuations in demand. With the rapid growth of AI-driven workloads, such fluctuations, along with fast ramp patterns, are expected to exacerbate stressed grid conditions and system instabilities. We consider a few open-source AI data center consumption profiles from the MIT supercloud datasets, along with generating a few experimental HPC job-distribution-based inference profiles. Subsequently, we develop analytical methodologies for real-time assessment of grid stability, focusing on both transient and small-signal stability assessments. Energy-flow-like metrics for nonlinear transient stability, formulated by computing localized data center bus kinetic-like flows and coupling interactions with neighboring buses over varying time windows, help provide operators a real-time assessments of the regional grid stress in the data center hubs. On the other hand, small-signal stability metrics, constructed from analytical state matrices under variable operating conditions during a fast ramping period, enable snapshot-based assessments of data center load fluctuations, provide enhanced observability into evolving grid conditions. By quantifying the stability impacts of large data center clusters, studies conducted in the modified IEEE benchmark $68-$bus model support improved operator situational awareness to capture risks in reliable integration of large data center loads.
comment: 11 pages, 7 figures
Safety-Critical Control with Bounded Inputs: A Closed-Form Solution for Backup Control Barrier Functions
Verifying the safety of controllers is critical for many applications, but is especially challenging for systems with bounded inputs. Backup control barrier functions (bCBFs) offer a structured approach to synthesizing safe controllers that are guaranteed to satisfy input bounds by leveraging the knowledge of a backup controller. While powerful, bCBFs require solving a high-dimensional quadratic program at run-time, which may be too costly for computationally-constrained systems such as aerospace vehicles. We propose an approach that optimally interpolates between a nominal controller and the backup controller, and we derive the solution to this optimization problem in closed form. We prove that this closed-form controller is guaranteed to be safe while obeying input bounds. We demonstrate the effectiveness of the approach on a double integrator and a nonlinear fixed-wing aircraft example.
comment: 8 pages, 6 figures. Code available at https://github.com/davidvwijk/OI-CBF
Digital Twins for Intelligent Intersections: A Literature Review
Intelligent intersections play a pivotal role in urban mobility, demanding innovative solutions such as digital twins to enhance safety and efficiency. This literature review investigates the integration and application of digital twins for intelligent intersections, a critical area within smart urban traffic systems. The review systematically categorizes existing research into five key thematic areas: (i) Digital Twin Architectures and Frameworks; (ii) Data Processing and Simulation Techniques; (iii) Artificial Intelligence and Machine Learning for Adaptive Traffic Control; (iv) Safety and Protection of Vulnerable Road Users; and (v) Scaling from Localized Intersections to Citywide Traffic Networks. Each theme is explored comprehensively, highlighting significant advancements, current challenges, and critical insights. The findings reveal that multi-layered digital twin architectures incorporating real-time data fusion and AI-driven decision-making enhances traffic efficiency and safety. Advanced simulation techniques combined with sophisticated AI/ML algorithms demonstrate notable improvements in real-time responsiveness and predictive accuracy for traffic management. Additionally, the integration of digital twins has shown substantial promise in safeguarding vulnerable road users through proactive and adaptive safety strategies. Despite these advancements, key challenges persist, including interoperability of diverse data sources, scalability of digital twins for extensive traffic networks, and managing uncertainty within dynamic urban environments. Addressing these challenges will be essential for the future development and deployment of intelligent, adaptive, and sustainable intersection management systems.
comment: 29 pages, 2 figures, under review at Transportation Research Interdisciplinary Perspectives journal
Koopman Control Factorization: Data-Driven Convex Controller Design for a Class of Nonlinear Systems
Although Koopman operators provide a global linearization for autonomous dynamical systems, nonautonomous systems are not globally linear in the inputs. State (or output) feedback controller design therefore remains nonconvex in typical formulations, even with approximations via bilinear control-affine terms. We address this gap by introducing the Koopman Control Factorization, a novel parameterization of control-affine dynamical systems combined with a feedback controller defined as a linear combination of nonlinear measurements. With this choice, the Koopman operator of the closed-loop system is a bilinear combination of the coefficients in two matrices: one representing the system, and the other the controller. We propose a set of sufficient conditions such that the factorization holds. Then, we present an algorithm that calculates the feedback matrix via semi-definite programming, producing a Lyapunov-stable closed-loop system with convex optimization. We evaluate the proposed controllers on two canonical examples of control-affine nonlinear systems (inverted pendulums), and show that our factorization and controller successfully stabilize both under properly-chosen basis functions. This manuscript introduces a broadly generalizable control synthesis method for stabilization of nonlinear systems that is quick-to-compute, verifiably stable, data-driven, and does not rely on approximations.
comment: 8 pages, 1 figure (with 4 subfigures). Submitted to the 2026 American Control Conference (ACC)
A System Level Approach to LQR Control of the Diffusion Equation
The continuous-time, infinite horizon LQR problem for the diffusion equation over the unit circle with fully distributed actuation is considered. It is well-known that the solution to this problem can be obtained from the solution to an operator-valued algebraic Riccati equation. Here, it is demonstrated that this solution can be equivalently obtained by solving an $H_2$ control problem through a closed-loop design procedure that is analogous to the "System Level Synthesis" methodology previously developed for systems over a discrete spatial domain and/or over a finite time horizon. The presented extension to the continuous spatial domain and continuous and infinite-horizon time setting admits analytical solutions that may complement computational approaches for discrete or finite-horizon settings. It is further illustrated that spatio-temporal constraints on the closed-loop responses can be incorporated into this new formulation in a convex manner.
comment: 9 pages, 2 figures, Submitted to IEEE American Control Conference 2026
Robust Sensor Placement for Poisson Arrivals with False Alarm Aware Spatiotemporal Sensing
This paper studies sensor placement when detection performance varies stochastically due to environmental factors over space and time and false alarms are present, but a filter is used to attenuate the effect. We introduce a unified model that couples detection and false alarms through an availability function, which captures how false alarms reduce effective sensing and filtering responses to the disturbance. Building on this model, we give a sufficient condition under which filtering improves detection. In addition, we derive a coverage-based lower bound on the void probability. Furthermore, we prove robustness guarantees showing that performance remains stable when detection probabilities are learned from limited data. We validate the approach with numerical studies using AIS vessel-traffic data and synthetic maritime scenarios. Together, these results provide theory and practical guidance for deploying sensors in dynamic, uncertain environments.
comment: Submitted to IEEE ACC
Pricing Short-Circuit Current via a Primal-Dual Formulation for Preserving Integrality Constraints SC
Synchronous Generators (SGs) currently provide important levels of Short-Circuit Current (SCC), a critical ancillary service that ensures line protections trip during short-circuit faults. Given the ongoing replacement of SGs by power-electronics-based generation, which have a hard limit for current injection, it has become relevant to optimize the procurement of SCC provided by remaining SGs. Pricing this service is however challenging due to the integrality constraints in Unit Commitment (UC). Existing methods, e.g., dispatchable pricing, restricted pricing and marginal unit pricing, attempt to address this issue but exhibit limitations in handling binary variables, resulting in SCC prices that either fail to cover the operating costs of units or lack interpretability. To overcome these pitfalls, we propose a primal-dual formulation of the SCC-constrained dispatch that preserves the binary nature of UC while effectively computing shadow prices of SCC services. Using a modified IEEE 30-bus system, a comparison is carried out between the proposed approach and the state-of-the-art pricing schemes, highlighting the advantages of the primal-dual method in preserving UC integrality for SCC pricing.
comment: 7 pages, submitted to PSCC conference 2026
Auctioning Future Services in Edge Networks with Moving Vehicles: N-Step Look-Ahead Contracts for Sustainable Resource Provision
Timely resource allocation in edge-assisted vehicular networks is essential for compute-intensive services such as autonomous driving and navigation. However, vehicle mobility leads to spatio-temporal unpredictability of resource demands, while real-time double auctions incur significant latency. To address these challenges, we propose a look-ahead contract-based auction framework that shifts decision-making from runtime to planning time. Our approach establishes N-step service contracts between edge servers (ESs) using demand forecasts and modified double auctions. The system operates in two stages: first, an LSTM-based prediction module forecasts multi-slot resource needs and determines ES roles (buyer or seller), after which a pre-double auction generates contracts specifying resource quantities, prices, and penalties. Second, these contracts are enforced in real time without rerunning auctions. The framework incorporates energy costs, transmission overhead, and contract breach risks into utility models, ensuring truthful, rational, and energy-efficient trading. Experiments on real-world (UTD19) and synthetic traces demonstrate that our method improves time efficiency, energy use, and social welfare compared with existing baselines.
comment: 17 pages, 8 figures, 1 table
Sampling-based Stochastic Data-driven Predictive Control under Data Uncertainty - Extended Version
We present a stochastic constrained output-feedback data-driven predictive control scheme for linear time-invariant systems subject to bounded additive disturbances. The approach uses data-driven predictors based on an extension of Willems' fundamental lemma and requires only a single persistently exciting input-output data trajectory. Compared to current state-of-the-art approaches, we do not rely on availability of exact disturbance data. Instead, we leverage a novel parameterization of the unknown disturbance data considering consistency with the measured data and the system class. This allows for deterministic approximation of the chance constraints in a sampling-based fashion. A robust constraint on the first predicted step enables recursive feasibility, closed-loop constraint satisfaction, and robust asymptotic stability in expectation under standard assumptions. A numerical example demonstrates the efficiency of the proposed control scheme.
On Fast Attitude Filtering Using Matrix Fisher Distributions with Stability Guarantee
This paper addresses two interrelated problems of the nonlinear filtering mechanism and fast attitude filtering with the matrix Fisher distribution (MFD) on the special orthogonal group. By analyzing the distribution evolution along Bayes' rule, we reveal two essential properties that enhance the performance of Bayesian attitude filters with MFDs, particularly in challenging conditions, from a theoretical viewpoint. Benefiting from the new understanding of the filtering mechanism associated with MFDs, two closed-form filters with MFDs is then proposed. These filters avoid the burdensome computations in previous MFD-based filters by introducing linearized error systems with right-invariant errors but retaining the two advantageous properties. Moreover, we leverage the two properties and closed-form filtering iteration to prove the almost-global exponential stability of the proposed filter with right-invariant error for the single-axis rotation, which, to our knowledge, is not achieved by existing directional statistics-based filters. Numerical simulations demonstrate that the proposed filters are significantly more accurate than the classic invariant Kalman filter. Besides, they are also as accurate as recent MFD-based Bayesian filters in challenging circumstances with large initial error and measurement uncertainty but consumes far less computation time (about 1/5 to 1/100 of previous MFD-based attitude filters).
Chaotic Noncoherent SWIPT in Multi-Functional RIS-Aided Systems
In this letter, we investigate the design of chaotic signal-based transmit waveforms in a multi-functional reconfigurable intelligent surface (MF-RIS)-aided set-up for simultaneous wireless information and power transfer. We propose a differential chaos shift keying-based MF-RIS-aided set-up, where the MF-RIS is partitioned into three non-overlapping surfaces. The elements of the first sub-surface perform energy harvesting (EH), which in turn, provide the required power to the other two sub-surfaces responsible for transmission and reflection of the incident signal. By considering a frequency selective scenario and a realistic EH model, we characterize the chaotic MF-RIS-aided system in terms of its EH performance and the associated bit error rate. Thereafter, we characterize the harvested energy-bit error rate trade-off and derive a lower bound on the number of elements required to operate in the EH mode. Accordingly, we propose novel transmit waveform designs to demonstrate the importance of the choice of appropriate system parameters in the context of achieving self-sustainability.
comment: To appear in IEEE Wireless Communications Letters
Digital-physical testbed for ship autonomy studies in the Marine Cybernetics Laboratory basin
The algorithms developed for Maritime Autonomous Surface Ships (MASS) are often challenging to test on actual vessels due to high operational costs and safety considerations. Simulations offer a cost-effective alternative and eliminate risks, but they may not accurately represent real-world dynamics for the given tasks. Utilizing small-scale model ships and robotic vessels in conjunction with a laboratory basin provides an accessible testing environment for the early stages of validation processes. However, designing and developing a model vessel for a single test can be costly and cumbersome, and researchers often lack access to such infrastructure. To address these challenges and enable streamlined testing, we have developed an in-house testbed that facilitates the development, testing, verification, and validation of MASS algorithms in a digital-physical laboratory. This infrastructure includes a set of small-scale model vessels, a simulation environment for each vessel, a comprehensive testbed environment, and a digital twin in Unity. With this, we aim to establish a full design and verification pipeline that starts with high-fidelity simulation models of each model vessel, to the model-scale testing in the laboratory basin, allowing possibilities for moving towards semi-fullscale validation with R/V milliAmpere1 and full-scale validation with R/V Gunnerus. In this work, we present our progress on the development of this testbed environment and its components, demonstrating its effectiveness in enabling ship guidance, navigation, and control (GNC), including autonomy.
A Hierarchical Control Architecture for Space Robots in On-Orbit Servicing Operations
In-Orbit Servicing and Active Debris Removal require advanced robotic capabilities for capturing and detumbling uncooperative targets. This work presents a hierarchical control framework for autonomous robotic capture of tumbling objects in space. A simulation environment is developed, incorporating sloshing dynamics of the chaser, a rarely studied effect in space robotics. The proposed controller combines an inner Lyapunov-based robust control loop for multi-body dynamics with an outer loop addressing an extended inverse kinematics problem. Simulation results show improved robustness and adaptability compared to existing control schemes.
Robust MPC for Large-scale Linear Systems
State-of-the-art approaches of Robust Model Predictive Control (MPC) are restricted to linear systems of relatively small scale, i.e., with no more than about 5 states. The main reason is the computational burden of determining a robust positively invariant (RPI) set, whose complexity suffers from the curse of dimensionality. The recently proposed approach of Deadbeat Robust Model Predictive Control (DRMPC) is the first that does not rely on an RPI set. Yet it comes with the full set of essential system theoretic guarantees. DRMPC is hence a viable option, in particular, for large-scale systems. This paper introduces a detailed design procedure for DRMPC. It is shown that the optimal control problem generated for DRMPC has exactly the same computational complexity as Nominal MPC. A numerical study validates its applicability to randomly generated large-scale linear systems of various dimensions.
A Physics-Informed Context-Aware Approach for Anomaly Detection in Tele-driving Operations Under False Data Injection Attacks
Tele-operated driving (ToD) systems are special types of cyber-physical systems (CPSs) where the operator remotely controls the steering, acceleration, and braking actions of the vehicle. Malicious actors may inject false data in communication channels to manipulate the tele-operators driving commands to cause harm. Hence, protection of this communication is necessary for the safe operation of the target vehicle. However, according to the National Institute of Standards and Technology (NIST) cybersecurity framework, protection merely is not enough and the detection of an attack is necessary. Moreover, UN R155 mandates that security incidents across vehicle fleets be detected and logged. Thus, cyber-physical threats of ToD are modeled with an attack-centric approach in this paper. Then, an attack model with false data injection (FDI) on steering control commands is created from real vehicle data. The risk of this attack model is assessed for a last-mile delivery (LMD) application. Finally, a physics-informed context-aware anomaly detection system (PCADS) is proposed to detect such false injection attacks, and preliminary experimental results are presented to validate the model.
comment: 16 pages, 13 figures, Accepted at IEEE Access
Code Generation and Conic Constraints for Model-Predictive Control on Microcontrollers with Conic-TinyMPC
Model-predictive control (MPC) is a powerful framework for controlling dynamic systems under constraints, but it remains challenging to deploy on resource-constrained platforms, especially for problems involving conic constraints. To address this, we extend recent work developing fast, structure-exploiting, cached ADMM solvers for embedded applications, to provide support for second-order cones, as well as C++ code generation from Python, MATLAB, and Julia for easy deployment. Microcontroller benchmarks show that our solver provides up to a two-order-of-magnitude speedup, ranging from 10.6x to 142.7x, over state-of-the-art embedded solvers on QP and SOCP problems, and enables us to fit order-of-magnitude larger problems in memory. We validate our solver's deployed performance through simulation and hardware experiments, including conically-constrained trajectory tracking on a 27g Crazyflie quadrotor. To get started with Conic-TinyMPC, visit our documentation, examples, and the open-source codebase at https://tinympc.org.
comment: First three authors contributed equally
On-Demand Growth of Semiconductor Heterostructures Guided by Physics-Informed Machine Learning
Developing tailored semiconductor heterostructures on demand represents a critical capability for addressing the escalating performance demands in electronic and optoelectronic devices. However, traditional fabrication methods remain constrained by simulation-based design and iterative trial-and-error optimization. Here, we introduce SemiEpi, a self-driving platform designed for molecular beam epitaxy (MBE) to perform multi-step semiconductor heterostructure growth through in-situ monitoring and on-the-fly feedback control. By integrating standard MBE reactors, physics-informed machine learning (ML) models, and parameter initialization, SemiEpi identifies optimal initial conditions and proposes experiments for heterostructure growth, eliminating the need for extensive expertise in MBE processes. As a proof of concept, we demonstrate the optimization of high-density InAs quantum dot (QD) growth with a target emission wavelength of 1240 nm, showcasing the power of SemiEpi. We achieve a QD density of 5 x 10^10 cm^-2, a 1.6-fold increase in photoluminescence (PL) intensity, and a reduced full width at half maximum (FWHM) of 29.13 meV, leveraging in-situ reflective high-energy electron diffraction monitoring with feedback control for adjusting growth temperatures. Taken together, our results highlight the potential of ML-guided systems to address challenges in multi-step heterostructure growth, facilitate the development of a hardware-independent framework, and enhance process repeatability and stability, even without exhaustive knowledge of growth parameters.
comment: 5 figures
DALNet: A Denoising Diffusion Probabilistic Model for High-Fidelity Day-Ahead Load Forecasting
Accurate probabilistic load forecasting is crucial for maintaining the safety and stability of power systems. However, the mainstream approach, multi-step prediction, is hindered by cumulative errors and forecasting lags, which limits its effectiveness in probabilistic day-ahead load forecasting (PDALF). To overcome these challenges, we introduce DALNet, a novel denoising diffusion model designed to generate load curves rather than relying on direct prediction. By shifting the focus to curve generation, DALNet captures the complex distribution of actual load time-series data under specific conditions with greater fidelity. To further enhance DALNet, we propose the temporal multi-scale attention block (TMSAB), a mechanism designed to integrate both positional and temporal information for improved forecasting precision. Furthermore, we utilize kernel density estimation (KDE) to reconstruct the distribution of generated load curves and employ Kullback-Leibler (KL) divergence to compare them with the actual data distribution. Experimental results demonstrate that DALNet excels in load forecasting accuracy and offers a novel perspective for other predictive tasks within power systems.
comment: 22pages
Can We Ignore Labels In Out of Distribution Detection?
Out-of-distribution (OOD) detection methods have recently become more prominent, serving as a core element in safety-critical autonomous systems. One major purpose of OOD detection is to reject invalid inputs that could lead to unpredictable errors and compromise safety. Due to the cost of labeled data, recent works have investigated the feasibility of self-supervised learning (SSL) OOD detection, unlabeled OOD detection, and zero shot OOD detection. In this work, we identify a set of conditions for a theoretical guarantee of failure in unlabeled OOD detection algorithms from an information-theoretic perspective. These conditions are present in all OOD tasks dealing with real-world data: I) we provide theoretical proof of unlabeled OOD detection failure when there exists zero mutual information between the learning objective and the in-distribution labels, a.k.a. 'label blindness', II) we define a new OOD task - Adjacent OOD detection - that tests for label blindness and accounts for a previously ignored safety gap in all OOD detection benchmarks, and III) we perform experiments demonstrating that existing unlabeled OOD methods fail under conditions suggested by our label blindness theory and analyze the implications for future research in unlabeled OOD methods.
Quickest Change Detection with Cost-Constrained Experiment Design
In the classical quickest change detection problem, an observer performs a single experiment to monitor a stochastic process. The goal in the classical problem is to detect a change in the statistical properties of the process, with the minimum possible delay, subject to a constraint on the rate of false alarms. This paper considers the case where, at each observation time, the decision-maker must choose between multiple experiments with varying information qualities and costs. The change can be detected using any of the experiments. The goal here is to detect the change with the minimum delay, subject to constraints on the rate of false alarms and the fraction of time each experiment is performed before the time of change. The constraint on the fraction of time can be used to control the overall cost of using the system of experiments. An algorithm called the two-experiment cumulative sum (2E-CUSUM) algorithm is first proposed to solve the problem when there are only two experiments. The algorithm for the case of multiple experiments, starting with three experiments, is then designed iteratively using the 2E-CUSUM algorithm. Two key ideas used in the design are the scaling of undershoots and the truncation of tests. The multiple-experiment algorithm can be designed to satisfy the constraints and can achieve the delay performance of the experiment with the highest quality within a constant. The important concept of data efficiency, where the observer has the choice of not performing any experiment, is explored as well.
Neural-Rendezvous: Provably Robust Guidance and Control to Encounter Interstellar Objects
Interstellar objects (ISOs) are likely representatives of primitive materials invaluable in understanding exoplanetary star systems. Due to their poorly constrained orbits with generally high inclinations and relative velocities, however, exploring ISOs with conventional human-in-the-loop approaches is significantly challenging. This paper presents Neural-Rendezvous -- a deep learning-based guidance and control framework for encountering fast-moving objects, including ISOs, robustly, accurately, and autonomously in real time. It uses pointwise minimum norm tracking control on top of a guidance policy modeled by a spectrally-normalized deep neural network, where its hyperparameters are tuned with a loss function directly penalizing the MPC state trajectory tracking error. We show that Neural-Rendezvous provides a high probability exponential bound on the expected spacecraft delivery error, the proof of which leverages stochastic incremental stability analysis. In particular, it is used to construct a non-negative function with a supermartingale property, explicitly accounting for the ISO state uncertainty and the local nature of nonlinear state estimation guarantees. In numerical simulations, Neural-Rendezvous is demonstrated to satisfy the expected error bound for 100 ISO candidates. This performance is also empirically validated using our spacecraft simulator and in high-conflict and distributed UAV swarm reconfiguration with up to 20 UAVs.
comment: Preprint Version, Accepted: October, 2024 (One-minute YouTube summary: https://youtu.be/q3e0LYS2IYQ, DOI: https://doi.org/10.2514/1.G007671)
Contraction Theory for Nonlinear Stability Analysis and Learning-based Control: A Tutorial Overview
Contraction theory is an analytical tool to study differential dynamics of a non-autonomous (i.e., time-varying) nonlinear system under a contraction metric defined with a uniformly positive definite matrix, the existence of which results in a necessary and sufficient characterization of incremental exponential stability of multiple solution trajectories with respect to each other. By using a squared differential length as a Lyapunov-like function, its nonlinear stability analysis boils down to finding a suitable contraction metric that satisfies a stability condition expressed as a linear matrix inequality, indicating that many parallels can be drawn between well-known linear systems theory and contraction theory for nonlinear systems. Furthermore, contraction theory takes advantage of a superior robustness property of exponential stability used in conjunction with the comparison lemma. This yields much-needed safety and stability guarantees for neural network-based control and estimation schemes, without resorting to a more involved method of using uniform asymptotic stability for input-to-state stability. Such distinctive features permit the systematic construction of a contraction metric via convex optimization, thereby obtaining an explicit exponential bound on the distance between a time-varying target trajectory and solution trajectories perturbed externally due to disturbances and learning errors. The objective of this paper is, therefore, to present a tutorial overview of contraction theory and its advantages in nonlinear stability analysis of deterministic and stochastic systems, with an emphasis on deriving formal robustness and stability guarantees for various learning-based and data-driven automatic control methods. In particular, we provide a detailed review of techniques for finding contraction metrics and associated control and estimation laws using deep neural networks.
comment: Annual Reviews in Control, Preprint Version, Accepted, Oct. 1st
Probabilistic Simulation of Aircraft Descent via a Hybrid Physics-Informed Machine Learning Approach
This paper presents a method for generating probabilistic descent trajectories in simulations of real-world airspace. A dataset of 116,066 trajectories harvested from Mode S radar returns in UK airspace was used to train and test the model. Thirteen aircraft types with varying performance characteristics were investigated. It was found that the error in the mean prediction of time to reach the bottom of descent for the proposed method was less than that of the the Base of Aircraft Data (BADA) model by a factor of 10. Furthermore, the method was capable of generating a range of trajectories that were similar to the held out test dataset when analysed in distribution. The proposed method is hybrid, with aircraft drag and calibrated airspeed functions generated probabilistically to parameterise the BADA equations, ensuring the physical plausibility of generated trajectories.
Systems and Control (EESS)
PowerPlots: An Open Source Power Grid Visualization and Data Analysis Framework for Academic Research
Data visualization is important for developing an understanding of a complex system. PowerPlots.jl is a data visualization tool for power grids, one of the most complex systems in the world. The design of PowerPlots.jl is intended to facilitate exploration of power grid data while performing research and to facilitate communication of research findings to an audience. Several tools created to support this software also facilitate analysis of power grid data by transforming the data into graph topology or data-frame data formats that are more compatible for some applications. The high level of flexibility in PowerPlots.jl enables researchers who are developing and analyzing methods for solving novel power grid problems to better understand and communicate the complexities of their research.
Multi-Loop Design of Virtual Synchronous Machine Control for DFIG-Based Wind Farms
The displacement of synchronous generators by converter-interfaced renewable energy sources obliges wind farms to provide inertia, damping, and voltage support, above all in increasingly weak grid conditions. This paper presents a co-ordinated frequency-domain methodology for tuning all control layers of doubly-fed induction generators (DFIGs) within a wind farm operated as a Virtual Synchronous Machine (VSM). Starting from a full small-signal linearisation that preserves loop-to-loop and machine-to-machine couplings, the procedure reshapes every local open loop to explicit phase-margin targets through a single, prioritised iteration. The resulting controllers provide a step response and stability margins close to those programmed at the design stage, in spite of the cross coupling between control loops. Since controller synthesis relies exclusively on classical loop-shaping tools available in commercial simulation suites, it is readily applicable to industrial-scale projects.
comment: Submitted for evaluation to Journal of Modern Power Systems and Clean Energy
Optimal participation of energy communities in electricity markets under uncertainty. A multi-stage stochastic programming approach
We propose a multi-stage stochastic programming model for the optimal participation of energy communities in electricity markets. The multi-stage aspect captures the different times at which variable renewable generation and electricity prices are observed. This results in large-scale optimization problem instances containing large scenario trees with 34 stages, to which scenario reduction techniques are applied. Case studies with real data are discussed to analyse proposed regulatory frameworks in Europe. The added value of considering stochasticity is also analysed.
Robust Cislunar Navigation via LFT-Based $\mathcal{H}_\infty$ Filtering with Bearing-Only Measurements
This paper develops a robust estimation framework for cislunar navigation that embeds the Circular Restricted Three-Body Problem (CR3BP) dynamics and bearing-only optical measurements within a Linear Fractional Transformation (LFT) representation. A full-order $\mathcal{H}_\infty$ observer is synthesized with explicit $\mathcal{L}_2$ performance bounds. The formulation yields a nonlinear estimator that operates directly on the governing equations and avoids reliance on local linearizations. Dominant nonlinearities are expressed as structured real uncertainties, while measurement fidelity is represented through range-dependent weighting with Earth-Moon distances reconstructed from line-of-sight geometry. The sensing architecture assumes passive star-tracker-class optical instruments, eliminating the need for time-of-flight ranging or precision clocks. Simulations demonstrate bounded estimation errors and smooth position tracking over multiple orbital periods, with the largest deviations observed in the out-of-plane states, consistent with the stiffness of the vertical dynamics and the limitations of angle-only observability. Application to a Near Rectilinear Halo Orbit (NRHO) illustrates that the framework can achieve robust onboard navigation with bounded estimation errors with flight-representative sensors.
Steady-State Spread Bounds for Graph Diffusion via Laplacian Regularisation
We study how far a diffusion process on a graph can drift from a designed starting pattern when that pattern is produced using Laplacian regularisation. Under standard stability conditions for undirected, entrywise nonnegative graphs, we give a closed-form, instance-specific upper bound on the steady-state spread, measured as the relative change between the final and initial profiles. The bound separates two effects: (i) an irreducible term determined by the graph's maximum node degree, and (ii) a design-controlled term that shrinks as the regularisation strength increases (following an inverse square-root law). This leads to a simple design rule: given any target limit on spread, one can choose a sufficient regularisation strength in closed form. Although one motivating application is array beamforming, where the initial pattern is the squared magnitude of the beamformer weights, the result applies to any scenario that first enforces Laplacian smoothness and then evolves by linear diffusion on a graph. Overall, the guarantee is non-asymptotic, easy to compute, and certifies how much steady-state deviation can occur.
A Fixed Point Framework for the Existence of EFX Allocations
We consider the problem of the existence of an envy-free allocation up to any good (EFX) for linear valuations and establish new results by connecting this problem to a fixed point framework. Specifically, we first use randomized rounding to extend the discrete EFX constraints into a continuous space and show that an EFX allocation exists if and only if the optimal value of the continuously extended objective function is nonpositive. In particular, we demonstrate that this optimization problem can be formulated as an unconstrained difference of convex (DC) program, which can be further simplified to the minimization of a piecewise linear concave function over a polytope. Leveraging this connection, we show that the proposed DC program has a nonpositive optimal objective value if and only if a well-defined continuous vector map admits a fixed point. Crucially, we prove that the reformulated fixed point problem satisfies all the conditions of Brouwer's fixed point theorem, except that self-containedness is violated by an arbitrarily small positive constant. To address this, we propose a slightly perturbed continuous map that always admits a fixed point. This fixed point serves as a proxy for the fixed point (if it exists) of the original map, and hence for an EFX allocation through an appropriate transformation. Our results offer a new approach to establishing the existence of EFX allocations through fixed point theorems. Moreover, the equivalence with DC programming enables a more efficient and systematic method for computing such allocations (if one exists) using tools from nonlinear optimization. Our findings bridge the discrete problem of finding an EFX allocation with two continuous frameworks: solving an unconstrained DC program and identifying a fixed point of a continuous vector map.
Benchmarking M-LTSF: Frequency and Noise-Based Evaluation of Multivariate Long Time Series Forecasting Models
Understanding the robustness of deep learning models for multivariate long-term time series forecasting (M-LTSF) remains challenging, as evaluations typically rely on real-world datasets with unknown noise properties. We propose a simulation-based evaluation framework that generates parameterizable synthetic datasets, where each dataset instance corresponds to a different configuration of signal components, noise types, signal-to-noise ratios, and frequency characteristics. These configurable components aim to model real-world multivariate time series data without the ambiguity of unknown noise. This framework enables fine-grained, systematic evaluation of M-LTSF models under controlled and diverse scenarios. We benchmark four representative architectures S-Mamba (state-space), iTransformer (transformer-based), R-Linear (linear), and Autoformer (decomposition-based). Our analysis reveals that all models degrade severely when lookback windows cannot capture complete periods of seasonal patters in the data. S-Mamba and Autoformer perform best on sawtooth patterns, while R-Linear and iTransformer favor sinusoidal signals. White and Brownian noise universally degrade performance with lower signal-to-noise ratio while S-Mamba shows specific trend-noise and iTransformer shows seasonal-noise vulnerability. Further spectral analysis shows that S-Mamba and iTransformer achieve superior frequency reconstruction. This controlled approach, based on our synthetic and principle-driven testbed, offers deeper insights into model-specific strengths and limitations through the aggregation of MSE scores and provides concrete guidance for model selection based on signal characteristics and noise conditions.
comment: Number of pages: 13 Number of figures: 16 Number of Tables: 1 Submitted to: IEEE Transactions on Signal Processing
Rapid stabilization for a wave equation with boundary disturbance
In this paper, we study the rapid stabilization of an unstable wave equation, in which an unknown disturbance is located at the boundary condition. We address two different boundary conditions: Dirichlet- Dirichlet and Dirichlet-Neumann. In both cases, we design a feedback law, located at the same place as the unknown disturbance, that forces the exponential decay of the energy for any desired decay rate while suppressing the effects of the unknown disturbance. For the feedback design, we employ the backstepping method, Lyapunov techniques and the sign multivalued operator. The well-posedness of the closed-loop system, which is a differential inclusion, is shown with the maximal monotone operator theory.
Model Predictive Control-Guided Reinforcement Learning for Implicit Balancing
In Europe, profit-seeking balance responsible parties can deviate in real time from their day-ahead nominations to assist transmission system operators in maintaining the supply-demand balance. Model predictive control (MPC) strategies to exploit these implicit balancing strategies capture arbitrage opportunities, but fail to accurately capture the price-formation process in the European imbalance markets and face high computational costs. Model-free reinforcement learning (RL) methods are fast to execute, but require data-intensive training and usually rely on real-time and historical data for decision-making. This paper proposes an MPC-guided RL method that combines the complementary strengths of both MPC and RL. The proposed method can effectively incorporate forecasts into the decision-making process (as in MPC), while maintaining the fast inference capability of RL. The performance of the proposed method is evaluated on the implicit balancing battery control problem using Belgian balancing data from 2023. First, we analyze the performance of the standalone state-of-the-art RL and MPC methods from various angles, to highlight their individual strengths and limitations. Next, we show an arbitrage profit benefit of the proposed MPC-guided RL method of 16.15% and 54.36%, compared to standalone RL and MPC.
An Active Fault-Tolerant Online Control Allocation Scheme for a Dual-System UAV in Transition Flight
A novel active fault-tolerant control (AFTC) scheme for a dual-system vertical takeoff and landing (VTOL) unmanned aerial vehicle (UAV) during transition flight is proposed in this paper. The AFTC scheme is composed of a baseline control law and an online control reallocation module. First, the structured $H_{\infty}$ baseline control law is able to guarantee the stability of closed-loop systems without being reconfigured under simultaneous actuator fault conditions. Second, compared to the existing mainstream method of sliding mode control that is a discontinuous control strategy, the AFTC scheme can effectively avoid control chattering problem by adopting the structured $H_{\infty}$ baseline control law. Third, an online control allocation (CA) module is implemented to carry out a unified CA for all the available actuators. When actuator faults/failures occur, the CA matrix is updated according to fault information and real-time airspeed, which is able to redistribute the virtual control signals to the remaining healthy actuators, avoiding significant performance degradation. Based on the developed AFTC scheme, symmetric and non-symmetric actuator fault scenarios are simulated on a nonlinear six-degree-of-freedom simulator, where the cases of merely structured $H_{\infty}$ control and structured $H_{\infty}$ based AFTC are compared and analyzed. The results show that the proposed structured $H_{\infty}$ based AFTC system is capable of handling more complicated fault scenarios and model uncertainties with no need to reconfigure the baseline control law. The proposed AFTC scheme significantly improves the safety and reliability of the transition flight of dual-system VTOL UAVs.
comment: 40 pages
Power Reserve Capacity from Virtual Power Plants with Reliability and Cost Guarantees
The growing penetration of renewable energy sources is expected to drive higher demand for power reserve ancillary services (AS). One solution is to increase the supply by integrating distributed energy resources (DERs) into the AS market through virtual power plants (VPPs). Several methods have been developed to assess the potential of VPPs to provide services. However, the existing approaches fail to account for AS products' requirements (reliability and technical specifications) and to provide accurate cost estimations. Here, we propose a new method to assess VPPs' potential to deliver power reserve capacity products under forecasting uncertainty. First, the maximum feasible reserve quantity is determined using a novel formulation of subset simulation for efficient uncertainty quantification. Second, the supply curve is characterized by considering explicit and opportunity costs. The method is applied to a VPP based on a representative Swiss low-voltage network with a diversified DER portfolio. We find that VPPs can reliably offer reserve products and that opportunity costs drive product pricing. Additionally, we show that the product's requirements strongly impact the reserve capacity provision capability. This approach aims to support VPP managers in developing market strategies and policymakers in designing DER-focused AS products.
comment: Submitted to IEEE Transactions on Power Systems
Robust stability of event-triggered nonlinear moving horizon estimation
In this work, we propose an event-triggered moving horizon estimation (ET-MHE) scheme for the remote state estimation of general nonlinear systems. In the presented method, whenever an event is triggered, a single measurement is transmitted and the nonlinear MHE optimization problem is subsequently solved. If no event is triggered, the current state estimate is updated using an open-loop prediction based on the system dynamics. Moreover, we introduce a novel event-triggering rule under which we demonstrate robust global exponential stability of the ET-MHE scheme, assuming a suitable detectability condition is met. In addition, we show that with the adoption of a varying horizon length, a tighter bound on the estimation error can be achieved. Finally, we validate the effectiveness of the proposed method through two illustrative examples.
Efficient Probabilistic Planning with Maximum-Coverage Distributionally Robust Backward Reachable Trees
This paper presents a new multi-query motion planning algorithm for linear Gaussian systems with the goal of reaching a Euclidean ball with high probability. We develop a new formulation for ball-shaped ambiguity sets of Gaussian distributions and leverage it to develop a distributionally robust belief roadmap construction algorithm. This algorithm synthe- sizes robust controllers which are certified to be safe for maximal size ball-shaped ambiguity sets of Gaussian distributions. Our algorithm achieves better coverage than the maximal coverage algorithm for planning over Gaussian distributions [1], and we identify mild conditions under which our algorithm achieves strictly better coverage. For the special case of no process noise or state constraints, we formally prove that our algorithm achieves maximal coverage. In addition, we present a second multi-query motion planning algorithm for linear Gaussian systems with the goal of reaching a region parameterized by the Minkowski sum of an ellipsoid and a Euclidean ball with high probability. This algorithm plans over ellipsoidal sets of maximal size ball-shaped ambiguity sets of Gaussian distributions, and provably achieves equal or better coverage than the best-known algorithm for planning over ellipsoidal ambiguity sets of Gaussian distributions [2]. We demonstrate the efficacy of both methods in a wide range of conditions via extensive simulation experiments.
MPC strategies for density profile control with pellet fueling in nuclear fusion tokamaks under uncertainty
Control of the density profile based on pellet fueling for the ITER nuclear fusion tokamak involves a multi-rate nonlinear system with safety-critical constraints, input delays, and discrete actuators with parametric uncertainty. To address this challenging problem, we propose a multi-stage MPC (msMPC) approach to handle uncertainty in the presence of mixed-integer inputs. While the scenario tree of msMPC accounts for uncertainty, it also adds complexity to an already computationally intensive mixed-integer MPC (MI-MPC) problem. To achieve real-time density profile controller with discrete pellets and uncertainty handling, we systematically reduce the problem complexity by (1) reducing the identified prediction model size through dynamic mode decomposition with control, (2) applying principal component analysis to reduce the number of scenarios needed to capture the parametric uncertainty in msMPC, and (3) utilizing the penalty term homotopy for MPC (PTH-MPC) algorithm to reduce the computational burden caused by the presence of mixed-integer inputs. We compare the performance and safety of the msMPC strategy against a nominal MI-MPC in plant simulations, demonstrating the first predictive density control strategy with uncertainty handling, viable for real-time pellet fueling in ITER.
comment: IEEE CDC 2025
Learning a Shape-adaptive Assist-as-needed Rehabilitation Policy from Therapist-informed Input
Therapist-in-the-loop robotic rehabilitation has shown great promise in enhancing rehabilitation outcomes by integrating the strengths of therapists and robotic systems. However, its broader adoption remains limited due to insufficient safe interaction and limited adaptation capability. This article proposes a novel telerobotics-mediated framework that enables therapists to intuitively and safely deliver assist-as-needed~(AAN) therapy based on two primary contributions. First, our framework encodes the therapist-informed corrective force into via-points in a latent space, allowing the therapist to provide only minimal assistance while encouraging patient maintaining own motion preferences. Second, a shape-adaptive ANN rehabilitation policy is learned to partially and progressively deform the reference trajectory for movement therapy based on encoded patient motion preferences and therapist-informed via-points. The effectiveness of the proposed shape-adaptive AAN strategy was validated on a telerobotic rehabilitation system using two representative tasks. The results demonstrate its practicality for remote AAN therapy and its superiority over two state-of-the-art methods in reducing corrective force and improving movement smoothness.
Modeling and Managing Temporal Obligations in GUCON Using SPARQL-star and RDF-star
In the digital age, data frequently crosses organizational and jurisdictional boundaries, making effective governance essential. Usage control policies have emerged as a key paradigm for regulating data usage, safeguarding privacy, protecting intellectual property, and ensuring compliance with regulations. A central mechanism for usage control is the handling of obligations, which arise as a side effect of using and sharing data. Effective monitoring of obligations requires capturing usage traces and accounting for temporal aspects such as start times and deadlines, as obligations may evolve over times into different states, such as fulfilled, violated, or expired. While several solutions have been proposed for obligation monitoring, they often lack formal semantics or provide limited support for reasoning over obligation states. To address these limitations, we extend GUCON, a policy framework grounded in the formal semantics of SPAQRL graph patterns, to explicitly model the temporal aspects of an obligation. This extension enables the expressing of temporal obligations and supports continuous monitoring of their evolving states based on usage traces stored in temporal knowledge graphs. We demonstrate how this extended model can be represented using RDF-star and SPARQL-star and propose an Obligation State Manager that monitors obligation states and assess their compliance with respect to usage traces. Finally, we evaluate both the extended model and its prototype implementation.
On Prediction-Based Properties of Discrete-Event Systems: Notions, Applications and Supervisor Synthesis
In this work, we investigate the problem of synthesizing property-enforcing supervisors for partially-observed discrete-event systems (DES). Unlike most existing approaches, where the enforced property depends solely on the executed behavior of the system, here we consider a more challenging scenario in which the property relies on predicted future behaviors that have not yet occurred. This problem arises naturally in applications involving future information, such as active prediction or intention protection. To formalize the problem, we introduce the notion of prediction-based properties, a new class of observational properties tied to the system's future information. We demonstrate that this notion is very generic and can model various practical properties, including predictability in fault prognosis and pre-opacity in intention security. We then present an effective approach for synthesizing supervisors that enforce prediction-based properties. Our method relies on a novel information structure that addresses the fundamental challenge arising from the dependency between current predictions and the control policy. The key idea is to first borrow information from future instants and then ensure information consistency. This reduces the supervisor synthesis problem to a safety game in the information space. We prove that the proposed algorithm is both sound and complete, and the resulting supervisor is maximally permissive.
Design Process of a Self Adaptive Smart Serious Games Ecosystem
This paper outlines the design vision and planned evolution of Blexer v3, a modular and AI-driven rehabilitation ecosystem based on serious games. Building on insights from previous versions of the system, we propose a new architecture that aims to integrate multimodal sensing, real-time reasoning, and intelligent control. The envisioned system will include distinct modules for data collection, user state inference, and gameplay adaptation. Key features such as dynamic difficulty adjustment (DDA) and procedural content generation (PCG) are also considered to support personalized interventions. We present the complete conceptual framework of Blexer v3, which defines the modular structure and data flow of the system. This serves as the foundation for the next phase: the development of a functional prototype and its integration into clinical rehabilitation scenarios.
Data-Driven Adaptive PID Control Based on Physics-Informed Neural Networks
This article proposes a data-driven PID controller design based on the principle of adaptive gain optimization, leveraging Physics-Informed Neural Networks (PINNs) generated for predictive modeling purposes. The proposed control design method utilizes gradients of the PID gain optimization, achieved through the automatic differentiation of PINNs, to apply model predictive control using a cost function based on tracking error and control inputs. By optimizing PINNs-based PID gains, the method achieves adaptive gain tuning that ensures stability while accounting for system nonlinearities. The proposed method features a systematic framework for integrating PINNs-based models of dynamical control systems into closed-loop control systems, enabling direct application to PID control design. A series of numerical experiments is conducted to demonstrate the effectiveness of the proposed method from the control perspectives based on both time and frequency domains.
comment: This work has been submitted to the IEEE Transactions on Control Systems Technology for possible publication
On properties of hydraulic equilibria in district heating networks
District heating networks are an integral part of the energy system in many countries. In future smart energy systems, they are expected to enhance energy flexibility and support the integration of renewable and waste energy sources. An important aspect of these networks is the control of flow rates, which dictates the heat delivered to consumers. This paper concerns the properties of flow rates in tree-structured district heating networks. We show that under mild assumptions of monotonicity in the hydraulic network components, statements regarding the stationary flow rate distribution can be made. In particular, when all consumers in a network incrementally open their valves, an increase in total flow rate throughput is guaranteed, while if one consumer does not open their valve when others do, they will receive a reduced flow rate. These properties are illustrated numerically on a small 2-consumer network as well as on a larger 22-consumer network. Previous works have shown that these properties allow the design and use of efficient control strategies for optimal heat distribution.
comment: Accepted for presentation at the 64th IEEE Conference on Decision and Control (CDC), 2025. 6 pages, 5 figures
Velocity-Form Data-Enabled Predictive Control of Soft Robots under Unknown External Payloads
Data-driven control methods such as data-enabled predictive control (DeePC) have shown strong potential in efficient control of soft robots without explicit parametric models. However, in object manipulation tasks, unknown external payloads and disturbances can significantly alter the system dynamics and behavior, leading to offset error and degraded control performance. In this paper, we present a novel velocity-form DeePC framework that achieves robust and optimal control of soft robots under unknown payloads. The proposed framework leverages input-output data in an incremental representation to mitigate performance degradation induced by unknown payloads, eliminating the need for weighted datasets or disturbance estimators. We validate the method experimentally on a planar soft robot and demonstrate its superior performance compared to standard DeePC in scenarios involving unknown payloads.
A Diffusion-based Generative Machine Learning Paradigm for Contingency Screening
Contingency screening is a crucial part of electric power systems all the time. Power systems frequently encounter multiple challenging operational dilemmas that could lead to the instability of power systems. Contingency analysis is effort-consuming by utilizing traditional numerical analysis methods. It is commonly addressed by generating a whopping number of possible contingencies or manipulating network parameters to determine the worst scenarios. This paper proposes a novel approach that diverts the nature of contingency analysis from pre-defined scenario screening to proactive-unsupervised screening. The potentially risky scenarios of power systems are generated from learning how the previous ones occurred. In other words, the internal perturbation that initiates contingencies is learned prior to being self-replicated for rendering the worst scenarios. By leveraging the perturbation diffusion technique, a proposed model is built to point out the worst scenarios instead of repeatedly simulating one-by-one scenarios to define the highest-risk ones. Empirical experiments are implemented on the IEEE systems to test and validate the proposed solution.
PAD-TRO: Projection-Augmented Diffusion for Direct Trajectory Optimization
Recently, diffusion models have gained popularity and attention in trajectory optimization due to their capability of modeling multi-modal probability distributions. However, addressing nonlinear equality constraints, i.e, dynamic feasi- bility, remains a great challenge in diffusion-based trajectory optimization. Recent diffusion-based trajectory optimization frameworks rely on a single-shooting style approach where the denoised control sequence is applied to forward propagate the dynamical system, which cannot explicitly enforce constraints on the states and frequently leads to sub-optimal solutions. In this work, we propose a novel direct trajectory optimization approach via model-based diffusion, which directly generates a sequence of states. To ensure dynamic feasibility, we propose a gradient-free projection mechanism that is incorporated into the reverse diffusion process. Our results show that, compared to a recent state-of-the-art baseline, our approach leads to zero dynamic feasibility error and approximately 4x higher success rate in a quadrotor waypoint navigation scenario involving dense static obstacles.
A Predictive and Sampled-Data Barrier Method for Safe and Efficient Quadrotor Control
This paper proposes a cascaded control framework for quadrotor trajectory tracking with formal safety guarantees. First, we design a controller consisting of an outer-loop position model predictive control (MPC) and an inner-loop nonlinear attitude control, enabling decoupling of position safety and yaw orientation. Second, since quadrotor safety constraints often involve high relative degree, we adopt high order control barrier functions (HOCBFs) to guarantee safety. To employ HOCBFs in the MPC formulation that has formal guarantees, we extend HOCBFs to sampled-data HOCBF (SdHOCBFs) by introducing compensation terms, ensuring safety over the entire sampling interval. We show that embedding SdHOCBFs as control-affine constraints into the MPC formulation guarantees both safety and optimality while preserving convexity for real-time implementations. Finally, comprehensive simulations are conducted to demonstrate the safety guarantee and high efficiency of the proposed method compared to existing methods.
comment: 6 pages, 3 figures
Optimization via a Control-Centric Framework
Optimization plays a central role in intelligent systems and cyber-physical technologies, where the speed and reliability of convergence directly impact performance. In control theory, optimization-centric methods are standard: controllers are designed by repeatedly solving optimization problems, as in linear quadratic regulation, $H_\infty$ control, and model predictive control. In contrast, this paper develops a control-centric framework for optimization itself, where algorithms are constructed directly from Lyapunov stability principles rather than being proposed first and analyzed afterward. A key element is the stationarity vector, which encodes first-order optimality conditions and enables Lyapunov-based convergence analysis. By pairing a Lyapunov function with a selectable decay law, we obtain continuous-time dynamics with guaranteed exponential, finite-time, fixed-time, or prescribed-time convergence. Within this framework, we introduce three feedback realizations of increasing restrictiveness: the Hessian-gradient, Newton, and gradient dynamics. Each shapes the decay of the stationarity vector to achieve the desired rate. These constructions unify unconstrained optimization, extend naturally to constrained problems via Lyapunov-consistent primal-dual dynamics, and broaden the results for minimax and generalized Nash equilibrium seeking problems beyond exponential stability. The framework provides systematic design tools for optimization algorithms in control and game-theoretic problems.
comment: This work has been submitted to the IEEE for possible publication. 12 pages, 3 figures
AD-NODE: Adaptive Dynamics Learning with Neural ODEs for Mobile Robots Control
Mobile robots, such as ground vehicles and quadrotors, are becoming increasingly important in various fields, from logistics to agriculture, where they automate processes in environments that are difficult to access for humans. However, to perform effectively in uncertain environments using model-based controllers, these systems require dynamics models capable of responding to environmental variations, especially when direct access to environmental information is limited. To enable such adaptivity and facilitate integration with model predictive control, we propose an adaptive dynamics model which bypasses the need for direct environmental knowledge by inferring operational environments from state-action history. The dynamics model is based on neural ordinary equations, and a two-phase training procedure is used to learn latent environment representations. We demonstrate the effectiveness of our approach through goal-reaching and path-tracking tasks on three robotic platforms of increasing complexity: a 2D differential wheeled robot with changing wheel contact conditions, a 3D quadrotor in variational wind fields, and the Sphero BOLT robot under two contact conditions for real-world deployment. Empirical results corroborate that our method can handle temporally and spatially varying environmental changes in both simulation and real-world systems.
Operational Risks in Grid Integration of Large Data Center Loads: Characteristics, Stability Assessments, and Sensitivity Studies
This paper investigates the dynamic interactions between large-scale data centers and the power grid, focusing on reliability challenges arising from sudden fluctuations in demand. With the rapid growth of AI-driven workloads, such fluctuations, along with fast ramp patterns, are expected to exacerbate stressed grid conditions and system instabilities. We consider a few open-source AI data center consumption profiles from the MIT supercloud datasets, along with generating a few experimental HPC job-distribution-based inference profiles. Subsequently, we develop analytical methodologies for real-time assessment of grid stability, focusing on both transient and small-signal stability assessments. Energy-flow-like metrics for nonlinear transient stability, formulated by computing localized data center bus kinetic-like flows and coupling interactions with neighboring buses over varying time windows, help provide operators a real-time assessments of the regional grid stress in the data center hubs. On the other hand, small-signal stability metrics, constructed from analytical state matrices under variable operating conditions during a fast ramping period, enable snapshot-based assessments of data center load fluctuations, provide enhanced observability into evolving grid conditions. By quantifying the stability impacts of large data center clusters, studies conducted in the modified IEEE benchmark $68-$bus model support improved operator situational awareness to capture risks in reliable integration of large data center loads.
comment: 11 pages, 7 figures
Safety-Critical Control with Bounded Inputs: A Closed-Form Solution for Backup Control Barrier Functions
Verifying the safety of controllers is critical for many applications, but is especially challenging for systems with bounded inputs. Backup control barrier functions (bCBFs) offer a structured approach to synthesizing safe controllers that are guaranteed to satisfy input bounds by leveraging the knowledge of a backup controller. While powerful, bCBFs require solving a high-dimensional quadratic program at run-time, which may be too costly for computationally-constrained systems such as aerospace vehicles. We propose an approach that optimally interpolates between a nominal controller and the backup controller, and we derive the solution to this optimization problem in closed form. We prove that this closed-form controller is guaranteed to be safe while obeying input bounds. We demonstrate the effectiveness of the approach on a double integrator and a nonlinear fixed-wing aircraft example.
comment: 8 pages, 6 figures. Code available at https://github.com/davidvwijk/OI-CBF
Digital Twins for Intelligent Intersections: A Literature Review
Intelligent intersections play a pivotal role in urban mobility, demanding innovative solutions such as digital twins to enhance safety and efficiency. This literature review investigates the integration and application of digital twins for intelligent intersections, a critical area within smart urban traffic systems. The review systematically categorizes existing research into five key thematic areas: (i) Digital Twin Architectures and Frameworks; (ii) Data Processing and Simulation Techniques; (iii) Artificial Intelligence and Machine Learning for Adaptive Traffic Control; (iv) Safety and Protection of Vulnerable Road Users; and (v) Scaling from Localized Intersections to Citywide Traffic Networks. Each theme is explored comprehensively, highlighting significant advancements, current challenges, and critical insights. The findings reveal that multi-layered digital twin architectures incorporating real-time data fusion and AI-driven decision-making enhances traffic efficiency and safety. Advanced simulation techniques combined with sophisticated AI/ML algorithms demonstrate notable improvements in real-time responsiveness and predictive accuracy for traffic management. Additionally, the integration of digital twins has shown substantial promise in safeguarding vulnerable road users through proactive and adaptive safety strategies. Despite these advancements, key challenges persist, including interoperability of diverse data sources, scalability of digital twins for extensive traffic networks, and managing uncertainty within dynamic urban environments. Addressing these challenges will be essential for the future development and deployment of intelligent, adaptive, and sustainable intersection management systems.
comment: 29 pages, 2 figures, under review at Transportation Research Interdisciplinary Perspectives journal
Koopman Control Factorization: Data-Driven Convex Controller Design for a Class of Nonlinear Systems
Although Koopman operators provide a global linearization for autonomous dynamical systems, nonautonomous systems are not globally linear in the inputs. State (or output) feedback controller design therefore remains nonconvex in typical formulations, even with approximations via bilinear control-affine terms. We address this gap by introducing the Koopman Control Factorization, a novel parameterization of control-affine dynamical systems combined with a feedback controller defined as a linear combination of nonlinear measurements. With this choice, the Koopman operator of the closed-loop system is a bilinear combination of the coefficients in two matrices: one representing the system, and the other the controller. We propose a set of sufficient conditions such that the factorization holds. Then, we present an algorithm that calculates the feedback matrix via semi-definite programming, producing a Lyapunov-stable closed-loop system with convex optimization. We evaluate the proposed controllers on two canonical examples of control-affine nonlinear systems (inverted pendulums), and show that our factorization and controller successfully stabilize both under properly-chosen basis functions. This manuscript introduces a broadly generalizable control synthesis method for stabilization of nonlinear systems that is quick-to-compute, verifiably stable, data-driven, and does not rely on approximations.
comment: 8 pages, 1 figure (with 4 subfigures). Submitted to the 2026 American Control Conference (ACC)
A System Level Approach to LQR Control of the Diffusion Equation
The continuous-time, infinite horizon LQR problem for the diffusion equation over the unit circle with fully distributed actuation is considered. It is well-known that the solution to this problem can be obtained from the solution to an operator-valued algebraic Riccati equation. Here, it is demonstrated that this solution can be equivalently obtained by solving an $H_2$ control problem through a closed-loop design procedure that is analogous to the "System Level Synthesis" methodology previously developed for systems over a discrete spatial domain and/or over a finite time horizon. The presented extension to the continuous spatial domain and continuous and infinite-horizon time setting admits analytical solutions that may complement computational approaches for discrete or finite-horizon settings. It is further illustrated that spatio-temporal constraints on the closed-loop responses can be incorporated into this new formulation in a convex manner.
comment: 9 pages, 2 figures, Submitted to IEEE American Control Conference 2026
Robust Sensor Placement for Poisson Arrivals with False Alarm Aware Spatiotemporal Sensing
This paper studies sensor placement when detection performance varies stochastically due to environmental factors over space and time and false alarms are present, but a filter is used to attenuate the effect. We introduce a unified model that couples detection and false alarms through an availability function, which captures how false alarms reduce effective sensing and filtering responses to the disturbance. Building on this model, we give a sufficient condition under which filtering improves detection. In addition, we derive a coverage-based lower bound on the void probability. Furthermore, we prove robustness guarantees showing that performance remains stable when detection probabilities are learned from limited data. We validate the approach with numerical studies using AIS vessel-traffic data and synthetic maritime scenarios. Together, these results provide theory and practical guidance for deploying sensors in dynamic, uncertain environments.
comment: Submitted to IEEE ACC
Pricing Short-Circuit Current via a Primal-Dual Formulation for Preserving Integrality Constraints SC
Synchronous Generators (SGs) currently provide important levels of Short-Circuit Current (SCC), a critical ancillary service that ensures line protections trip during short-circuit faults. Given the ongoing replacement of SGs by power-electronics-based generation, which have a hard limit for current injection, it has become relevant to optimize the procurement of SCC provided by remaining SGs. Pricing this service is however challenging due to the integrality constraints in Unit Commitment (UC). Existing methods, e.g., dispatchable pricing, restricted pricing and marginal unit pricing, attempt to address this issue but exhibit limitations in handling binary variables, resulting in SCC prices that either fail to cover the operating costs of units or lack interpretability. To overcome these pitfalls, we propose a primal-dual formulation of the SCC-constrained dispatch that preserves the binary nature of UC while effectively computing shadow prices of SCC services. Using a modified IEEE 30-bus system, a comparison is carried out between the proposed approach and the state-of-the-art pricing schemes, highlighting the advantages of the primal-dual method in preserving UC integrality for SCC pricing.
comment: 7 pages, submitted to PSCC conference 2026
Auctioning Future Services in Edge Networks with Moving Vehicles: N-Step Look-Ahead Contracts for Sustainable Resource Provision
Timely resource allocation in edge-assisted vehicular networks is essential for compute-intensive services such as autonomous driving and navigation. However, vehicle mobility leads to spatio-temporal unpredictability of resource demands, while real-time double auctions incur significant latency. To address these challenges, we propose a look-ahead contract-based auction framework that shifts decision-making from runtime to planning time. Our approach establishes N-step service contracts between edge servers (ESs) using demand forecasts and modified double auctions. The system operates in two stages: first, an LSTM-based prediction module forecasts multi-slot resource needs and determines ES roles (buyer or seller), after which a pre-double auction generates contracts specifying resource quantities, prices, and penalties. Second, these contracts are enforced in real time without rerunning auctions. The framework incorporates energy costs, transmission overhead, and contract breach risks into utility models, ensuring truthful, rational, and energy-efficient trading. Experiments on real-world (UTD19) and synthetic traces demonstrate that our method improves time efficiency, energy use, and social welfare compared with existing baselines.
comment: 17 pages, 8 figures, 1 table
Sampling-based Stochastic Data-driven Predictive Control under Data Uncertainty - Extended Version
We present a stochastic constrained output-feedback data-driven predictive control scheme for linear time-invariant systems subject to bounded additive disturbances. The approach uses data-driven predictors based on an extension of Willems' fundamental lemma and requires only a single persistently exciting input-output data trajectory. Compared to current state-of-the-art approaches, we do not rely on availability of exact disturbance data. Instead, we leverage a novel parameterization of the unknown disturbance data considering consistency with the measured data and the system class. This allows for deterministic approximation of the chance constraints in a sampling-based fashion. A robust constraint on the first predicted step enables recursive feasibility, closed-loop constraint satisfaction, and robust asymptotic stability in expectation under standard assumptions. A numerical example demonstrates the efficiency of the proposed control scheme.
On Fast Attitude Filtering Using Matrix Fisher Distributions with Stability Guarantee
This paper addresses two interrelated problems of the nonlinear filtering mechanism and fast attitude filtering with the matrix Fisher distribution (MFD) on the special orthogonal group. By analyzing the distribution evolution along Bayes' rule, we reveal two essential properties that enhance the performance of Bayesian attitude filters with MFDs, particularly in challenging conditions, from a theoretical viewpoint. Benefiting from the new understanding of the filtering mechanism associated with MFDs, two closed-form filters with MFDs is then proposed. These filters avoid the burdensome computations in previous MFD-based filters by introducing linearized error systems with right-invariant errors but retaining the two advantageous properties. Moreover, we leverage the two properties and closed-form filtering iteration to prove the almost-global exponential stability of the proposed filter with right-invariant error for the single-axis rotation, which, to our knowledge, is not achieved by existing directional statistics-based filters. Numerical simulations demonstrate that the proposed filters are significantly more accurate than the classic invariant Kalman filter. Besides, they are also as accurate as recent MFD-based Bayesian filters in challenging circumstances with large initial error and measurement uncertainty but consumes far less computation time (about 1/5 to 1/100 of previous MFD-based attitude filters).
Chaotic Noncoherent SWIPT in Multi-Functional RIS-Aided Systems
In this letter, we investigate the design of chaotic signal-based transmit waveforms in a multi-functional reconfigurable intelligent surface (MF-RIS)-aided set-up for simultaneous wireless information and power transfer. We propose a differential chaos shift keying-based MF-RIS-aided set-up, where the MF-RIS is partitioned into three non-overlapping surfaces. The elements of the first sub-surface perform energy harvesting (EH), which in turn, provide the required power to the other two sub-surfaces responsible for transmission and reflection of the incident signal. By considering a frequency selective scenario and a realistic EH model, we characterize the chaotic MF-RIS-aided system in terms of its EH performance and the associated bit error rate. Thereafter, we characterize the harvested energy-bit error rate trade-off and derive a lower bound on the number of elements required to operate in the EH mode. Accordingly, we propose novel transmit waveform designs to demonstrate the importance of the choice of appropriate system parameters in the context of achieving self-sustainability.
comment: To appear in IEEE Wireless Communications Letters
Digital-physical testbed for ship autonomy studies in the Marine Cybernetics Laboratory basin
The algorithms developed for Maritime Autonomous Surface Ships (MASS) are often challenging to test on actual vessels due to high operational costs and safety considerations. Simulations offer a cost-effective alternative and eliminate risks, but they may not accurately represent real-world dynamics for the given tasks. Utilizing small-scale model ships and robotic vessels in conjunction with a laboratory basin provides an accessible testing environment for the early stages of validation processes. However, designing and developing a model vessel for a single test can be costly and cumbersome, and researchers often lack access to such infrastructure. To address these challenges and enable streamlined testing, we have developed an in-house testbed that facilitates the development, testing, verification, and validation of MASS algorithms in a digital-physical laboratory. This infrastructure includes a set of small-scale model vessels, a simulation environment for each vessel, a comprehensive testbed environment, and a digital twin in Unity. With this, we aim to establish a full design and verification pipeline that starts with high-fidelity simulation models of each model vessel, to the model-scale testing in the laboratory basin, allowing possibilities for moving towards semi-fullscale validation with R/V milliAmpere1 and full-scale validation with R/V Gunnerus. In this work, we present our progress on the development of this testbed environment and its components, demonstrating its effectiveness in enabling ship guidance, navigation, and control (GNC), including autonomy.
A Hierarchical Control Architecture for Space Robots in On-Orbit Servicing Operations
In-Orbit Servicing and Active Debris Removal require advanced robotic capabilities for capturing and detumbling uncooperative targets. This work presents a hierarchical control framework for autonomous robotic capture of tumbling objects in space. A simulation environment is developed, incorporating sloshing dynamics of the chaser, a rarely studied effect in space robotics. The proposed controller combines an inner Lyapunov-based robust control loop for multi-body dynamics with an outer loop addressing an extended inverse kinematics problem. Simulation results show improved robustness and adaptability compared to existing control schemes.
Robust MPC for Large-scale Linear Systems
State-of-the-art approaches of Robust Model Predictive Control (MPC) are restricted to linear systems of relatively small scale, i.e., with no more than about 5 states. The main reason is the computational burden of determining a robust positively invariant (RPI) set, whose complexity suffers from the curse of dimensionality. The recently proposed approach of Deadbeat Robust Model Predictive Control (DRMPC) is the first that does not rely on an RPI set. Yet it comes with the full set of essential system theoretic guarantees. DRMPC is hence a viable option, in particular, for large-scale systems. This paper introduces a detailed design procedure for DRMPC. It is shown that the optimal control problem generated for DRMPC has exactly the same computational complexity as Nominal MPC. A numerical study validates its applicability to randomly generated large-scale linear systems of various dimensions.
A Physics-Informed Context-Aware Approach for Anomaly Detection in Tele-driving Operations Under False Data Injection Attacks
Tele-operated driving (ToD) systems are special types of cyber-physical systems (CPSs) where the operator remotely controls the steering, acceleration, and braking actions of the vehicle. Malicious actors may inject false data in communication channels to manipulate the tele-operators driving commands to cause harm. Hence, protection of this communication is necessary for the safe operation of the target vehicle. However, according to the National Institute of Standards and Technology (NIST) cybersecurity framework, protection merely is not enough and the detection of an attack is necessary. Moreover, UN R155 mandates that security incidents across vehicle fleets be detected and logged. Thus, cyber-physical threats of ToD are modeled with an attack-centric approach in this paper. Then, an attack model with false data injection (FDI) on steering control commands is created from real vehicle data. The risk of this attack model is assessed for a last-mile delivery (LMD) application. Finally, a physics-informed context-aware anomaly detection system (PCADS) is proposed to detect such false injection attacks, and preliminary experimental results are presented to validate the model.
comment: 16 pages, 13 figures, Accepted at IEEE Access
Code Generation and Conic Constraints for Model-Predictive Control on Microcontrollers with Conic-TinyMPC
Model-predictive control (MPC) is a powerful framework for controlling dynamic systems under constraints, but it remains challenging to deploy on resource-constrained platforms, especially for problems involving conic constraints. To address this, we extend recent work developing fast, structure-exploiting, cached ADMM solvers for embedded applications, to provide support for second-order cones, as well as C++ code generation from Python, MATLAB, and Julia for easy deployment. Microcontroller benchmarks show that our solver provides up to a two-order-of-magnitude speedup, ranging from 10.6x to 142.7x, over state-of-the-art embedded solvers on QP and SOCP problems, and enables us to fit order-of-magnitude larger problems in memory. We validate our solver's deployed performance through simulation and hardware experiments, including conically-constrained trajectory tracking on a 27g Crazyflie quadrotor. To get started with Conic-TinyMPC, visit our documentation, examples, and the open-source codebase at https://tinympc.org.
comment: First three authors contributed equally
On-Demand Growth of Semiconductor Heterostructures Guided by Physics-Informed Machine Learning
Developing tailored semiconductor heterostructures on demand represents a critical capability for addressing the escalating performance demands in electronic and optoelectronic devices. However, traditional fabrication methods remain constrained by simulation-based design and iterative trial-and-error optimization. Here, we introduce SemiEpi, a self-driving platform designed for molecular beam epitaxy (MBE) to perform multi-step semiconductor heterostructure growth through in-situ monitoring and on-the-fly feedback control. By integrating standard MBE reactors, physics-informed machine learning (ML) models, and parameter initialization, SemiEpi identifies optimal initial conditions and proposes experiments for heterostructure growth, eliminating the need for extensive expertise in MBE processes. As a proof of concept, we demonstrate the optimization of high-density InAs quantum dot (QD) growth with a target emission wavelength of 1240 nm, showcasing the power of SemiEpi. We achieve a QD density of 5 x 10^10 cm^-2, a 1.6-fold increase in photoluminescence (PL) intensity, and a reduced full width at half maximum (FWHM) of 29.13 meV, leveraging in-situ reflective high-energy electron diffraction monitoring with feedback control for adjusting growth temperatures. Taken together, our results highlight the potential of ML-guided systems to address challenges in multi-step heterostructure growth, facilitate the development of a hardware-independent framework, and enhance process repeatability and stability, even without exhaustive knowledge of growth parameters.
comment: 5 figures
DALNet: A Denoising Diffusion Probabilistic Model for High-Fidelity Day-Ahead Load Forecasting
Accurate probabilistic load forecasting is crucial for maintaining the safety and stability of power systems. However, the mainstream approach, multi-step prediction, is hindered by cumulative errors and forecasting lags, which limits its effectiveness in probabilistic day-ahead load forecasting (PDALF). To overcome these challenges, we introduce DALNet, a novel denoising diffusion model designed to generate load curves rather than relying on direct prediction. By shifting the focus to curve generation, DALNet captures the complex distribution of actual load time-series data under specific conditions with greater fidelity. To further enhance DALNet, we propose the temporal multi-scale attention block (TMSAB), a mechanism designed to integrate both positional and temporal information for improved forecasting precision. Furthermore, we utilize kernel density estimation (KDE) to reconstruct the distribution of generated load curves and employ Kullback-Leibler (KL) divergence to compare them with the actual data distribution. Experimental results demonstrate that DALNet excels in load forecasting accuracy and offers a novel perspective for other predictive tasks within power systems.
comment: 22pages
Can We Ignore Labels In Out of Distribution Detection?
Out-of-distribution (OOD) detection methods have recently become more prominent, serving as a core element in safety-critical autonomous systems. One major purpose of OOD detection is to reject invalid inputs that could lead to unpredictable errors and compromise safety. Due to the cost of labeled data, recent works have investigated the feasibility of self-supervised learning (SSL) OOD detection, unlabeled OOD detection, and zero shot OOD detection. In this work, we identify a set of conditions for a theoretical guarantee of failure in unlabeled OOD detection algorithms from an information-theoretic perspective. These conditions are present in all OOD tasks dealing with real-world data: I) we provide theoretical proof of unlabeled OOD detection failure when there exists zero mutual information between the learning objective and the in-distribution labels, a.k.a. 'label blindness', II) we define a new OOD task - Adjacent OOD detection - that tests for label blindness and accounts for a previously ignored safety gap in all OOD detection benchmarks, and III) we perform experiments demonstrating that existing unlabeled OOD methods fail under conditions suggested by our label blindness theory and analyze the implications for future research in unlabeled OOD methods.
Quickest Change Detection with Cost-Constrained Experiment Design
In the classical quickest change detection problem, an observer performs a single experiment to monitor a stochastic process. The goal in the classical problem is to detect a change in the statistical properties of the process, with the minimum possible delay, subject to a constraint on the rate of false alarms. This paper considers the case where, at each observation time, the decision-maker must choose between multiple experiments with varying information qualities and costs. The change can be detected using any of the experiments. The goal here is to detect the change with the minimum delay, subject to constraints on the rate of false alarms and the fraction of time each experiment is performed before the time of change. The constraint on the fraction of time can be used to control the overall cost of using the system of experiments. An algorithm called the two-experiment cumulative sum (2E-CUSUM) algorithm is first proposed to solve the problem when there are only two experiments. The algorithm for the case of multiple experiments, starting with three experiments, is then designed iteratively using the 2E-CUSUM algorithm. Two key ideas used in the design are the scaling of undershoots and the truncation of tests. The multiple-experiment algorithm can be designed to satisfy the constraints and can achieve the delay performance of the experiment with the highest quality within a constant. The important concept of data efficiency, where the observer has the choice of not performing any experiment, is explored as well.
Neural-Rendezvous: Provably Robust Guidance and Control to Encounter Interstellar Objects
Interstellar objects (ISOs) are likely representatives of primitive materials invaluable in understanding exoplanetary star systems. Due to their poorly constrained orbits with generally high inclinations and relative velocities, however, exploring ISOs with conventional human-in-the-loop approaches is significantly challenging. This paper presents Neural-Rendezvous -- a deep learning-based guidance and control framework for encountering fast-moving objects, including ISOs, robustly, accurately, and autonomously in real time. It uses pointwise minimum norm tracking control on top of a guidance policy modeled by a spectrally-normalized deep neural network, where its hyperparameters are tuned with a loss function directly penalizing the MPC state trajectory tracking error. We show that Neural-Rendezvous provides a high probability exponential bound on the expected spacecraft delivery error, the proof of which leverages stochastic incremental stability analysis. In particular, it is used to construct a non-negative function with a supermartingale property, explicitly accounting for the ISO state uncertainty and the local nature of nonlinear state estimation guarantees. In numerical simulations, Neural-Rendezvous is demonstrated to satisfy the expected error bound for 100 ISO candidates. This performance is also empirically validated using our spacecraft simulator and in high-conflict and distributed UAV swarm reconfiguration with up to 20 UAVs.
comment: Preprint Version, Accepted: October, 2024 (One-minute YouTube summary: https://youtu.be/q3e0LYS2IYQ, DOI: https://doi.org/10.2514/1.G007671)
Contraction Theory for Nonlinear Stability Analysis and Learning-based Control: A Tutorial Overview
Contraction theory is an analytical tool to study differential dynamics of a non-autonomous (i.e., time-varying) nonlinear system under a contraction metric defined with a uniformly positive definite matrix, the existence of which results in a necessary and sufficient characterization of incremental exponential stability of multiple solution trajectories with respect to each other. By using a squared differential length as a Lyapunov-like function, its nonlinear stability analysis boils down to finding a suitable contraction metric that satisfies a stability condition expressed as a linear matrix inequality, indicating that many parallels can be drawn between well-known linear systems theory and contraction theory for nonlinear systems. Furthermore, contraction theory takes advantage of a superior robustness property of exponential stability used in conjunction with the comparison lemma. This yields much-needed safety and stability guarantees for neural network-based control and estimation schemes, without resorting to a more involved method of using uniform asymptotic stability for input-to-state stability. Such distinctive features permit the systematic construction of a contraction metric via convex optimization, thereby obtaining an explicit exponential bound on the distance between a time-varying target trajectory and solution trajectories perturbed externally due to disturbances and learning errors. The objective of this paper is, therefore, to present a tutorial overview of contraction theory and its advantages in nonlinear stability analysis of deterministic and stochastic systems, with an emphasis on deriving formal robustness and stability guarantees for various learning-based and data-driven automatic control methods. In particular, we provide a detailed review of techniques for finding contraction metrics and associated control and estimation laws using deep neural networks.
comment: Annual Reviews in Control, Preprint Version, Accepted, Oct. 1st
Probabilistic Simulation of Aircraft Descent via a Hybrid Physics-Informed Machine Learning Approach
This paper presents a method for generating probabilistic descent trajectories in simulations of real-world airspace. A dataset of 116,066 trajectories harvested from Mode S radar returns in UK airspace was used to train and test the model. Thirteen aircraft types with varying performance characteristics were investigated. It was found that the error in the mean prediction of time to reach the bottom of descent for the proposed method was less than that of the the Base of Aircraft Data (BADA) model by a factor of 10. Furthermore, the method was capable of generating a range of trajectories that were similar to the held out test dataset when analysed in distribution. The proposed method is hybrid, with aircraft drag and calibrated airspeed functions generated probabilistically to parameterise the BADA equations, ensuring the physical plausibility of generated trajectories.
Multiagent Systems
Speculative Actions: A Lossless Framework for Faster Agentic Systems
Despite growing interest in AI agents across industry and academia, their execution in an environment is often slow, hampering training, evaluation, and deployment. For example, a game of chess between two state-of-the-art agents may take hours. A critical bottleneck is that agent behavior unfolds sequentially: each action requires an API call, and these calls can be time-consuming. Inspired by speculative execution in microprocessors and speculative decoding in LLM inference, we propose speculative actions, a lossless framework for general agentic systems that predicts likely actions using faster models, enabling multiple steps to be executed in parallel. We evaluate this framework across three agentic environments: gaming, e-commerce, web search, and a "lossy" extension for an operating systems environment. In all cases, speculative actions achieve substantial accuracy in next-action prediction (up to 55%), translating into significant reductions in end-to-end latency. Moreover, performance can be further improved through stronger guessing models, top-K action prediction, multi-step speculation, and uncertainty-aware optimization, opening a promising path toward deploying low-latency agentic systems in the real world.
NegotiationGym: Self-Optimizing Agents in a Multi-Agent Social Simulation Environment
We design and implement NegotiationGym, an API and user interface for configuring and running multi-agent social simulations focused upon negotiation and cooperation. The NegotiationGym codebase offers a user-friendly, configuration-driven API that enables easy design and customization of simulation scenarios. Agent-level utility functions encode optimization criteria for each agent, and agents can self-optimize by conducting multiple interaction rounds with other agents, observing outcomes, and modifying their strategies for future rounds.
comment: SocialSim Workshop at COLM 2025
Audit the Whisper: Detecting Steganographic Collusion in Multi-Agent LLMs
Multi-agent deployments of large language models (LLMs) are increasingly embedded in market, allocation, and governance workflows, yet covert coordination among agents can silently erode trust and social welfare. Existing audits are dominated by heuristics that lack theoretical guarantees, struggle to transfer across tasks, and seldom ship with the infrastructure needed for independent replication. We introduce \emph{Audit the Whisper}, a conference-grade research artifact that spans theory, benchmark design, detection, and reproducibility. Our contributions are: (i) a channel-capacity analysis showing how interventions such as paraphrase, rate limiting, and role permutation impose quantifiable capacity penalties -- operationalized via paired-run Kullback--Leibler diagnostics -- that tighten mutual-information thresholds with finite-sample guarantees; (ii) \textsc{ColludeBench}-v0, covering pricing, first-price auctions, and peer review with configurable covert schemes, deterministic manifests, and reward instrumentation; and (iii) a calibrated auditing pipeline that fuses cross-run mutual information, permutation invariance, watermark variance, and fairness-aware acceptance bias, each tuned to a \(10^{-3}\) false-positive budget. Across 600 audited runs spanning 12 intervention conditions, the union meta-test attains TPR~$=1$ with zero observed false alarms, while ablations surface the price-of-auditing trade-off and highlight fairness-driven colluders invisible to MI alone. We release regeneration scripts, seed-stamped manifests, and documentation so that external auditors can reproduce every figure and extend the framework with minimal effort.
comment: 8 pages, 0 figures
Small Fleet, Big Impact: Enhancing Shared Micromobility Efficiency through Minimal Autonomous Vehicle Deployment
Shared micromobility systems, such as electric scooters and bikes, have gained widespread popularity as sustainable alternatives to traditional transportation modes. However, these systems face persistent challenges due to spatio-temporal demand fluctuations, often resulting in a mismatch between vehicle supply and user demand. Existing shared micromobility vehicle scheduling methods typically redistribute vehicles once or twice per day, which makes them vulnerable to performance degradation under atypical conditions. In this work, we design to augment existing micromobility scheduling methods by integrating a small number of autonomous shared micromobility vehicles (ASMVs), which possess self-rebalancing capabilities to dynamically adapt to real-time demand. Specifically, we introduce SMART, a hierarchical reinforcement learning framework that jointly optimizes high-level initial deployment and low-level real-time rebalancing for ASMVs. We evaluate our framework based on real-world e-scooter usage data from Chicago. Our experiment results show that our framework is highly effective and possesses strong generalization capability, allowing it to seamlessly integrate with existing vehicle scheduling methods and significantly enhance overall micromobility service performance.
comment: 10 pages, 11 figures, BuildSys 2025
Cooperative Flexibility Exchange: Fair and Comfort-Aware Decentralized Resource Allocation
The growing electricity demand and increased use of smart appliances are placing new pressures on power grids, making efficient energy management more important than ever. The existing energy management systems often prioritize system efficiency (balanced energy demand and supply) at the expense of user comfort. This paper addresses this gap by proposing a novel decentralized multi-agent coordination-based demand-side management system. The proposed system enables individual agents to coordinate for demand-side energy optimization while improving the user comfort and maintaining the system efficiency. A key innovation of this work is the introduction of a slot exchange mechanism, where agents first receive optimized appliance-level energy consumption schedules and then coordinate with each other to adjust these schedules through slot exchanges. This approach improves user comfort even when agents show non-altruistic behaviour, and it scales well with large populations. The system also promotes fairness by balancing satisfaction levels across users. For performance evaluation, a real-world dataset is used, and the results demonstrate that the proposed slot exchange mechanism increases user comfort and fairness without raising system inefficiency cost, making it a practical and scalable solution for future smart grids.
AgentZero++: Modeling Fear-Based Behavior
We present AgentZero++, an agent-based model that integrates cognitive, emotional, and social mechanisms to simulate decentralized collective violence in spatially distributed systems. Building on Epstein's Agent\_Zero framework, we extend the original model with eight behavioral enhancements: age-based impulse control; memory-based risk estimation; affect-cognition coupling; endogenous destructive radius; fight-or-flight dynamics; affective homophily; retaliatory damage; and multi-agent coordination. These additions allow agents to adapt based on internal states, previous experiences, and social feedback, producing emergent dynamics such as protest asymmetries, escalation cycles, and localized retaliation. Implemented in Python using the Mesa ABM framework, AgentZero++ enables modular experimentation and visualization of how micro-level cognitive heterogeneity shapes macro-level conflict patterns. Our results highlight how small variations in memory, reactivity, and affective alignment can amplify or dampen unrest through feedback loops. By explicitly modeling emotional thresholds, identity-driven behavior, and adaptive networks, this work contributes a flexible and extensible platform for analyzing affective contagion and psychologically grounded collective action.
Emergent Coordination in Multi-Agent Language Models
When are multi-agent LLM systems merely a collection of individual agents versus an integrated collective with higher-order structure? We introduce an information-theoretic framework to test -- in a purely data-driven way -- whether multi-agent systems show signs of higher-order structure. This information decomposition lets us measure whether dynamical emergence is present in multi-agent LLM systems, localize it, and distinguish spurious temporal coupling from performance-relevant cross-agent synergy. We implement both a practical criterion and an emergence capacity criterion operationalized as partial information decomposition of time-delayed mutual information (TDMI). We apply our framework to experiments using a simple guessing game without direct agent communication and only minimal group-level feedback with three randomized interventions. Groups in the control condition exhibit strong temporal synergy but only little coordinated alignment across agents. Assigning a persona to each agent introduces stable identity-linked differentiation. Combining personas with an instruction to ``think about what other agents might do'' shows identity-linked differentiation and goal-directed complementarity across agents. Taken together, our framework establishes that multi-agent LLM systems can be steered with prompt design from mere aggregates to higher-order collectives. Our results are robust across emergence measures and entropy estimators, and not explained by coordination-free baselines or temporal dynamics alone. Without attributing human-like cognition to the agents, the patterns of interaction we observe mirror well-established principles of collective intelligence in human groups: effective performance requires both alignment on shared objectives and complementary contributions across members.
Network Formation and Dynamics Among Multi-LLMs
Social networks profoundly influence how humans form opinions, exchange information, and organize collectively. As large language models (LLMs) are increasingly embedded into social and professional environments, it is critical to understand whether their interactions approximate human-like network dynamics. We develop a framework to study the network formation behaviors of multiple LLM agents and benchmark them against human decisions. Across synthetic and real-world settings, including friendship, telecommunication, and employment networks, we find that LLMs consistently reproduce fundamental micro-level principles such as preferential attachment, triadic closure, and homophily, as well as macro-level properties including community structure and small-world effects. Importantly, the relative emphasis of these principles adapts to context: for example, LLMs favor homophily in friendship networks but heterophily in organizational settings, mirroring patterns of social mobility. A controlled human-subject survey confirms strong alignment between LLMs and human participants in link-formation decisions. These results establish that LLMs can serve as powerful tools for social simulation and synthetic data generation, while also raising critical questions about bias, fairness, and the design of AI systems that participate in human networks.
comment: Accepted at PNAS Nexus
GUIDE: Towards Scalable Advising for Research Ideas
The field of AI research is advancing at an unprecedented pace, enabling automated hypothesis generation and experimental design across diverse domains such as biology, mathematics, and artificial intelligence. Despite these advancements, there remains a significant gap in the availability of scalable advising systems capable of providing high-quality, well-reasoned feedback to refine proposed hypotheses and experimental designs. To address this challenge, we explore key factors that underlie the development of robust advising systems, including model size, context length, confidence estimation, and structured reasoning processes. Our findings reveal that a relatively small model, when equipped with a well-compressed literature database and a structured reasoning framework, can outperform powerful general-purpose language models such as Deepseek-R1 in terms of acceptance rates for self-ranked top-30% submissions to ICLR 2025. Moreover, when limited to high-confidence predictions, our system achieves an acceptance rate exceeding 90% on the ICLR 2025 test set, underscoring its potential to significantly enhance the quality and efficiency of hypothesis generation and experimental design. The code is released at https://github.com/HowardLiu0830/GUIDE-Research-Idea-Evaluation.
Systems and Control (CS)
Geometry of Distance Protection
Distance relays detect faults on transmission lines. They face uncertainty from the fault's location and resistance, as well as the current from the line's remote terminal. In this paper, we aggregate this uncertainty with the Minkowski sum. This allows us to explicitly model the power grid surrounding the relay's line, and in turn accommodate any mix of synchronous machines and inverter-based resources. To make the relay's task easier, inverters can inject perturbations, or auxiliary signals, such as negative-sequence current. We use Farkas' lemma to construct an optimization for designing inverter auxiliary signals.
Reliable and Scalable Robot Policy Evaluation with Imperfect Simulators
Rapid progress in imitation learning, foundation models, and large-scale datasets has led to robot manipulation policies that generalize to a wide-range of tasks and environments. However, rigorous evaluation of these policies remains a challenge. Typically in practice, robot policies are often evaluated on a small number of hardware trials without any statistical assurances. We present SureSim, a framework to augment large-scale simulation with relatively small-scale real-world testing to provide reliable inferences on the real-world performance of a policy. Our key idea is to formalize the problem of combining real and simulation evaluations as a prediction-powered inference problem, in which a small number of paired real and simulation evaluations are used to rectify bias in large-scale simulation. We then leverage non-asymptotic mean estimation algorithms to provide confidence intervals on mean policy performance. Using physics-based simulation, we evaluate both diffusion policy and multi-task fine-tuned \(\pi_0\) on a joint distribution of objects and initial conditions, and find that our approach saves over \(20-25\%\) of hardware evaluation effort to achieve similar bounds on policy performance.
A Hybrid GNN-IZR Framework for Fast and Empirically Robust AC Power Flow Analysis in Radial Distribution Systems
The Alternating Current Power Flow (ACPF) problem forces a trade-off between the speed of data-driven models and the reliability of analytical solvers. This paper introduces a hybrid framework that synergizes a Graph Neural Network (GNN) with the Implicit Z-Bus Recursive (IZR) method, a robust, non-iterative solver for radial distribution networks. The framework employs a physics-informed GNN for rapid initial predictions and invokes the IZR solver as a failsafe for stressed cases identified by a two-stage trigger. A failure is defined as any solution with a maximum power mismatch exceeding 0.1 p.u., a significant operational deviation. On a challenging test set of 7,500 stressed scenarios for the IEEE 33-bus system, the GNN-only model failed on 13.11 % of cases. In contrast, the hybrid framework identified all potential failures, delegating them to the IZR solver to achieve a 0.00 % failure rate, empirically matching the 100 % success rate of the analytical solver on this specific test set. An expanded ablation study confirms that both physics-informed training and Z-bus sensitivity features are critical, collaboratively reducing the GNN's failure rate from 98.72 % (data-only) to 13.11 %. The hybrid approach demonstrates a pragmatic path to achieving the empirical reliability of an analytical solver while leveraging GNN speed, enabling a significant increase in the number of scenarios analyzable in near real-time.
Adaptive Federated Learning via Dynamical System Model
Hyperparameter selection is critical for stable and efficient convergence of heterogeneous federated learning, where clients differ in computational capabilities, and data distributions are non-IID. Tuning hyperparameters is a manual and computationally expensive process as the hyperparameter space grows combinatorially with the number of clients. To address this, we introduce an end-to-end adaptive federated learning method in which both clients and central agents adaptively select their local learning rates and momentum parameters. Our approach models federated learning as a dynamical system, allowing us to draw on principles from numerical simulation and physical design. Through this perspective, selecting momentum parameters equates to critically damping the system for fast, stable convergence, while learning rates for clients and central servers are adaptively selected to satisfy accuracy properties from numerical simulation. The result is an adaptive, momentum-based federated learning algorithm in which the learning rates for clients and servers are dynamically adjusted and controlled by a single, global hyperparameter. By designing a fully integrated solution for both adaptive client updates and central agent aggregation, our method is capable of handling key challenges of heterogeneous federated learning, including objective inconsistency and client drift. Importantly, our approach achieves fast convergence while being insensitive to the choice of the global hyperparameter, making it well-suited for rapid prototyping and scalable deployment. Compared to state-of-the-art adaptive methods, our framework is shown to deliver superior convergence for heterogeneous federated learning while eliminating the need for hyperparameter tuning both client and server updates.
Learning to Capture Rocks using an Excavator: A Reinforcement Learning Approach with Guiding Reward Formulation
Rock capturing with standard excavator buckets is a challenging task typically requiring the expertise of skilled operators. Unlike soil digging, it involves manipulating large, irregular rocks in unstructured environments where complex contact interactions with granular material make model-based control impractical. Existing autonomous excavation methods focus mainly on continuous media or rely on specialized grippers, limiting their applicability to real-world construction sites. This paper introduces a fully data-driven control framework for rock capturing that eliminates the need for explicit modeling of rock or soil properties. A model-free reinforcement learning agent is trained in the AGX Dynamics simulator using the Proximal Policy Optimization (PPO) algorithm and a guiding reward formulation. The learned policy outputs joint velocity commands directly to the boom, arm, and bucket of a CAT365 excavator model. Robustness is enhanced through extensive domain randomization of rock geometry, density, and mass, as well as the initial configurations of the bucket, rock, and goal position. To the best of our knowledge, this is the first study to develop and evaluate an RL-based controller for the rock capturing task. Experimental results show that the policy generalizes well to unseen rocks and varying soil conditions, achieving high success rates comparable to those of human participants while maintaining machine stability. These findings demonstrate the feasibility of learning-based excavation strategies for discrete object manipulation without requiring specialized hardware or detailed material models.
DADS Under Unknown Input Coefficients
This short note shows that the Deadzone-Adapted Disturbance Suppression (DADS) adaptive control scheme is applicable to systems with unknown input coefficients. We study time-invariant, control-affine systems that satisfy the matching condition for which no bounds for the disturbance and the unknown parameters are known. The input coefficients can be time-varying as well as the unknown parameters. The only thing assumed for the input coefficients is their sign. The adaptive control design is Lyapunov-based and can be accomplished for every system for which a smooth globally stabilizing feedback exists when the disturbances are absent and all unknown parameters are known. The design is given by simple, explicit formulas. The proposed controllers guarantee an attenuation of the plant state to an assignable small level, despite unknown bounds on the parameters and disturbance, without a drift of the gain, state, and input.
comment: 23 pages, 10 figures
From Shadow to Light: Toward Safe and Efficient Policy Learning Across MPC, DeePC, RL, and LLM Agents
One of the main challenges in modern control applications, particularly in robot and vehicle motion control, is achieving accurate, fast, and safe movement. To address this, optimal control policies have been developed to enforce safety while ensuring high performance. Since basic first-principles models of real systems are often available, model-based controllers are widely used. Model predictive control (MPC) is a leading approach that optimizes performance while explicitly handling safety constraints. However, obtaining accurate models for complex systems is difficult, which motivates data-driven alternatives. ML-based MPC leverages learned models to reduce reliance on hand-crafted dynamics, while reinforcement learning (RL) can learn near-optimal policies directly from interaction data. Data-enabled predictive control (DeePC) goes further by bypassing modeling altogether, directly learning safe policies from raw input-output data. Recently, large language model (LLM) agents have also emerged, translating natural language instructions into structured formulations of optimal control problems. Despite these advances, data-driven policies face significant limitations. They often suffer from slow response times, high computational demands, and large memory needs, making them less practical for real-world systems with fast dynamics, limited onboard computing, or strict memory constraints. To address this, various technique, such as reduced-order modeling, function-approximated policy learning, and convex relaxations, have been proposed to reduce computational complexity. In this paper, we present eight such approaches and demonstrate their effectiveness across real-world applications, including robotic arms, soft robots, and vehicle motion control.
A Conformal Prediction-Based Chance-Constrained Programming Approach for 24/7 Carbon-Free Data Center Operation Scheduling
The rapid growth of AI applications is dramatically increasing data center energy demand, exacerbating carbon emissions, and necessitating a shift towards 24/7 carbon-free energy (CFE). Unlike traditional annual energy matching, 24/7 CFE requires matching real-time electricity consumption with clean energy generation every hour, presenting significant challenges due to the inherent variability and forecasting errors of renewable energy sources. Traditional robust and data-driven optimization methods often fail to leverage the features of the prediction model (also known as contextual or covariate information) when constructing the uncertainty set, leading to overly conservative operational decisions. This paper proposes a comprehensive approach for 24/7 CFE data center operation scheduling, focusing on robust decision-making under renewable generation uncertainty. This framework leverages covariate information through a multi-variable conformal prediction (CP) technique to construct statistically valid and adaptive uncertainty sets for renewable forecasts. The uncertainty sets directly inform the chance-constrained programming (CCP) problem, ensuring that chance constraints are met with a specified probability. We further establish theoretical underpinnings connecting the CP-generated uncertainty sets to the statistical feasibility guarantees of the CCP. Numerical results highlight the benefits of this covariate-aware approach, demonstrating up to 6.65% cost reduction and 6.96% decrease in carbon-based energy usage compared to conventional covariate-independent methods, thereby enabling data centers to progress toward 24/7 CEF.
Distributed MPC-based Coordination of Traffic Perimeter and Signal Control: A Lexicographic Optimization Approach
This paper introduces a comprehensive strategy that integrates traffic perimeter control with traffic signal control to alleviate congestion in an urban traffic network (UTN). The strategy is formulated as a lexicographic multi-objective optimization problem, starting with the regulation of traffic inflows at boundary junctions to maximize the capacity while ensuring a smooth operation of the UTN. Following this, the signal timings at internal junctions are collaboratively optimized to enhance overall traffic conditions under the regulated inflows. The use of a model predictive control (MPC) approach ensures that the control solution adheres to safety and capacity constraints within the network. To address the computational complexity of the problem, the UTN is divided into subnetworks, each managed by a local agent. A distributed solution method based on the alternating direction method of multipliers (ADMM) algorithm is employed, allowing each agent to determine its optimal control decisions using local information from its subnetwork and neighboring agents. Numerical simulations using VISSIM and MATLAB demonstrate the effectiveness of the proposed traffic control strategy.
Data-driven Practical Stabilization of Nonlinear Systems via Chain Policies: Sample Complexity and Incremental Learning
We propose a method for data-driven practical stabilization of nonlinear systems with provable guarantees, based on the concept of Nonparametric Chain Policies (NCPs). The approach employs a normalized nearest-neighbor rule to assign, at each state, a finite-duration control signal derived from stored data, after which the process repeats. Unlike recent works that model the system as linear, polynomial, or polynomial fraction, we only assume the system to be locally Lipschitz. Our analysis builds on the framework of Recurrent Lyapunov Functions (RLFs), which enable data-driven certification of practical stability using standard norm functions instead of requiring the explicit construction of a classical Lyapunov function. To extend this framework, we introduce the concept of Recurrent Control Lyapunov Functions (R-CLFs), which can certify the existence of an NCP that practically stabilizes an arbitrarily small c-neighborhood of an equilibrium point. We also provide an explicit sample complexity guarantee of O((3/rho)^d log(R/c)) number of trajectories, where R is the domain radius, d the state dimension, and rho a system-dependent constant. The proposed Chain Policies are nonparametric, thus allowing new verified data to be readily incorporated into the policy to either improve convergence rate or enlarge the certified region. Numerical experiments illustrate and validate these properties.
Multi-Agent Guided Policy Search for Non-Cooperative Dynamic Games
Multi-agent reinforcement learning (MARL) optimizes strategic interactions in non-cooperative dynamic games, where agents have misaligned objectives. However, data-driven methods such as multi-agent policy gradients (MA-PG) often suffer from instability and limit-cycle behaviors. Prior stabilization techniques typically rely on entropy-based exploration, which slows learning and increases variance. We propose a model-based approach that incorporates approximate priors into the reward function as regularization. In linear quadratic (LQ) games, we prove that such priors stabilize policy gradients and guarantee local exponential convergence to an approximate Nash equilibrium. We then extend this idea to infinite-horizon nonlinear games by introducing Multi-agent Guided Policy Search (MA-GPS), which constructs short-horizon local LQ approximations from trajectories of current policies to guide training. Experiments on nonlinear vehicle platooning and a six-player strategic basketball formation show that MA-GPS achieves faster convergence and more stable learning than existing MARL methods.
comment: We fix a few typos: 1. In the Introduction, mode-based optimization -> model-based optimization; 2. In the LQ game definition, there is an accidentally missing superscript i in equation (8). We apologize for the confusion that they may raise
Adaptive Disturbance Observer-based Full-Order Integral-Terminal Sliding Mode Control with Unknown A Priori Bound on Uncertainty
This study presents a novel, continuous finite-time control strategy for a class of nonlinear systems subject to matched uncertainties with unknown bounds. We propose an Adaptive Disturbance Observer-based Full-order Integral-Terminal Sliding Mode Control (ADO-FOITSMC) to stabilize a chain of integrators in presence of exogenous disturbances whose time derivative is bounded by a constant that is not known a priori. Key features of this approach include a significant reduction in control input chattering and a non-monotonic adaptive law for the observer gains, which prevents overestimation while ensuring the global boundedness of system states. The effectiveness and practical viability of the proposed algorithm are demonstrated through its application to the attitude stabilization of a rigid spacecraft.
comment: 6 pages, 4 figures
MAD: A Magnitude And Direction Policy Parametrization for Stability Constrained Reinforcement Learning
We introduce magnitude and direction (MAD) policies, a policy parameterization for reinforcement learning (RL) that preserves Lp closed-loop stability for nonlinear dynamical systems. Despite their completeness in describing all stabilizing controllers, methods based on nonlinear Youla and system-level synthesis are significantly impacted by the difficulty of parametrizing Lp-stable operators. In contrast, MAD policies introduce explicit feedback on state-dependent features - a key element behind the success of reinforcement learning pipelines - without jeopardizing closed-loop stability. This is achieved by letting the magnitude of the control input be described by a disturbance-feedback Lp-stable operator, while selecting its direction based on state-dependent features through a universal function approximator. We further characterize the robust stability properties of MAD policies under model mismatch. Unlike existing disturbance-feedback policy parametrizations, MAD policies introduce state-feedback components compatible with model-free RL pipelines, ensuring closed-loop stability with no model information beyond assuming open-loop stability. Numerical experiments show that MAD policies trained with deep deterministic policy gradient (DDPG) methods generalize to unseen scenarios - matching the performance of standard neural network policies while guaranteeing closed-loop stability by design.
Rough Stochastic Pontryagin Maximum Principle and an Indirect Shooting Method
We derive first-order Pontryagin optimality conditions for stochastic optimal control with deterministic controls for systems modeled by rough differential equations (RDE) driven by Gaussian rough paths. This Pontryagin Maximum Principle (PMP) applies to systems following stochastic differential equations (SDE) driven by Brownian motion, yet it does not rely on forward-backward SDEs and involves the same Hamiltonian as the deterministic PMP. The proof consists of first deriving various integrable error bounds for solutions to nonlinear and linear RDEs by leveraging recent results on Gaussian rough paths. The PMP then follows using standard techniques based on needle-like variations. As an application, we propose the first indirect shooting method for nonlinear stochastic optimal control and show that it converges 10x faster than a direct method on a stabilization task.
comment: Revisions to the presentation and proofs
Systems and Control (EESS)
Geometry of Distance Protection
Distance relays detect faults on transmission lines. They face uncertainty from the fault's location and resistance, as well as the current from the line's remote terminal. In this paper, we aggregate this uncertainty with the Minkowski sum. This allows us to explicitly model the power grid surrounding the relay's line, and in turn accommodate any mix of synchronous machines and inverter-based resources. To make the relay's task easier, inverters can inject perturbations, or auxiliary signals, such as negative-sequence current. We use Farkas' lemma to construct an optimization for designing inverter auxiliary signals.
Reliable and Scalable Robot Policy Evaluation with Imperfect Simulators
Rapid progress in imitation learning, foundation models, and large-scale datasets has led to robot manipulation policies that generalize to a wide-range of tasks and environments. However, rigorous evaluation of these policies remains a challenge. Typically in practice, robot policies are often evaluated on a small number of hardware trials without any statistical assurances. We present SureSim, a framework to augment large-scale simulation with relatively small-scale real-world testing to provide reliable inferences on the real-world performance of a policy. Our key idea is to formalize the problem of combining real and simulation evaluations as a prediction-powered inference problem, in which a small number of paired real and simulation evaluations are used to rectify bias in large-scale simulation. We then leverage non-asymptotic mean estimation algorithms to provide confidence intervals on mean policy performance. Using physics-based simulation, we evaluate both diffusion policy and multi-task fine-tuned \(\pi_0\) on a joint distribution of objects and initial conditions, and find that our approach saves over \(20-25\%\) of hardware evaluation effort to achieve similar bounds on policy performance.
A Hybrid GNN-IZR Framework for Fast and Empirically Robust AC Power Flow Analysis in Radial Distribution Systems
The Alternating Current Power Flow (ACPF) problem forces a trade-off between the speed of data-driven models and the reliability of analytical solvers. This paper introduces a hybrid framework that synergizes a Graph Neural Network (GNN) with the Implicit Z-Bus Recursive (IZR) method, a robust, non-iterative solver for radial distribution networks. The framework employs a physics-informed GNN for rapid initial predictions and invokes the IZR solver as a failsafe for stressed cases identified by a two-stage trigger. A failure is defined as any solution with a maximum power mismatch exceeding 0.1 p.u., a significant operational deviation. On a challenging test set of 7,500 stressed scenarios for the IEEE 33-bus system, the GNN-only model failed on 13.11 % of cases. In contrast, the hybrid framework identified all potential failures, delegating them to the IZR solver to achieve a 0.00 % failure rate, empirically matching the 100 % success rate of the analytical solver on this specific test set. An expanded ablation study confirms that both physics-informed training and Z-bus sensitivity features are critical, collaboratively reducing the GNN's failure rate from 98.72 % (data-only) to 13.11 %. The hybrid approach demonstrates a pragmatic path to achieving the empirical reliability of an analytical solver while leveraging GNN speed, enabling a significant increase in the number of scenarios analyzable in near real-time.
Adaptive Federated Learning via Dynamical System Model
Hyperparameter selection is critical for stable and efficient convergence of heterogeneous federated learning, where clients differ in computational capabilities, and data distributions are non-IID. Tuning hyperparameters is a manual and computationally expensive process as the hyperparameter space grows combinatorially with the number of clients. To address this, we introduce an end-to-end adaptive federated learning method in which both clients and central agents adaptively select their local learning rates and momentum parameters. Our approach models federated learning as a dynamical system, allowing us to draw on principles from numerical simulation and physical design. Through this perspective, selecting momentum parameters equates to critically damping the system for fast, stable convergence, while learning rates for clients and central servers are adaptively selected to satisfy accuracy properties from numerical simulation. The result is an adaptive, momentum-based federated learning algorithm in which the learning rates for clients and servers are dynamically adjusted and controlled by a single, global hyperparameter. By designing a fully integrated solution for both adaptive client updates and central agent aggregation, our method is capable of handling key challenges of heterogeneous federated learning, including objective inconsistency and client drift. Importantly, our approach achieves fast convergence while being insensitive to the choice of the global hyperparameter, making it well-suited for rapid prototyping and scalable deployment. Compared to state-of-the-art adaptive methods, our framework is shown to deliver superior convergence for heterogeneous federated learning while eliminating the need for hyperparameter tuning both client and server updates.
Learning to Capture Rocks using an Excavator: A Reinforcement Learning Approach with Guiding Reward Formulation
Rock capturing with standard excavator buckets is a challenging task typically requiring the expertise of skilled operators. Unlike soil digging, it involves manipulating large, irregular rocks in unstructured environments where complex contact interactions with granular material make model-based control impractical. Existing autonomous excavation methods focus mainly on continuous media or rely on specialized grippers, limiting their applicability to real-world construction sites. This paper introduces a fully data-driven control framework for rock capturing that eliminates the need for explicit modeling of rock or soil properties. A model-free reinforcement learning agent is trained in the AGX Dynamics simulator using the Proximal Policy Optimization (PPO) algorithm and a guiding reward formulation. The learned policy outputs joint velocity commands directly to the boom, arm, and bucket of a CAT365 excavator model. Robustness is enhanced through extensive domain randomization of rock geometry, density, and mass, as well as the initial configurations of the bucket, rock, and goal position. To the best of our knowledge, this is the first study to develop and evaluate an RL-based controller for the rock capturing task. Experimental results show that the policy generalizes well to unseen rocks and varying soil conditions, achieving high success rates comparable to those of human participants while maintaining machine stability. These findings demonstrate the feasibility of learning-based excavation strategies for discrete object manipulation without requiring specialized hardware or detailed material models.
DADS Under Unknown Input Coefficients
This short note shows that the Deadzone-Adapted Disturbance Suppression (DADS) adaptive control scheme is applicable to systems with unknown input coefficients. We study time-invariant, control-affine systems that satisfy the matching condition for which no bounds for the disturbance and the unknown parameters are known. The input coefficients can be time-varying as well as the unknown parameters. The only thing assumed for the input coefficients is their sign. The adaptive control design is Lyapunov-based and can be accomplished for every system for which a smooth globally stabilizing feedback exists when the disturbances are absent and all unknown parameters are known. The design is given by simple, explicit formulas. The proposed controllers guarantee an attenuation of the plant state to an assignable small level, despite unknown bounds on the parameters and disturbance, without a drift of the gain, state, and input.
comment: 23 pages, 10 figures
From Shadow to Light: Toward Safe and Efficient Policy Learning Across MPC, DeePC, RL, and LLM Agents
One of the main challenges in modern control applications, particularly in robot and vehicle motion control, is achieving accurate, fast, and safe movement. To address this, optimal control policies have been developed to enforce safety while ensuring high performance. Since basic first-principles models of real systems are often available, model-based controllers are widely used. Model predictive control (MPC) is a leading approach that optimizes performance while explicitly handling safety constraints. However, obtaining accurate models for complex systems is difficult, which motivates data-driven alternatives. ML-based MPC leverages learned models to reduce reliance on hand-crafted dynamics, while reinforcement learning (RL) can learn near-optimal policies directly from interaction data. Data-enabled predictive control (DeePC) goes further by bypassing modeling altogether, directly learning safe policies from raw input-output data. Recently, large language model (LLM) agents have also emerged, translating natural language instructions into structured formulations of optimal control problems. Despite these advances, data-driven policies face significant limitations. They often suffer from slow response times, high computational demands, and large memory needs, making them less practical for real-world systems with fast dynamics, limited onboard computing, or strict memory constraints. To address this, various technique, such as reduced-order modeling, function-approximated policy learning, and convex relaxations, have been proposed to reduce computational complexity. In this paper, we present eight such approaches and demonstrate their effectiveness across real-world applications, including robotic arms, soft robots, and vehicle motion control.
A Conformal Prediction-Based Chance-Constrained Programming Approach for 24/7 Carbon-Free Data Center Operation Scheduling
The rapid growth of AI applications is dramatically increasing data center energy demand, exacerbating carbon emissions, and necessitating a shift towards 24/7 carbon-free energy (CFE). Unlike traditional annual energy matching, 24/7 CFE requires matching real-time electricity consumption with clean energy generation every hour, presenting significant challenges due to the inherent variability and forecasting errors of renewable energy sources. Traditional robust and data-driven optimization methods often fail to leverage the features of the prediction model (also known as contextual or covariate information) when constructing the uncertainty set, leading to overly conservative operational decisions. This paper proposes a comprehensive approach for 24/7 CFE data center operation scheduling, focusing on robust decision-making under renewable generation uncertainty. This framework leverages covariate information through a multi-variable conformal prediction (CP) technique to construct statistically valid and adaptive uncertainty sets for renewable forecasts. The uncertainty sets directly inform the chance-constrained programming (CCP) problem, ensuring that chance constraints are met with a specified probability. We further establish theoretical underpinnings connecting the CP-generated uncertainty sets to the statistical feasibility guarantees of the CCP. Numerical results highlight the benefits of this covariate-aware approach, demonstrating up to 6.65% cost reduction and 6.96% decrease in carbon-based energy usage compared to conventional covariate-independent methods, thereby enabling data centers to progress toward 24/7 CEF.
Distributed MPC-based Coordination of Traffic Perimeter and Signal Control: A Lexicographic Optimization Approach
This paper introduces a comprehensive strategy that integrates traffic perimeter control with traffic signal control to alleviate congestion in an urban traffic network (UTN). The strategy is formulated as a lexicographic multi-objective optimization problem, starting with the regulation of traffic inflows at boundary junctions to maximize the capacity while ensuring a smooth operation of the UTN. Following this, the signal timings at internal junctions are collaboratively optimized to enhance overall traffic conditions under the regulated inflows. The use of a model predictive control (MPC) approach ensures that the control solution adheres to safety and capacity constraints within the network. To address the computational complexity of the problem, the UTN is divided into subnetworks, each managed by a local agent. A distributed solution method based on the alternating direction method of multipliers (ADMM) algorithm is employed, allowing each agent to determine its optimal control decisions using local information from its subnetwork and neighboring agents. Numerical simulations using VISSIM and MATLAB demonstrate the effectiveness of the proposed traffic control strategy.
Data-driven Practical Stabilization of Nonlinear Systems via Chain Policies: Sample Complexity and Incremental Learning
We propose a method for data-driven practical stabilization of nonlinear systems with provable guarantees, based on the concept of Nonparametric Chain Policies (NCPs). The approach employs a normalized nearest-neighbor rule to assign, at each state, a finite-duration control signal derived from stored data, after which the process repeats. Unlike recent works that model the system as linear, polynomial, or polynomial fraction, we only assume the system to be locally Lipschitz. Our analysis builds on the framework of Recurrent Lyapunov Functions (RLFs), which enable data-driven certification of practical stability using standard norm functions instead of requiring the explicit construction of a classical Lyapunov function. To extend this framework, we introduce the concept of Recurrent Control Lyapunov Functions (R-CLFs), which can certify the existence of an NCP that practically stabilizes an arbitrarily small c-neighborhood of an equilibrium point. We also provide an explicit sample complexity guarantee of O((3/rho)^d log(R/c)) number of trajectories, where R is the domain radius, d the state dimension, and rho a system-dependent constant. The proposed Chain Policies are nonparametric, thus allowing new verified data to be readily incorporated into the policy to either improve convergence rate or enlarge the certified region. Numerical experiments illustrate and validate these properties.
Multi-Agent Guided Policy Search for Non-Cooperative Dynamic Games
Multi-agent reinforcement learning (MARL) optimizes strategic interactions in non-cooperative dynamic games, where agents have misaligned objectives. However, data-driven methods such as multi-agent policy gradients (MA-PG) often suffer from instability and limit-cycle behaviors. Prior stabilization techniques typically rely on entropy-based exploration, which slows learning and increases variance. We propose a model-based approach that incorporates approximate priors into the reward function as regularization. In linear quadratic (LQ) games, we prove that such priors stabilize policy gradients and guarantee local exponential convergence to an approximate Nash equilibrium. We then extend this idea to infinite-horizon nonlinear games by introducing Multi-agent Guided Policy Search (MA-GPS), which constructs short-horizon local LQ approximations from trajectories of current policies to guide training. Experiments on nonlinear vehicle platooning and a six-player strategic basketball formation show that MA-GPS achieves faster convergence and more stable learning than existing MARL methods.
comment: We fix a few typos: 1. In the Introduction, mode-based optimization -> model-based optimization; 2. In the LQ game definition, there is an accidentally missing superscript i in equation (8). We apologize for the confusion that they may raise
Adaptive Disturbance Observer-based Full-Order Integral-Terminal Sliding Mode Control with Unknown A Priori Bound on Uncertainty
This study presents a novel, continuous finite-time control strategy for a class of nonlinear systems subject to matched uncertainties with unknown bounds. We propose an Adaptive Disturbance Observer-based Full-order Integral-Terminal Sliding Mode Control (ADO-FOITSMC) to stabilize a chain of integrators in presence of exogenous disturbances whose time derivative is bounded by a constant that is not known a priori. Key features of this approach include a significant reduction in control input chattering and a non-monotonic adaptive law for the observer gains, which prevents overestimation while ensuring the global boundedness of system states. The effectiveness and practical viability of the proposed algorithm are demonstrated through its application to the attitude stabilization of a rigid spacecraft.
comment: 6 pages, 4 figures
MAD: A Magnitude And Direction Policy Parametrization for Stability Constrained Reinforcement Learning
We introduce magnitude and direction (MAD) policies, a policy parameterization for reinforcement learning (RL) that preserves Lp closed-loop stability for nonlinear dynamical systems. Despite their completeness in describing all stabilizing controllers, methods based on nonlinear Youla and system-level synthesis are significantly impacted by the difficulty of parametrizing Lp-stable operators. In contrast, MAD policies introduce explicit feedback on state-dependent features - a key element behind the success of reinforcement learning pipelines - without jeopardizing closed-loop stability. This is achieved by letting the magnitude of the control input be described by a disturbance-feedback Lp-stable operator, while selecting its direction based on state-dependent features through a universal function approximator. We further characterize the robust stability properties of MAD policies under model mismatch. Unlike existing disturbance-feedback policy parametrizations, MAD policies introduce state-feedback components compatible with model-free RL pipelines, ensuring closed-loop stability with no model information beyond assuming open-loop stability. Numerical experiments show that MAD policies trained with deep deterministic policy gradient (DDPG) methods generalize to unseen scenarios - matching the performance of standard neural network policies while guaranteeing closed-loop stability by design.
Rough Stochastic Pontryagin Maximum Principle and an Indirect Shooting Method
We derive first-order Pontryagin optimality conditions for stochastic optimal control with deterministic controls for systems modeled by rough differential equations (RDE) driven by Gaussian rough paths. This Pontryagin Maximum Principle (PMP) applies to systems following stochastic differential equations (SDE) driven by Brownian motion, yet it does not rely on forward-backward SDEs and involves the same Hamiltonian as the deterministic PMP. The proof consists of first deriving various integrable error bounds for solutions to nonlinear and linear RDEs by leveraging recent results on Gaussian rough paths. The PMP then follows using standard techniques based on needle-like variations. As an application, we propose the first indirect shooting method for nonlinear stochastic optimal control and show that it converges 10x faster than a direct method on a stabilization task.
comment: Revisions to the presentation and proofs
Robotics
Reliable and Scalable Robot Policy Evaluation with Imperfect Simulators
Rapid progress in imitation learning, foundation models, and large-scale datasets has led to robot manipulation policies that generalize to a wide-range of tasks and environments. However, rigorous evaluation of these policies remains a challenge. Typically in practice, robot policies are often evaluated on a small number of hardware trials without any statistical assurances. We present SureSim, a framework to augment large-scale simulation with relatively small-scale real-world testing to provide reliable inferences on the real-world performance of a policy. Our key idea is to formalize the problem of combining real and simulation evaluations as a prediction-powered inference problem, in which a small number of paired real and simulation evaluations are used to rectify bias in large-scale simulation. We then leverage non-asymptotic mean estimation algorithms to provide confidence intervals on mean policy performance. Using physics-based simulation, we evaluate both diffusion policy and multi-task fine-tuned \(\pi_0\) on a joint distribution of objects and initial conditions, and find that our approach saves over \(20-25\%\) of hardware evaluation effort to achieve similar bounds on policy performance.
Stability-Aware Retargeting for Humanoid Multi-Contact Teleoperation
Teleoperation is a powerful method to generate reference motions and enable humanoid robots to perform a broad range of tasks. However, teleoperation becomes challenging when using hand contacts and non-coplanar surfaces, often leading to motor torque saturation or loss of stability through slipping. We propose a centroidal stability-based retargeting method that dynamically adjusts contact points and posture during teleoperation to enhance stability in these difficult scenarios. Central to our approach is an efficient analytical calculation of the stability margin gradient. This gradient is used to identify scenarios for which stability is highly sensitive to teleoperation setpoints and inform the local adjustment of these setpoints. We validate the framework in simulation and hardware by teleoperating manipulation tasks on a humanoid, demonstrating increased stability margins. We also demonstrate empirically that higher stability margins correlate with improved impulse resilience and joint torque margin.
RAP: 3D Rasterization Augmented End-to-End Planning
Imitation learning for end-to-end driving trains policies only on expert demonstrations. Once deployed in a closed loop, such policies lack recovery data: small mistakes cannot be corrected and quickly compound into failures. A promising direction is to generate alternative viewpoints and trajectories beyond the logged path. Prior work explores photorealistic digital twins via neural rendering or game engines, but these methods are prohibitively slow and costly, and thus mainly used for evaluation. In this work, we argue that photorealism is unnecessary for training end-to-end planners. What matters is semantic fidelity and scalability: driving depends on geometry and dynamics, not textures or lighting. Motivated by this, we propose 3D Rasterization, which replaces costly rendering with lightweight rasterization of annotated primitives, enabling augmentations such as counterfactual recovery maneuvers and cross-agent view synthesis. To transfer these synthetic views effectively to real-world deployment, we introduce a Raster-to-Real feature-space alignment that bridges the sim-to-real gap. Together, these components form Rasterization Augmented Planning (RAP), a scalable data augmentation pipeline for planning. RAP achieves state-of-the-art closed-loop robustness and long-tail generalization, ranking first on four major benchmarks: NAVSIM v1/v2, Waymo Open Dataset Vision-based E2E Driving, and Bench2Drive. Our results show that lightweight rasterization with feature alignment suffices to scale E2E training, offering a practical alternative to photorealistic rendering. Project page: https://alan-lanfeng.github.io/RAP/.
A KL-regularization framework for learning to plan with adaptive priors
Effective exploration remains a central challenge in model-based reinforcement learning (MBRL), particularly in high-dimensional continuous control tasks where sample efficiency is crucial. A prominent line of recent work leverages learned policies as proposal distributions for Model-Predictive Path Integral (MPPI) planning. Initial approaches update the sampling policy independently of the planner distribution, typically maximizing a learned value function with deterministic policy gradient and entropy regularization. However, because the states encountered during training depend on the MPPI planner, aligning the sampling policy with the planner improves the accuracy of value estimation and long-term performance. To this end, recent methods update the sampling policy by minimizing KL divergence to the planner distribution or by introducing planner-guided regularization into the policy update. In this work, we unify these MPPI-based reinforcement learning methods under a single framework by introducing Policy Optimization-Model Predictive Control (PO-MPC), a family of KL-regularized MBRL methods that integrate the planner's action distribution as a prior in policy optimization. By aligning the learned policy with the planner's behavior, PO-MPC allows more flexibility in the policy updates to trade off Return maximization and KL divergence minimization. We clarify how prior approaches emerge as special cases of this family, and we explore previously unstudied variations. Our experiments show that these extended configurations yield significant performance improvements, advancing the state of the art in MPPI-based RL.
comment: Preprint
Integrated Planning and Control on Manifolds: Factor Graph Representation and Toolkit
Model predictive control (MPC) faces significant limitations when applied to systems evolving on nonlinear manifolds, such as robotic attitude dynamics and constrained motion planning, where traditional Euclidean formulations struggle with singularities, over-parameterization, and poor convergence. To overcome these challenges, this paper introduces FactorMPC, a factor-graph based MPC toolkit that unifies system dynamics, constraints, and objectives into a modular, user-friendly, and efficient optimization structure. Our approach natively supports manifold-valued states with Gaussian uncertainties modeled in tangent spaces. By exploiting the sparsity and probabilistic structure of factor graphs, the toolkit achieves real-time performance even for high-dimensional systems with complex constraints. The velocity-extended on-manifold control barrier function (CBF)-based obstacle avoidance factors are designed for safety-critical applications. By bridging graphical models with safety-critical MPC, our work offers a scalable and geometrically consistent framework for integrated planning and control. The simulations and experimental results on the quadrotor demonstrate superior trajectory tracking and obstacle avoidance performance compared to baseline methods. To foster research reproducibility, we have provided open-source implementation offering plug-and-play factors.
ContextVLA: Vision-Language-Action Model with Amortized Multi-Frame Context
Leveraging temporal context is crucial for success in partially observable robotic tasks. However, prior work in behavior cloning has demonstrated inconsistent performance gains when using multi-frame observations. In this paper, we introduce ContextVLA, a policy model that robustly improves robotic task performance by effectively leveraging multi-frame observations. Our approach is motivated by the key observation that Vision-Language-Action models (VLA), i.e., policy models built upon a Vision-Language Model (VLM), more effectively utilize multi-frame observations for action generation. This suggests that VLMs' inherent temporal understanding capability enables them to extract more meaningful context from multi-frame observations. However, the high dimensionality of video inputs introduces significant computational overhead, making VLA training and inference inefficient. To address this, ContextVLA compresses past observations into a single context token, allowing the policy to efficiently leverage temporal context for action generation. Our experiments show that ContextVLA consistently improves over single-frame VLAs and achieves the benefits of full multi-frame training but with reduced training and inference times.
comment: Project page: https://huiwon-jang.github.io/contextvla
Flexible Locomotion Learning with Diffusion Model Predictive Control
Legged locomotion demands controllers that are both robust and adaptable, while remaining compatible with task and safety considerations. However, model-free reinforcement learning (RL) methods often yield a fixed policy that can be difficult to adapt to new behaviors at test time. In contrast, Model Predictive Control (MPC) provides a natural approach to flexible behavior synthesis by incorporating different objectives and constraints directly into its optimization process. However, classical MPC relies on accurate dynamics models, which are often difficult to obtain in complex environments and typically require simplifying assumptions. We present Diffusion-MPC, which leverages a learned generative diffusion model as an approximate dynamics prior for planning, enabling flexible test-time adaptation through reward and constraint based optimization. Diffusion-MPC jointly predicts future states and actions; at each reverse step, we incorporate reward planning and impose constraint projection, yielding trajectories that satisfy task objectives while remaining within physical limits. To obtain a planning model that adapts beyond imitation pretraining, we introduce an interactive training algorithm for diffusion based planner: we execute our reward-and-constraint planner in environment, then filter and reweight the collected trajectories by their realized returns before updating the denoiser. Our design enables strong test-time adaptability, allowing the planner to adjust to new reward specifications without retraining. We validate Diffusion-MPC on real world, demonstrating strong locomotion and flexible adaptation.
comment: 9 pages, 8 figures
Zenbo Patrol: A Social Assistive Robot Based on Multimodal Deep Learning for Real-time Illegal Parking Recognition and Notification
In the study, the social robot act as a patrol to recognize and notify illegal parking in real-time. Dual-model pipeline method and large multimodal model were compared, and the GPT-4o multimodal model was adopted in license plate recognition without preprocessing. For moving smoothly on a flat ground, the robot navigated in a simulated parking lot in the experiments. The robot changes angle view of the camera automatically to capture the images around with the format of license plate number. From the captured images of the robot, the numbers on the plate are recognized through the GPT-4o model, and identifies legality of the numbers. When an illegal parking is detected, the robot sends Line messages to the system manager immediately. The contribution of the work is that a novel multimodal deep learning method has validated with high accuracy in license plate recognition, and a social assistive robot is also provided for solving problems in a real scenario, and can be applied in an indoor parking lot.
Using Robotics to Improve Transcatheter Edge-to-Edge Repair of the Mitral Valve
Transcatheter valve repair presents significant challenges due to the mechanical limitations and steep learning curve associated with manual catheter systems. This paper investigates the use of robotics to facilitate transcatheter procedures in the context of mitral valve edge-to-edge repair. The complex handle-based control of a clinical repair device is replaced by intuitive robotic joint-based control via a game controller. Manual versus robotic performance is analyzed by decomposing the overall device delivery task into motion-specific steps and comparing capabilities on a step-by-step basis in a phantom model of the heart and vasculature. Metrics include procedure duration and clip placement accuracy. Results demonstrate that the robotic system can reduce procedural time and motion errors while also improving accuracy of clip placement. These findings suggest that robotic assistance can address key limitations of manual systems, offering a more reliable and user-friendly platform for complex transcatheter procedures.
comment: 7 pages, 9 figures
VBM-NET: Visual Base Pose Learning for Mobile Manipulation using Equivariant TransporterNet and GNNs
In Mobile Manipulation, selecting an optimal mobile base pose is essential for successful object grasping. Previous works have addressed this problem either through classical planning methods or by learning state-based policies. They assume access to reliable state information, such as the precise object poses and environment models. In this work, we study base pose planning directly from top-down orthographic projections of the scene, which provide a global overview of the scene while preserving spatial structure. We propose VBM-NET, a learning-based method for base pose selection using such top-down orthographic projections. We use equivariant TransporterNet to exploit spatial symmetries and efficiently learn candidate base poses for grasping. Further, we use graph neural networks to represent a varying number of candidate base poses and use Reinforcement Learning to determine the optimal base pose among them. We show that VBM-NET can produce comparable solutions to the classical methods in significantly less computation time. Furthermore, we validate sim-to-real transfer by successfully deploying a policy trained in simulation to real-world mobile manipulation.
Learning to Capture Rocks using an Excavator: A Reinforcement Learning Approach with Guiding Reward Formulation
Rock capturing with standard excavator buckets is a challenging task typically requiring the expertise of skilled operators. Unlike soil digging, it involves manipulating large, irregular rocks in unstructured environments where complex contact interactions with granular material make model-based control impractical. Existing autonomous excavation methods focus mainly on continuous media or rely on specialized grippers, limiting their applicability to real-world construction sites. This paper introduces a fully data-driven control framework for rock capturing that eliminates the need for explicit modeling of rock or soil properties. A model-free reinforcement learning agent is trained in the AGX Dynamics simulator using the Proximal Policy Optimization (PPO) algorithm and a guiding reward formulation. The learned policy outputs joint velocity commands directly to the boom, arm, and bucket of a CAT365 excavator model. Robustness is enhanced through extensive domain randomization of rock geometry, density, and mass, as well as the initial configurations of the bucket, rock, and goal position. To the best of our knowledge, this is the first study to develop and evaluate an RL-based controller for the rock capturing task. Experimental results show that the policy generalizes well to unseen rocks and varying soil conditions, achieving high success rates comparable to those of human participants while maintaining machine stability. These findings demonstrate the feasibility of learning-based excavation strategies for discrete object manipulation without requiring specialized hardware or detailed material models.
HEHA: Hierarchical Planning for Heterogeneous Multi-Robot Exploration of Unknown Environments
This paper considers the path planning problem for autonomous exploration of an unknown environment using multiple heterogeneous robots such as drones, wheeled, and legged robots, which have different capabilities to traverse complex terrains. A key challenge there is to intelligently allocate the robots to the unknown areas to be explored and determine the visiting order of those spaces subject to traversablity constraints, which leads to a large scale constrained optimization problem that needs to be quickly and iteratively solved every time when new space are explored. To address the challenge, we propose HEHA (Hierarchical Exploration with Heterogeneous Agents) by leveraging a recent hierarchical method that decompose the exploration into global planning and local planning. The major contribution in HEHA is its global planning, where we propose a new routing algorithm PEAF (Partial Anytime Focal search) that can quickly find bounded sub-optimal solutions to minimize the maximum path length among the agents subject to traversability constraints. Additionally, the local planner in HEHA also considers heterogeneity to avoid repeated and duplicated exploration among the robots. The experimental results show that, our HEHA can reduce up to 30% of the exploration time than the baselines.
comment: 5 Figures
From Shadow to Light: Toward Safe and Efficient Policy Learning Across MPC, DeePC, RL, and LLM Agents
One of the main challenges in modern control applications, particularly in robot and vehicle motion control, is achieving accurate, fast, and safe movement. To address this, optimal control policies have been developed to enforce safety while ensuring high performance. Since basic first-principles models of real systems are often available, model-based controllers are widely used. Model predictive control (MPC) is a leading approach that optimizes performance while explicitly handling safety constraints. However, obtaining accurate models for complex systems is difficult, which motivates data-driven alternatives. ML-based MPC leverages learned models to reduce reliance on hand-crafted dynamics, while reinforcement learning (RL) can learn near-optimal policies directly from interaction data. Data-enabled predictive control (DeePC) goes further by bypassing modeling altogether, directly learning safe policies from raw input-output data. Recently, large language model (LLM) agents have also emerged, translating natural language instructions into structured formulations of optimal control problems. Despite these advances, data-driven policies face significant limitations. They often suffer from slow response times, high computational demands, and large memory needs, making them less practical for real-world systems with fast dynamics, limited onboard computing, or strict memory constraints. To address this, various technique, such as reduced-order modeling, function-approximated policy learning, and convex relaxations, have been proposed to reduce computational complexity. In this paper, we present eight such approaches and demonstrate their effectiveness across real-world applications, including robotic arms, soft robots, and vehicle motion control.
Feedback Matters: Augmenting Autonomous Dissection with Visual and Topological Feedback
Autonomous surgical systems must adapt to highly dynamic environments where tissue properties and visual cues evolve rapidly. Central to such adaptability is feedback: the ability to sense, interpret, and respond to changes during execution. While feedback mechanisms have been explored in surgical robotics, ranging from tool and tissue tracking to error detection, existing methods remain limited in handling the topological and perceptual challenges of tissue dissection. In this work, we propose a feedback-enabled framework for autonomous tissue dissection that explicitly reasons about topological changes from endoscopic images after each dissection action. This structured feedback guides subsequent actions, enabling the system to localize dissection progress and adapt policies online. To improve the reliability of such feedback, we introduce visibility metrics that quantify tissue exposure and formulate optimal controller designs that actively manipulate tissue to maximize visibility. Finally, we integrate these feedback mechanisms with both planning-based and learning-based dissection methods, and demonstrate experimentally that they significantly enhance autonomy, reduce errors, and improve robustness in complex surgical scenarios.
SITCOM: Scaling Inference-Time COMpute for VLAs NeurIPS 2025
Learning robust robotic control policies remains a major challenge due to the high cost of collecting labeled data, limited generalization to unseen environments, and difficulties in planning over long horizons. While Vision-Language-Action (VLA) models offer a promising solution by grounding natural language instructions into single-step control commands, they often lack mechanisms for lookahead and struggle with compounding errors in dynamic tasks. In this project, we introduce Scaling Inference-Time COMpute for VLAs (SITCOM), a framework that augments any pretrained VLA with model-based rollouts and reward-based trajectory selection, inspired by Model Predictive Control algorithm. SITCOM leverages a learned dynamics model to simulate multi-step action rollouts to select the best candidate plan for real-world execution, transforming one-shot VLAs into robust long-horizon planners. We develop an efficient transformer-based dynamics model trained on large-scale BridgeV2 data and fine-tuned on SIMPLER environments to bridge the Real2Sim gap, and score candidate rollouts using rewards from simulator. Through comprehensive evaluation across multiple tasks and settings in the SIMPLER environment, we demonstrate that SITCOM when combined with a good reward function can significantly improve task completion rate from 48% to 72% using trained dynamics model.
comment: Accepted at the NeurIPS 2025 Workshop on Space in Vision, Language, and Embodied AI (SpaVLE). *Equal contribution
Learning Geometry-Aware Nonprehensile Pushing and Pulling with Dexterous Hands
Nonprehensile manipulation, such as pushing and pulling, enables robots to move, align, or reposition objects that may be difficult to grasp due to their geometry, size, or relationship to the robot or the environment. Much of the existing work in nonprehensile manipulation relies on parallel-jaw grippers or tools such as rods and spatulas. In contrast, multi-fingered dexterous hands offer richer contact modes and versatility for handling diverse objects to provide stable support over the objects, which compensates for the difficulty of modeling the dynamics of nonprehensile manipulation. Therefore, we propose Geometry-aware Dexterous Pushing and Pulling (GD2P) for nonprehensile manipulation with dexterous robotic hands. We study pushing and pulling by framing the problem as synthesizing and learning pre-contact dexterous hand poses that lead to effective manipulation. We generate diverse hand poses via contact-guided sampling, filter them using physics simulation, and train a diffusion model conditioned on object geometry to predict viable poses. At test time, we sample hand poses and use standard motion planners to select and execute pushing and pulling actions. We perform 840 real-world experiments with an Allegro Hand, comparing our method to baselines. The results indicate that GD2P offers a scalable route for training dexterous nonprehensile manipulation policies. We further demonstrate GD2P on a LEAP Hand, highlighting its applicability to different hand morphologies. Our pre-trained models and dataset, including 1.3 million hand poses across 2.3k objects, will be open-source to facilitate further research. Our project website is available at: geodex2p.github.io.
comment: Typos corrected
A Benchmarking Study of Vision-Based Robotic Grasping Algorithms: A Comparative Analysis
We present a benchmarking study of vision-based robotic grasping algorithms and provide a comparative analysis. In particular, we compare two machine-learning-based and two analytical algorithms using an existing benchmarking protocol from the literature and determine the algorithms strengths and weaknesses under different experimental conditions. These conditions include variations in lighting, background textures, cameras with different noise levels, and grippers. We also run analogous experiments in simulations and with real robots and present the discrepancies. Some experiments are also run in two different laboratories using the same protocols to further analyze the repeatability of our results. We believe that this study, comprising 5040 experiments, provides important insights into the role and challenges of systematic experimentation in robotic manipulation and guides the development of new algorithms by considering the factors that could impact the performance. The experiment recordings and our benchmarking software are publicly available.
comment: Accepted in IEEE Robotics and Automation Magazine 2025. Previously this version with slight modifications appeared as arXiv:2503.11163, which was submitted as a new work by accident. We have requested for removal of arXiv:2503.11163 from the other account, while simultaneously submitting this version as a revision to arXiv:2307.11622. We are stressing that this submission is intentional
CHOICE: Coordinated Human-Object Interaction in Cluttered Environments for Pick-and-Place Actions
Animating human-scene interactions such as pick-and-place tasks in cluttered, complex layouts is a challenging task, with objects of a wide variation of geometries and articulation under scenarios with various obstacles. The main difficulty lies in the sparsity of the motion data compared to the wide variation of the objects and environments as well as the poor availability of transition motions between different tasks, increasing the complexity of the generalization to arbitrary conditions. To cope with this issue, we develop a system that tackles the interaction synthesis problem as a hierarchical goal-driven task. Firstly, we develop a bimanual scheduler that plans a set of keyframes for simultaneously controlling the two hands to efficiently achieve the pick-and-place task from an abstract goal signal such as the target object selected by the user. Next, we develop a neural implicit planner that generates guidance hand trajectories under diverse object shape/types and obstacle layouts. Finally, we propose a linear dynamic model for our DeepPhase controller that incorporates a Kalman filter to enable smooth transitions in the frequency domain, resulting in a more realistic and effective multi-objective control of the character.Our system can produce a wide range of natural pick-and-place movements with respect to the geometry of objects, the articulation of containers and the layout of the objects in the scene.
comment: ACM Transaction on Graphics 2025;21 pages, 15 figures; Webpage: https://lujintaozju.github.io/publications/CHOICE/
Time-Optimized Safe Navigation in Unstructured Environments through Learning Based Depth Completion
Quadrotors hold significant promise for several applications such as agriculture, search and rescue, and infrastructure inspection. Achieving autonomous operation requires systems to navigate safely through complex and unfamiliar environments. This level of autonomy is particularly challenging due to the complexity of such environments and the need for real-time decision making especially for platforms constrained by size, weight, and power (SWaP), which limits flight time and precludes the use of bulky sensors like Light Detection and Ranging (LiDAR) for mapping. Furthermore, computing globally optimal, collision-free paths and translating them into time-optimized, safe trajectories in real time adds significant computational complexity. To address these challenges, we present a fully onboard, real-time navigation system that relies solely on lightweight onboard sensors. Our system constructs a dense 3D map of the environment using a novel visual depth estimation approach that fuses stereo and monocular learning-based depth, yielding longer-range, denser, and less noisy depth maps than conventional stereo methods. Building on this map, we introduce a novel planning and trajectory generation framework capable of rapidly computing time-optimal global trajectories. As the map is incrementally updated with new depth information, our system continuously refines the trajectory to maintain safety and optimality. Both our planner and trajectory generator outperforms state-of-the-art methods in terms of computational efficiency and guarantee obstacle-free trajectories. We validate our system through robust autonomous flight experiments in diverse indoor and outdoor environments, demonstrating its effectiveness for safe navigation in previously unknown settings.
INGRID: Intelligent Generative Robotic Design Using Large Language Models
The integration of large language models (LLMs) into robotic systems has accelerated progress in embodied artificial intelligence, yet current approaches remain constrained by existing robotic architectures, particularly serial mechanisms. This hardware dependency fundamentally limits the scope of robotic intelligence. Here, we present INGRID (Intelligent Generative Robotic Design), a framework that enables the automated design of parallel robotic mechanisms through deep integration with reciprocal screw theory and kinematic synthesis methods. We decompose the design challenge into four progressive tasks: constraint analysis, kinematic joint generation, chain construction, and complete mechanism design. INGRID demonstrates the ability to generate novel parallel mechanisms with both fixed and variable mobility, discovering kinematic configurations not previously documented in the literature. We validate our approach through three case studies demonstrating how INGRID assists users in designing task-specific parallel robots based on desired mobility requirements. By bridging the gap between mechanism theory and machine learning, INGRID enables researchers without specialized robotics training to create custom parallel mechanisms, thereby decoupling advances in robotic intelligence from hardware constraints. This work establishes a foundation for mechanism intelligence, where AI systems actively design robotic hardware, potentially transforming the development of embodied AI systems.
comment: We are revising it
Viewpoint-Agnostic Manipulation Policies with Strategic Vantage Selection
Since vision-based manipulation policies are typically trained from data gathered from a single viewpoint, their performance drops when the view changes during deployment. Naively aggregating demonstrations from numerous random views is not only costly but also known to destabilize learning, as excessive visual diversity acts as noise. We present Vantage, a viewpoint selection framework to fine-tune any pre-trained policy on a small, strategically set of camera poses to induce viewpoint-agnostic behavior. Instead of relying on costly brute-force search over viewpoints, Vantage formulates camera placement as an information gain optimization problem in a continuous space. This approach balances exploration of novel poses with exploitation of promising ones, while also providing theoretical guarantees about convergence and robustness. Across manipulation tasks and policy families, Vantage consistently improves success under viewpoint shifts compared to fixed, grid, or random data selection strategies with only a handful of fine-tuning steps. Experiments conducted on simulated and real-world setups show that Vantage increases the task success rate by 25% for diffusion policies, and yields robust gains in dynamic-camera settings.
Training-free Task-oriented Grasp Generation
This paper presents a training-free pipeline for task-oriented grasp generation that combines pre-trained grasp generation models with vision-language models (VLMs). Unlike traditional approaches that focus solely on stable grasps, our method incorporates task-specific requirements by leveraging the semantic reasoning capabilities of VLMs. We evaluate five querying strategies, each utilizing different visual representations of candidate grasps, and demonstrate significant improvements over a baseline method in both grasp success and task compliance rates, with absolute gains of up to 36.9\% in overall success rate. Our results underline the potential of VLMs to enhance task-oriented manipulation, providing insights for future research in robotic grasping and human-robot interaction.
comment: Jiaming Wang, Diwen Liu, and Jizhuo Chen contributed equally
Humanoid Policy ~ Human Policy
Training manipulation policies for humanoid robots with diverse data enhances their robustness and generalization across tasks and platforms. However, learning solely from robot demonstrations is labor-intensive, requiring expensive tele-operated data collection which is difficult to scale. This paper investigates a more scalable data source, egocentric human demonstrations, to serve as cross-embodiment training data for robot learning. We mitigate the embodiment gap between humanoids and humans from both the data and modeling perspectives. We collect an egocentric task-oriented dataset (PH2D) that is directly aligned with humanoid manipulation demonstrations. We then train a human-humanoid behavior policy, which we term Human Action Transformer (HAT). The state-action space of HAT is unified for both humans and humanoid robots and can be differentiably retargeted to robot actions. Co-trained with smaller-scale robot data, HAT directly models humanoid robots and humans as different embodiments without additional supervision. We show that human data improves both generalization and robustness of HAT with significantly better data collection efficiency. Code and data: https://human-as-robot.github.io/
comment: Code and data: https://human-as-robot.github.io/
ShapeICP: Iterative Category-level Object Pose and Shape Estimation from Depth
Category-level object pose and shape estimation from a single depth image has recently drawn research attention due to its potential utility for tasks such as robotics manipulation. The task is particularly challenging because the three unknowns, object pose, object shape, and model-to-measurement correspondences, are compounded together, but only a single view of depth measurements is provided. Most of the prior work heavily relies on data-driven approaches to obtain solutions to at least one of the unknowns, and typically two, risking generalization failures if not designed and trained carefully. The shape representations used in the prior work also mainly focus on point clouds and signed distance fields (SDFs). In stark contrast to the prior work, we approach the problem using an iterative estimation method that does not require learning from pose-annotated data. Moreover, we construct and adopt a novel mesh-based object active shape model (ASM), which additionally maintains vertex connectivity compared to the commonly used point-based object ASM. Our algorithm, ShapeICP, is based on the iterative closest point (ICP) algorithm but is equipped with additional features for the category-level pose and shape estimation task. Although not using pose-annotated data, ShapeICP surpasses many data-driven approaches that rely on pose data for training, opening up a new solution space for researchers to consider.
Universal Adversarial Perturbation Attacks On Modern Behavior Cloning Policies
Learning from Demonstration (LfD) algorithms have shown promising results in robotic manipulation tasks, but their vulnerability to offline universal perturbation attacks remains underexplored. This paper presents a comprehensive study of adversarial attacks on both classic and recently proposed algorithms, including Behavior Cloning (BC), LSTM-GMM, Implicit Behavior Cloning (IBC), Diffusion Policy (DP), and Vector-Quantizied Behavior Transformer (VQ-BET). We study the vulnerability of these methods to universal adversarial perturbations. Our experiments on several simulated robotic manipulation tasks reveal that most of the current methods are highly vulnerable to adversarial perturbations. We also show that these attacks are often transferable across algorithms, architectures, and tasks, raising concerning security vulnerabilities to black-box attacks. To the best of our knowledge, we are the first to present a systematic study of the vulnerabilities of different LfD algorithms to both white-box and black-box attacks. Our findings highlight the vulnerabilities of modern BC algorithms, paving the way for future work in addressing such limitations.
Rough Stochastic Pontryagin Maximum Principle and an Indirect Shooting Method
We derive first-order Pontryagin optimality conditions for stochastic optimal control with deterministic controls for systems modeled by rough differential equations (RDE) driven by Gaussian rough paths. This Pontryagin Maximum Principle (PMP) applies to systems following stochastic differential equations (SDE) driven by Brownian motion, yet it does not rely on forward-backward SDEs and involves the same Hamiltonian as the deterministic PMP. The proof consists of first deriving various integrable error bounds for solutions to nonlinear and linear RDEs by leveraging recent results on Gaussian rough paths. The PMP then follows using standard techniques based on needle-like variations. As an application, we propose the first indirect shooting method for nonlinear stochastic optimal control and show that it converges 10x faster than a direct method on a stabilization task.
comment: Revisions to the presentation and proofs
Multiagent Systems
Strategy Logic, Imperfect Information, and Hyperproperties KR 2025
Strategy logic (SL) is a powerful temporal logic that enables first-class reasoning over strategic behavior in multi-agent systems (MAS). In many MASs, the agents (and their strategies) cannot observe the global state of the system, leading to many extensions of SL centered around imperfect information, such as strategy logic with imperfect information (SL$_\mathit{ii}$). Along orthogonal lines, researchers have studied the combination of strategic behavior and hyperproperties. Hyperproperties are system properties that relate multiple executions in a system and commonly arise when specifying security policies. Hyper Strategy Logic (HyperSL) is a temporal logic that combines quantification over strategies with the ability to express hyperproperties on the executions of different strategy profiles. In this paper, we study the relation between SL$_\mathit{ii}$ and HyperSL. Our main result is that both logics (restricted to formulas where no state formulas are nested within path formulas) are equivalent in the sense that we can encode SL$_\mathit{ii}$ instances into HyperSL instances and vice versa. For the former direction, we build on the well-known observation that imperfect information is a hyperproperty. For the latter direction, we construct a self-composition of MASs and show how we can simulate hyperproperties using imperfect information.
comment: KR 2025
Distributed Area Coverage with High Altitude Balloons Using Multi-Agent Reinforcement Learning
High Altitude Balloons (HABs) can leverage stratospheric wind layers for limited horizontal control, enabling applications in reconnaissance, environmental monitoring, and communications networks. Existing multi-agent HAB coordination approaches use deterministic methods like Voronoi partitioning and extremum seeking control for large global constellations, which perform poorly for smaller teams and localized missions. While single-agent HAB control using reinforcement learning has been demonstrated on HABs, coordinated multi-agent reinforcement learning (MARL) has not yet been investigated. This work presents the first systematic application of multi-agent reinforcement learning (MARL) to HAB coordination for distributed area coverage. We extend our previously developed reinforcement learning simulation environment (RLHAB) to support cooperative multi-agent learning, enabling multiple agents to operate simultaneously in realistic atmospheric conditions. We adapt QMIX for HAB area coverage coordination, leveraging Centralized Training with Decentralized Execution to address atmospheric vehicle coordination challenges. Our approach employs specialized observation spaces providing individual state, environmental context, and teammate data, with hierarchical rewards prioritizing coverage while encouraging spatial distribution. We demonstrate that QMIX achieves similar performance to the theoretically optimal geometric deterministic method for distributed area coverage, validating the MARL approach and providing a foundation for more complex autonomous multi-HAB missions where deterministic methods become intractable.
Cooperation in public goods game on regular lattices with agents changing interaction groups
The emergence of cooperation in the groups of interacting agents is one of the most fascinating phenomena observed in many complex systems studied in social science and ecology, even in the situations where one would expect the agent to use a free-rider policy. This is especially surprising in the situation where no external mechanisms based on reputation or punishment are present. One of the possible explanations of this effect is the inhomogeneity of the various aspects of interactions, which can be used to clarify the seemingly paradoxical behavior. In this report we demonstrate that the diversity of interaction networks helps to some degree to explain the emergence of cooperation. We extend the model of spatial interaction diversity introduced in [L. Shang et al., Physica A, 593:126999 (2022)] by enabling the evaluation of the interaction groups. We show that the process of the reevaluation of the interaction group facilitates the emergence of cooperation. Furthermore, we also observe that a significant participation of agents switching their interaction neighborhoods has a negative impact on the formation of cooperation. The introduced scenario can help to understand the formation of cooperation in the systems where no additional mechanisms for controlling agents are included.
comment: 18 pages, 8 figures, code available at https://github.com/jmiszczak/pgg_group_diversity
Deep Reinforcement Learning for Multi-Agent Coordination
We address the challenge of coordinating multiple robots in narrow and confined environments, where congestion and interference often hinder collective task performance. Drawing inspiration from insect colonies, which achieve robust coordination through stigmergy -- modifying and interpreting environmental traces -- we propose a Stigmergic Multi-Agent Deep Reinforcement Learning (S-MADRL) framework that leverages virtual pheromones to model local and social interactions, enabling decentralized emergent coordination without explicit communication. To overcome the convergence and scalability limitations of existing algorithms such as MADQN, MADDPG, and MAPPO, we leverage curriculum learning, which decomposes complex tasks into progressively harder sub-problems. Simulation results show that our framework achieves the most effective coordination of up to eight agents, where robots self-organize into asymmetric workload distributions that reduce congestion and modulate group performance. This emergent behavior, analogous to strategies observed in nature, demonstrates a scalable solution for decentralized multi-agent coordination in crowded environments with communication constraints.
comment: 11 pages, 8 figures, 1 table, presented at SWARM 2022, to be published in Journal of Artificial Life and Robotics
REFINE: Enhancing Program Repair Agents through Context-Aware Patch Refinement
Large Language Models (LLMs) have recently shown strong potential in automatic program repair (APR), especially in repository-level settings where the goal is to generate patches based on natural language issue descriptions, large codebases, and regression tests. However, despite their promise, current LLM-based APR techniques often struggle to produce correct fixes due to limited understanding of code context and over-reliance on incomplete test suites. As a result, they frequently generate Draft Patches-partially correct patches that either incompletely address the bug or overfit to the test cases. In this work, we propose a novel patch refinement framework, Refine, that systematically transforms Draft Patches into correct ones. Refine addresses three key challenges: disambiguating vague issue and code context, diversifying patch candidates through test-time scaling, and aggregating partial fixes via an LLM-powered code review process. We implement Refine as a general refinement module that can be integrated into both open-agent-based and workflow-based APR systems. Our evaluation on the SWE-Bench Lite benchmark shows that Refine achieves state-of-the-art results among workflow-based approaches and approaches the best-known performance across all APR categories. Specifically, Refine boosts AutoCodeRover's performance by 14.67%, achieving a score of 51.67% and surpassing all prior baselines. On SWE-Bench Verified, Refine improves the resolution rate by 12.2%, and when integrated across multiple APR systems, it yields an average improvement of 14%-demonstrating its broad effectiveness and generalizability. These results highlight the effectiveness of refinement as a missing component in current APR pipelines and the potential of agentic collaboration in closing the gap between near-correct and correct patches. We also open source our code.
comment: We also open source our code at https://anonymous.4open.science/r/SemAgent-7B2F/README.md
LLM-MCoX: Large Language Model-based Multi-robot Coordinated Exploration and Search
Autonomous exploration and object search in unknown indoor environments remain challenging for multi-robot systems (MRS). Traditional approaches often rely on greedy frontier assignment strategies with limited inter-robot coordination. In this work, we introduce LLM-MCoX (LLM-based Multi-robot Coordinated Exploration and Search), a novel framework that leverages Large Language Models (LLMs) for intelligent coordination of both homogeneous and heterogeneous robot teams tasked with efficient exploration and target object search. Our approach combines real-time LiDAR scan processing for frontier cluster extraction and doorway detection with multimodal LLM reasoning (e.g., GPT-4o) to generate coordinated waypoint assignments based on shared environment maps and robot states. LLM-MCoX demonstrates superior performance compared to existing methods, including greedy and Voronoi-based planners, achieving 22.7% faster exploration times and 50% improved search efficiency in large environments with 6 robots. Notably, LLM-MCoX enables natural language-based object search capabilities, allowing human operators to provide high-level semantic guidance that traditional algorithms cannot interpret.
RobustFlow: Towards Robust Agentic Workflow Generation
The automated generation of agentic workflows is a promising frontier for enabling large language models (LLMs) to solve complex tasks. However, our investigation reveals that the robustness of agentic workflow remains a critical, unaddressed challenge. Current methods often generate wildly inconsistent workflows when provided with instructions that are semantically identical but differently phrased. This brittleness severely undermines their reliability and trustworthiness for real-world applications. To quantitatively diagnose this instability, we propose metrics based on nodal and topological similarity to evaluate workflow consistency against common semantic variations such as paraphrasing and noise injection. Subsequently, we further propose a novel training framework, RobustFlow, that leverages preference optimization to teach models invariance to instruction variations. By training on sets of synonymous task descriptions, RobustFlow boosts workflow robustness scores to 70\% - 90\%, which is a substantial improvement over existing approaches. The code is publicly available at https://github.com/DEFENSE-SEU/RobustFlow.
Extending Stable and Popular Matching Algorithms from Bipartite to Arbitrary Instances
We consider stable and popular matching problems in arbitrary graphs, which are referred to as stable roommates instances. We extend the 3/2-approximation algorithm for the maximum size weakly stable matching problem to the roommates case, which solves a more than 20 year old open question of Irving and Manlove about the approximability of maximum size weakly stable matchings in roommates instances with ties [Irving and Manlove 2002] and has nice applications for the problem of matching residents to hospitals in the presence of couples. We also extend the algorithm that finds a maximum size popular matching in bipartite graphs in the case of strict preferences and the algorithm to find a popular matching among maximum weight matchings. While previous attempts to extend the idea of promoting the agents or duplicating the edges from bipartite instances to arbitrary ones failed, these results show that with the help of a simple observation, we can indeed bridge the gap and extend these algorithms
OneDSE: A Unified Microprocessor Metric Prediction and Design Space Exploration Framework
With the slowing of Moores Law and increasing impact of power constraints, processor designs rely on architectural innovation to achieve differentiating performance. However, the innovation complexity has simultaneously increased the design space of modern high performance processors. Specifically, we identify two key challenges in prior Design Space Exploration (DSE) approaches for modern CPU design - (a) cost model (prediction method) is either slow or microarchitecture-specific or workload-specific and single model is inefficient to learn the whole design space (b) optimization (exploration method) is slow and inaccurate in the large CPU parameter space. This work presents a novel solution called OneDSE to address these emerging challenges in modern CPU design. OneDSE is a unified cost model (metric predictor) and optimizer (CPU parameter explorer) with three key techniques - 1. Transformer-based workload-Aware CPU Estimation (TrACE) framework to predict metrics in the parameter space (TrACE-p) and parameters in the in the metric space (TrACE-m). TrACE-p outperforms State of The Art (SOTA) IPC prediction methods by 5.71x and 28x for single and multiple workloads respectively while being two orders of magnitude faster. 2. We also propose a novel Metric spAce Search opTimizer (MAST) that leverages TrACE-m and outperforms SoTA metaheuristics by 1.19x while being an order of magnitude faster. 3. We propose Subsystem-based Multi-Agent Reinforcement-learning based fine-Tuning (SMART)-TrACE that achieves a 10.6% reduction in prediction error compared to TrACE, enabling more accurate and efficient exploration of the CPU design space.
Systems and Control (CS)
Use of Quadcopter Wakes to Supplement Strawberry Pollination
Pollinators are critical to the world's ecosystems and food supply, yet recent studies have found pollination shortfalls in several crops, including strawberry. This is troubling because wild and managed pollinators are currently experiencing declines. One possibility is to try and provide supplemental pollination solutions. These solutions should be affordable and simple for farmers to implement if their use is to be widespread; quadcopters are a great example, already used for monitoring on many farms. This paper investigates a new method for artificial pollination based on wind pollination that bears further investigation. After determining the height where the lateral flow is maximized, we performed field experiments with a quadcopter assisting natural pollinators. Although our results in the field were inconclusive, lab studies show that the idea shows promise and could be adapted for better field results.
comment: 7 pages, 7 figures
A Real-Time Framework for Intermediate Map Construction and Kinematically Feasible Off-Road Planning Without OSM
Off-road environments present unique challenges for autonomous navigation due to their complex and unstructured nature. Traditional global path-planning methods, which typically aim to minimize path length and travel time, perform poorly on large-scale maps and fail to account for critical factors such as real-time performance, kinematic feasibility, and memory efficiency. This paper introduces a novel global path-planning method specifically designed for off-road environments, addressing these essential factors. The method begins by constructing an intermediate map within the pixel coordinate system, incorporating geographical features like off-road trails, waterways, restricted and passable areas, and trees. The planning problem is then divided into three sub-problems: graph-based path planning, kinematic feasibility checking, and path smoothing. This approach effectively meets real-time performance requirements while ensuring kinematic feasibility and efficient memory use. The method was tested in various off-road environments with large-scale maps up to several square kilometers in size, successfully identifying feasible paths in an average of 1.5 seconds and utilizing approximately 1.5GB of memory under extreme conditions. The proposed framework is versatile and applicable to a wide range of off-road autonomous navigation tasks, including search and rescue missions and agricultural operations.
3D Electronic-Photonic Heterogenous Interconnect Platforms Enabling Energy-Efficient Scalable Architectures For Future HPC Systems
3D interconnects have emerged as a solution to address the scaling issues of interconnect bandwidth and the memory wall problem in high-performance computing (HPC), such as High-Bandwidth Memory (HBM). However, the copper-based electrical interconnect retains fundamental limitations. Dense I/O for high-speed signals lead to degraded signal quality for end-to-end links, necessitating additional circuits to mitigate signal impairments and resulting in poor energy efficiency. We propose a 3D chiplet stacking electronic-photonic interconnect (EPIC) platform, which offers a solution by moving the high-speed data communication interface to the optical domain across the 3D stack by using Through Silicon Optical Vias (TSOV), while retaining the functionality of electrical TSVs and 2.5D interconnects for power delivery and short-reach low-latency communications. We then benchmark the proposed model against state-of-the-art 3D electrical interconnects to demonstrate our 3D EPIC platform beating the 3D electrical interconnects to $>$10 TB/s/$mm^2$ bandwidth density. We present a pathway to extend our demonstrated, industry-ready design to achieving $\leq$100 fJ/bit high-speed communication.
Convex Pollution Control of Wastewater Treatment Systems
We design a model-predictive controller for managing the actuators in sewer networks. It minimizes flooding and combined-sewer overflow during rain and pollution at other times. To make the problem tractable, we use a convex relaxation of the microbial growth kinetics and a physically motivated linearization of the mass flow bilinearities. With these approximations, the trajectory optimization in each control period is a second-order cone program. In simulation, the controller releases roughly 15% less pollutant mass than a conventional controller while treating nearly the same volume of flow. It does so by better balancing the flow over the treatment plants and over time.
Electrical System Architecture for Aviation Electrification
The electrification of aircraft is reshaping the foundations of aerospace design by positioning electrical systems at the center of propulsion, control, and onboard functionality. This chapter provides an overview of electrical system architectures for electric and hybrid electric aircraft, highlighting both established principles and emerging design strategies. The discussion begins with the motivations for electrification, including reducing environmental impact, improving operational efficiency, and replacing complex pneumatic and hydraulic subsystems with lighter and more reliable electrical alternatives. Aircraft electrical architectures are classified into four major categories: conventional, more electric, all electric, and hybrid electric. A range of system topologies is examined, including direct current (DC), alternating current (AC), hybrid, and distributed configurations. Each is considered in terms of its effectiveness in delivering power, enabling redundancy, supporting fault isolation, and managing thermal performance. Real world examples are presented to demonstrate practical applications, with case studies drawn from the Boeing 787 Dreamliner, the Eviation Alice commuter aircraft, and NASA X57 Maxwell demonstrator. These examples illustrate the ongoing transition from incremental subsystem electrification toward fully integrated architectures that promise higher efficiency and greater sustainability.
Enhancing Data Center Low-Voltage Ride-Through
Data center loads have expanded significantly in recent years. Compared to traditional loads, data centers are highly sensitive to voltage deviations and thus their protection mechanisms trip more proactively during voltage fluctuations. During a grid fault, simultaneous tripping of large-scale data centers can further destabilize the transmission system and even lead to cascading failures. In response, transmission system operators are imposing voltage ride-through (VRT) requirements for data centers. In this work, we enhance the VRT capability of data centers by designing voltage controllers for their internal power distribution network. We first systematically analyze VRT standards and the controllable resources related to data centers. These resources enable the design of voltage control strategies to regulate voltages internal to the data center, thereby allowing loads to remain online during voltage disturbances from the external transmission grid. We study and contrast both centralized and decentralized controllers that unify the control of heterogeneous flexible resources. Additionally, we construct an integrated test system that simulates both the transient fault response of the transmission system and the data center distribution network. Case studies demonstrate that the proposed voltage control mechanisms provide effective yet simple solutions to enhance data center low-voltage ride-through capability.
HOFLON: Hybrid Offline Learning and Online Optimization for Process Start-Up and Grade-Transition Control
Start-ups and product grade-changes are critical steps in continuous-process plant operation, because any misstep immediately affects product quality and drives operational losses. These transitions have long relied on manual operation by a handful of expert operators, but the progressive retirement of that workforce is leaving plant owners without the tacit know-how needed to execute them consistently. In the absence of a process model, offline reinforcement learning (RL) promises to capture and even surpass human expertise by mining historical start-up and grade-change logs, yet standard offline RL struggles with distribution shift and value-overestimation whenever a learned policy ventures outside the data envelope. We introduce HOFLON (Hybrid Offline Learning + Online Optimization) to overcome those limitations. Offline, HOFLON learns (i) a latent data manifold that represents the feasible region spanned by past transitions and (ii) a long-horizon Q-critic that predicts the cumulative reward from state-action pairs. Online, it solves a one-step optimization problem that maximizes the Q-critic while penalizing deviations from the learned manifold and excessive rates of change in the manipulated variables. We test HOFLON on two industrial case studies: a polymerization reactor start-up and a paper-machine grade-change problem, and benchmark it against Implicit Q-Learning (IQL), a leading offline-RL algorithm. In both plants HOFLON not only surpasses IQL but also delivers, on average, better cumulative rewards than the best start-up or grade-change observed in the historical data, demonstrating its potential to automate transition operations beyond current expert capability.
comment: 31 pages, 15 figures, submitted to Computers and Chemical Engineering
A Trustworthy Industrial Fault Diagnosis Architecture Integrating Probabilistic Models and Large Language Models
There are limitations of traditional methods and deep learning methods in terms of interpretability, generalization, and quantification of uncertainty in industrial fault diagnosis, and there are core problems of insufficient credibility in industrial fault diagnosis. The architecture performs preliminary analysis through a Bayesian network-based diagnostic engine and features an LLM-driven cognitive quorum module with multimodal input capabilities. The module conducts expert-level arbitration of initial diagnoses by analyzing structured features and diagnostic charts, prioritizing final decisions after conflicts are identified. To ensure the reliability of the system output, the architecture integrates a confidence calibration module based on temperature calibration and a risk assessment module, which objectively quantifies the reliability of the system using metrics such as expected calibration error (ECE). Experimental results on a dataset containing multiple fault types showed that the proposed framework improved diagnostic accuracy by more than 28 percentage points compared to the baseline model, while the calibrated ECE was reduced by more than 75%. Case studies have confirmed that HCAA effectively corrects misjudgments caused by complex feature patterns or knowledge gaps in traditional models, providing novel and practical engineering solutions for building high-trust, explainable AI diagnostic systems for industrial applications.
comment: 1tables,6 figs,11pages
On the Duality Between Quantized Time and States in Dynamic Simulation
This letter introduces a formal duality between discrete-time and quantized-state numerical methods. We interpret quantized state system (QSS) methods as integration schemes applied to a dual form of the system model, where time is seen as a state-dependent variable. This perspective enables the definition of novel QSS-based schemes inspired by classical time-integration techniques. As a proof of concept, we illustrate the idea by introducing a QSS Adams-Bashforth method applied to a test equation. We then move to demonstrate how the proposed approach can achieve notable performance improvements in realistic power system simulations.
Dissecting Larval Zebrafish Hunting using Deep Reinforcement Learning Trained RNN Agents
Larval zebrafish hunting provides a tractable setting to study how ecological and energetic constraints shape adaptive behavior in both biological brains and artificial agents. Here we develop a minimal agent-based model, training recurrent policies with deep reinforcement learning in a bout-based zebrafish simulator. Despite its simplicity, the model reproduces hallmark hunting behaviors -- including eye vergence-linked pursuit, speed modulation, and stereotyped approach trajectories -- that closely match real larval zebrafish. Quantitative trajectory analyses show that pursuit bouts systematically reduce prey angle by roughly half before strike, consistent with measurements. Virtual experiments and parameter sweeps vary ecological and energetic constraints, bout kinematics (coupled vs. uncoupled turns and forward motion), and environmental factors such as food density, food speed, and vergence limits. These manipulations reveal how constraints and environments shape pursuit dynamics, strike success, and abort rates, yielding falsifiable predictions for neuroscience experiments. These sweeps identify a compact set of constraints -- binocular sensing, the coupling of forward speed and turning in bout kinematics, and modest energetic costs on locomotion and vergence -- that are sufficient for zebrafish-like hunting to emerge. Strikingly, these behaviors arise in minimal agents without detailed biomechanics, fluid dynamics, circuit realism, or imitation learning from real zebrafish data. Taken together, this work provides a normative account of zebrafish hunting as the optimal balance between energetic cost and sensory benefit, highlighting the trade-offs that structure vergence and trajectory dynamics. We establish a virtual lab that narrows the experimental search space and generates falsifiable predictions about behavior and neural coding.
Optimal Energy Management in Indoor Farming Using Lighting Flexibility and Intelligent Model Predictive Control
Indoor farming enables year-round food production but its reliance on artificial lighting significantly increases energy consumption, peak load charges, and energy costs for growers. Recent studies indicate that plants are able to tolerate interruptions in light, enabling the design of 24-hour lighting schedules (or "recipes") with strategic light modulation in alignment with day-ahead pricing. Thus, we propose an optimal lighting control strategy for indoor farming that modulates light intensity and photoperiod to reduce energy costs. The control strategy is implemented within a model predictive control framework and augmented with transformer-based neural networks to forecast 24-hour ahead solar radiation and electricity prices to improve energy cost reduction. The control strategy is informed by real-world experimentation on lettuce crops to discover minimum light exposure and appropriate dark-light intervals, which are mathematically formulated as constraints to maintain plant health. Simulations for a one-hectare greenhouse, based on real electricity market data from Ontario, demonstrate an annual cost reduction of $318,400 (20.9%), a peak load decrease of 1.6 MW (33.32%), and total energy savings of 1890 MWh (20.2%) against a baseline recipe. These findings highlight the potential of intelligent lighting control to improve the sustainability and economic feasibility of indoor farming.
Optimising Battery Energy Storage System Trading via Energy Market Operator Price Forecast
In electricity markets around the world, the ability to anticipate price movements with precision can be the difference between profit and loss, especially for fast-acting assets like battery energy storage systems (BESS). As grid volatility increases due to renewables and market decentralisation, operators and forecasters alike face growing pressure to transform prediction into strategy. Yet while forecast data is abundant, especially in advanced markets like Australia's National Electricity Market (NEM), its practical value in driving real-world BESS trading decisions remains largely unexplored. This thesis dives into that gap. This work addresses a key research question: Can the accuracy of the Australian Energy Market Operator (AEMO) energy price forecasts be systematically leveraged to develop a reliable and profitable battery energy storage system trading algorithm? Despite the availability of AEMO price forecasts, no existing framework evaluates their reliability or incorporates them into practical BESS trading strategies. By analysing patterns in forecast accuracy based on time of day, forecast horizon, and regional variations, this project creates a novel, forecast-informed BESS trading model to optimise arbitrage financial returns. The performance of this forecast-driven algorithm is benchmarked against a basic trading algorithm with no knowledge of forecast data. The study further explores the potential of machine learning techniques to predict future energy prices by enhancing AEMO forecasts to govern a more advanced trading strategy. The research outcomes will inform future improvements in energy market trading models and promote more efficient BESS integration into market operations.
Safety-Oriented Dynamic Path Planning for Automated Vehicles
Ensuring safety in autonomous vehicles necessitates advanced path planning and obstacle avoidance capabilities, particularly in dynamic environments. This paper introduces a bi-level control framework that efficiently augments road boundaries by incorporating time-dependent grid projections of obstacle movements, thus enabling precise and adaptive path planning. The main control loop utilizes Nonlinear Model Predictive Control (NMPC) for real-time path optimization, wherein homotopy-based constraint relaxation is employed to improve the solvability of the optimal control problem (OCP). Furthermore, an independent backup loop runs concurrently to provide safe fallback trajectories when an optimal trajectory cannot be computed by the main loop within a critical time frame, thus enhancing safety and real-time performance. Our evaluation showcases the benefits of the proposed methods in various driving scenarios, highlighting the real-time applicability and robustness of our approach. Overall, the framework represents a significant step towards safer and more reliable autonomous driving in complex and dynamic environments.
comment: Published in 2025 IEEE 101st Vehicular Technology Conference (VTC2025-Spring), Oslo, Norway, June 17-20, 2025. Received Best Conference Paper Award
Cyber Resilience of Three-phase Unbalanced Distribution System Restoration under Sparse Adversarial Attack on Load Forecasting
System restoration is critical for power system resilience, nonetheless, its growing reliance on artificial intelligence (AI)-based load forecasting introduces significant cybersecurity risks. Inaccurate forecasts can lead to infeasible planning, voltage and frequency violations, and unsuccessful recovery of de-energized segments, yet the resilience of restoration processes to such attacks remains largely unexplored. This paper addresses this gap by quantifying how adversarially manipulated forecasts impact restoration feasibility and grid security. We develop a gradient-based sparse adversarial attack that strategically perturbs the most influential spatiotemporal inputs, exposing vulnerabilities in forecasting models while maintaining stealth. We further create a restoration-aware validation framework that embeds these compromised forecasts into a sequential restoration model and evaluates operational feasibility using an unbalanced three-phase optimal power flow formulation. Simulation results show that the proposed approach is more efficient and stealthier than baseline attacks. It reveals system-level failures, such as voltage and power ramping violations that prevent the restoration of critical loads. These findings provide actionable insights for designing cybersecurity-aware restoration planning frameworks.
comment: 10 pages, 7 figures
Learning Safety-Compatible Observers for Unknown Systems
This paper presents a data-driven approach for jointly learning a robust full-state observer and its robustness certificate for systems with unknown dynamics. Leveraging incremental input-to-state stability (delta ISS) notions, we jointly learn a delta ISS Lyapunov function that serves as the robustness certificate and prove practical convergence of the estimation error under standard fidelity assumptions on the learned models. This renders the observer safety-compatible: they can be consumed by certificate-based safe controllers so that, when the controller tolerates bounded estimation error, the controller's certificate remains valid under output feedback. We further extend the approach to interconnected systems via the small-gain theorem, yielding a distributed observer design framework. We validate the approach on a variety of nonlinear systems.
comment: Submitted to American Control Conference (ACC)
Learning Passive Continuous-Time Dynamics with Multistep Port-Hamiltonian Gaussian Processes
We propose the multistep port-Hamiltonian Gaussian process (MS-PHS GP) to learn physically consistent continuous-time dynamics and a posterior over the Hamiltonian from noisy, irregularly-sampled trajectories. By placing a GP prior on the Hamiltonian surface $H$ and encoding variable-step multistep integrator constraints as finite linear functionals, MS-PHS GP enables closed-form conditioning of both the vector field and the Hamiltonian surface without latent states, while enforcing energy balance and passivity by design. We state a finite-sample vector-field bound that separates the estimation and variable-step discretization terms. Lastly, we demonstrate improved vector-field recovery and well-calibrated Hamiltonian uncertainty on mass-spring, Van der Pol, and Duffing benchmarks.
Spectral Flow Learning Theory: Finite-Sample Guarantees for Vector-Field Identification
We study the identification of continuous-time vector fields from irregularly sampled trajectories. We introduce Spectral Flow Learning (SFL), which learns in a windowed flow space using a lag-linear label operator that aggregates lagged Koopman actions. We provide finite-sample high-probability (FS-HP) guarantees for the class of variable-step linear multistep methods (vLLM). The FS-HP rates are constructed using spectral regularization with qualification-controlled filters for flow predictors under standard source and filter assumptions. A multistep observability inequality links flow error to vector-field error and yields two-term bounds that combine a statistical rate with an explicit discretization bias from vLMM theory. This preliminary preprint states the results and sketches proofs, with full proofs and extensions deferred to a journal version.
Nonparametric adaptive payload tracking for an offshore crane
A nonparametric adaptive controller is proposed for crane control where the payload tracks a desired trajectory with feedback from the payload position. The controller is based on a novel version of partial feedback linearization where the unactuated crane load dynamics are controlled with the position of the actuated crane dynamics instead of the acceleration. This is made possible by taking advantage of the gravity terms in a new Cartesian model that we propose for the load dynamics. This Cartesian model structure makes it possible to implement a nonparametric adaptive controller which cancels disturbances on the crane load by approximating the effects of unknown disturbance forces and structurally unknown dynamics in a reproducing kernel Hilbert space (RKHS). It is shown that the nonparametric adaptive controller leads to uniformly ultimately bounded errors in the presence of unknown forces and unmodeled dynamics. In addition, it is shown that the proposed partial feedback linearization based on the Cartesian model has certain advantages in payload tracking control also in the non-adaptive case. The performance of the nonparametric adaptive controller is validated in simulation and experiments with good results.
Sample Complexity of Linear Quadratic Regulator Without Initial Stability
Inspired by REINFORCE, we introduce a novel receding-horizon algorithm for the Linear Quadratic Regulator (LQR) problem with unknown dynamics. Unlike prior methods, our algorithm avoids reliance on two-point gradient estimates while maintaining the same order of sample complexity. Furthermore, it eliminates the restrictive requirement of starting with a stable initial policy, broadening its applicability. Beyond these improvements, we introduce a refined analysis of error propagation through the contraction of the Riccati operator under the Riemannian distance. This refinement leads to a better sample complexity and ensures improved convergence guarantees.
Contrastive-KAN: A Semi-Supervised Intrusion Detection Framework for Cybersecurity with scarce Labeled Data
In the era of the Fourth Industrial Revolution, cybersecurity and intrusion detection systems are vital for the secure and reliable operation of IoT and IIoT environments. A key challenge in this domain is the scarcity of labeled cyberattack data, as most industrial systems operate under normal conditions. This data imbalance, combined with the high cost of annotation, hinders the effective training of machine learning models. Moreover, the rapid detection of attacks is essential, especially in critical infrastructure, to prevent large-scale disruptions. To address these challenges, we propose a real-time intrusion detection system based on a semi-supervised contrastive learning framework using the Kolmogorov-Arnold Network (KAN). Our method leverages abundant unlabeled data to effectively distinguish between normal and attack behaviors. We validate our approach on three benchmark datasets, UNSW-NB15, BoT-IoT, and Gas Pipeline, using only 2.20%, 1.28%, and 8% of labeled samples, respectively, to simulate real-world conditions. Experimental results show that our method outperforms existing contrastive learning-based approaches. We further compare KAN with a traditional multilayer perceptron (MLP), demonstrating KAN's superior performance in both detection accuracy and robustness under limited supervision. KAN's ability to model complex relationships, along with its learnable activation functions, is also explored and visualized, offering interpretability and the potential for rule extraction. The method supports multi-class classification and proves effective in safety, critical environments where reliability is paramount.
A Corrector-aided Look-ahead Distance-based Guidance for Online Reference Path Following with an Efficient Mid-course Guidance Strategy
Efficient path-following is crucial in most of the applications of autonomous vehicles (UxV). Among various guidance strategies presented in literature, the look-ahead distance ($L_1$)-based nonlinear guidance has received significant attention due to its ease in implementation and ability to maintain a low cross-track error while following simpler reference paths and generating bounded lateral acceleration commands. However, the constant value of $L_1$ becomes problematic when the UxV is far away from the reference path and also produces higher cross-track error while following complex reference paths having high variation in radius of curvature. To address these challenges, the notion of look-ahead distance is leveraged in a novel way to develop a two-phase guidance strategy. Initially, when the UxV is far from the reference path, an optimized $L_1$ selection strategy is developed to guide the UxV towards the vicinity of the start point of the reference path, while maintaining minimal lateral acceleration command. Once the vehicle reaches a close neighborhood of the reference path, a novel notion of corrector point is incorporated in the constant $L_1$-based guidance scheme to generate the guidance command that effectively reduces the root mean square of the cross-track error and lateral acceleration requirement thereafter. Simulation results validate satisfactory performance of this proposed corrector point and look-ahead point pair-based guidance strategy, along with the developed mid-course guidance scheme. Also, its superiority over the conventional constant $L_1$ guidance scheme is established by simulation studies over different initial condition scenarios.
comment: This paper is currently under review for publication in ACC 2026
Distributionally robust LMI synthesis for LTI systems
This article shows that distributionally robust controller synthesis as investigated in \cite{taskesen2024distributionally} can be formulated as a convex linear matrix inequality (LMI) synthesis problem. To this end, we rely on well-established convexification techniques from robust control. The LMI synthesis problem we propose has the advantage that it can be solved efficiently using off-the-shelf semi-definite programming (SDP) solvers. In addition, our formulation exposes the studied distributionally robust controller synthesis problem as an instance of robust $H_2$ synthesis.
comment: Submitted to Transactions of Automatic Control
A Hybrid Strategy for Probabilistic Forecasting and Trading of Aggregated Wind-Solar Power: Design and Analysis in HEFTCom2024
Obtaining accurate probabilistic energy forecasts and making effective decisions amid diverse uncertainties are routine challenges in future energy systems. This paper presents the winning solution of team GEB, which ranked 3rd in trading, 4th in forecasting, and 1st among student teams in the IEEE Hybrid Energy Forecasting and Trading Competition 2024 (HEFTCom2024). The solution provides accurate probabilistic forecasts for a wind-solar hybrid system, and achieves substantial trading revenue in the day-ahead electricity market. Key components include: (1) a stacking-based approach combining sister forecasts from various Numerical Weather Predictions (NWPs) to provide wind power forecasts, (2) an online solar post-processing model to address the distribution shift in the online test set caused by increased solar capacity, (3) a probabilistic aggregation method for accurate quantile forecasts of hybrid generation, and (4) a stochastic trading strategy to maximize expected trading revenue considering uncertainties in electricity prices. This paper also explores the potential of end-to-end learning to further enhance the trading revenue by shifting the distribution of forecast errors. Detailed case studies are provided to validate the effectiveness of these proposed methods. Code for all mentioned methods is available for reproduction and further research in both industry and academia.
comment: Solution description of IEEE Hybrid Energy Forecasting and Trading Competition (HEFTCom), preprint for International Journal of Forecasting
Hierarchical Multi-Agent MCTS for Safety-Critical Coordination in Mixed-Autonomy Roundabouts
Navigating unsignalized roundabouts in mixed-autonomy traffic presents significant challenges due to dense vehicle interactions, lane-changing complexities, and behavioral uncertainties of human-driven vehicles (HDVs). This paper proposes a safety-critical decision-making framework for connected and automated vehicles (CAVs) navigating dual-lane roundabouts alongside HDVs. We formulate the problem as a multi-agent Markov Decision Process and develop a hierarchical safety assessment mechanism that evaluates three critical interaction types: CAV-to-CAV (C2C), CAV-to-HDV (C2H), and CAV-to-Boundary (C2B). A key contribution is our lane-specific uncertainty model for HDVs, which captures distinct behavioral patterns between inner and outer lanes, with outer-lane vehicles exhibiting $2.3\times$ higher uncertainty due to less constrained movements. We integrate this safety framework with a multi-agent Monte Carlo Tree Search (MCTS) algorithm that employs safety-aware pruning to eliminate high-risk trajectories while maintaining computational efficiency. The reward function incorporates Shapley value-based credit assignment to balance individual performance with group coordination. Extensive simulation results validate the effectiveness of the proposed approach under both fully autonomous (100% AVs) and mixed traffic (50% AVs + 50% HDVs) conditions. Compared to benchmark methods, our framework consistently reduces trajectory deviations across all AVs and significantly lowers the rate of Post-Encroachment Time (PET) violations, achieving only 1.0% in the fully autonomous scenario and 3.2% in the mixed traffic setting.
comment: 11 pages, 10 figures
Systems and Control (EESS)
Use of Quadcopter Wakes to Supplement Strawberry Pollination
Pollinators are critical to the world's ecosystems and food supply, yet recent studies have found pollination shortfalls in several crops, including strawberry. This is troubling because wild and managed pollinators are currently experiencing declines. One possibility is to try and provide supplemental pollination solutions. These solutions should be affordable and simple for farmers to implement if their use is to be widespread; quadcopters are a great example, already used for monitoring on many farms. This paper investigates a new method for artificial pollination based on wind pollination that bears further investigation. After determining the height where the lateral flow is maximized, we performed field experiments with a quadcopter assisting natural pollinators. Although our results in the field were inconclusive, lab studies show that the idea shows promise and could be adapted for better field results.
comment: 7 pages, 7 figures
A Real-Time Framework for Intermediate Map Construction and Kinematically Feasible Off-Road Planning Without OSM
Off-road environments present unique challenges for autonomous navigation due to their complex and unstructured nature. Traditional global path-planning methods, which typically aim to minimize path length and travel time, perform poorly on large-scale maps and fail to account for critical factors such as real-time performance, kinematic feasibility, and memory efficiency. This paper introduces a novel global path-planning method specifically designed for off-road environments, addressing these essential factors. The method begins by constructing an intermediate map within the pixel coordinate system, incorporating geographical features like off-road trails, waterways, restricted and passable areas, and trees. The planning problem is then divided into three sub-problems: graph-based path planning, kinematic feasibility checking, and path smoothing. This approach effectively meets real-time performance requirements while ensuring kinematic feasibility and efficient memory use. The method was tested in various off-road environments with large-scale maps up to several square kilometers in size, successfully identifying feasible paths in an average of 1.5 seconds and utilizing approximately 1.5GB of memory under extreme conditions. The proposed framework is versatile and applicable to a wide range of off-road autonomous navigation tasks, including search and rescue missions and agricultural operations.
3D Electronic-Photonic Heterogenous Interconnect Platforms Enabling Energy-Efficient Scalable Architectures For Future HPC Systems
3D interconnects have emerged as a solution to address the scaling issues of interconnect bandwidth and the memory wall problem in high-performance computing (HPC), such as High-Bandwidth Memory (HBM). However, the copper-based electrical interconnect retains fundamental limitations. Dense I/O for high-speed signals lead to degraded signal quality for end-to-end links, necessitating additional circuits to mitigate signal impairments and resulting in poor energy efficiency. We propose a 3D chiplet stacking electronic-photonic interconnect (EPIC) platform, which offers a solution by moving the high-speed data communication interface to the optical domain across the 3D stack by using Through Silicon Optical Vias (TSOV), while retaining the functionality of electrical TSVs and 2.5D interconnects for power delivery and short-reach low-latency communications. We then benchmark the proposed model against state-of-the-art 3D electrical interconnects to demonstrate our 3D EPIC platform beating the 3D electrical interconnects to $>$10 TB/s/$mm^2$ bandwidth density. We present a pathway to extend our demonstrated, industry-ready design to achieving $\leq$100 fJ/bit high-speed communication.
Convex Pollution Control of Wastewater Treatment Systems
We design a model-predictive controller for managing the actuators in sewer networks. It minimizes flooding and combined-sewer overflow during rain and pollution at other times. To make the problem tractable, we use a convex relaxation of the microbial growth kinetics and a physically motivated linearization of the mass flow bilinearities. With these approximations, the trajectory optimization in each control period is a second-order cone program. In simulation, the controller releases roughly 15% less pollutant mass than a conventional controller while treating nearly the same volume of flow. It does so by better balancing the flow over the treatment plants and over time.
Electrical System Architecture for Aviation Electrification
The electrification of aircraft is reshaping the foundations of aerospace design by positioning electrical systems at the center of propulsion, control, and onboard functionality. This chapter provides an overview of electrical system architectures for electric and hybrid electric aircraft, highlighting both established principles and emerging design strategies. The discussion begins with the motivations for electrification, including reducing environmental impact, improving operational efficiency, and replacing complex pneumatic and hydraulic subsystems with lighter and more reliable electrical alternatives. Aircraft electrical architectures are classified into four major categories: conventional, more electric, all electric, and hybrid electric. A range of system topologies is examined, including direct current (DC), alternating current (AC), hybrid, and distributed configurations. Each is considered in terms of its effectiveness in delivering power, enabling redundancy, supporting fault isolation, and managing thermal performance. Real world examples are presented to demonstrate practical applications, with case studies drawn from the Boeing 787 Dreamliner, the Eviation Alice commuter aircraft, and NASA X57 Maxwell demonstrator. These examples illustrate the ongoing transition from incremental subsystem electrification toward fully integrated architectures that promise higher efficiency and greater sustainability.
Enhancing Data Center Low-Voltage Ride-Through
Data center loads have expanded significantly in recent years. Compared to traditional loads, data centers are highly sensitive to voltage deviations and thus their protection mechanisms trip more proactively during voltage fluctuations. During a grid fault, simultaneous tripping of large-scale data centers can further destabilize the transmission system and even lead to cascading failures. In response, transmission system operators are imposing voltage ride-through (VRT) requirements for data centers. In this work, we enhance the VRT capability of data centers by designing voltage controllers for their internal power distribution network. We first systematically analyze VRT standards and the controllable resources related to data centers. These resources enable the design of voltage control strategies to regulate voltages internal to the data center, thereby allowing loads to remain online during voltage disturbances from the external transmission grid. We study and contrast both centralized and decentralized controllers that unify the control of heterogeneous flexible resources. Additionally, we construct an integrated test system that simulates both the transient fault response of the transmission system and the data center distribution network. Case studies demonstrate that the proposed voltage control mechanisms provide effective yet simple solutions to enhance data center low-voltage ride-through capability.
HOFLON: Hybrid Offline Learning and Online Optimization for Process Start-Up and Grade-Transition Control
Start-ups and product grade-changes are critical steps in continuous-process plant operation, because any misstep immediately affects product quality and drives operational losses. These transitions have long relied on manual operation by a handful of expert operators, but the progressive retirement of that workforce is leaving plant owners without the tacit know-how needed to execute them consistently. In the absence of a process model, offline reinforcement learning (RL) promises to capture and even surpass human expertise by mining historical start-up and grade-change logs, yet standard offline RL struggles with distribution shift and value-overestimation whenever a learned policy ventures outside the data envelope. We introduce HOFLON (Hybrid Offline Learning + Online Optimization) to overcome those limitations. Offline, HOFLON learns (i) a latent data manifold that represents the feasible region spanned by past transitions and (ii) a long-horizon Q-critic that predicts the cumulative reward from state-action pairs. Online, it solves a one-step optimization problem that maximizes the Q-critic while penalizing deviations from the learned manifold and excessive rates of change in the manipulated variables. We test HOFLON on two industrial case studies: a polymerization reactor start-up and a paper-machine grade-change problem, and benchmark it against Implicit Q-Learning (IQL), a leading offline-RL algorithm. In both plants HOFLON not only surpasses IQL but also delivers, on average, better cumulative rewards than the best start-up or grade-change observed in the historical data, demonstrating its potential to automate transition operations beyond current expert capability.
comment: 31 pages, 15 figures, submitted to Computers and Chemical Engineering
A Trustworthy Industrial Fault Diagnosis Architecture Integrating Probabilistic Models and Large Language Models
There are limitations of traditional methods and deep learning methods in terms of interpretability, generalization, and quantification of uncertainty in industrial fault diagnosis, and there are core problems of insufficient credibility in industrial fault diagnosis. The architecture performs preliminary analysis through a Bayesian network-based diagnostic engine and features an LLM-driven cognitive quorum module with multimodal input capabilities. The module conducts expert-level arbitration of initial diagnoses by analyzing structured features and diagnostic charts, prioritizing final decisions after conflicts are identified. To ensure the reliability of the system output, the architecture integrates a confidence calibration module based on temperature calibration and a risk assessment module, which objectively quantifies the reliability of the system using metrics such as expected calibration error (ECE). Experimental results on a dataset containing multiple fault types showed that the proposed framework improved diagnostic accuracy by more than 28 percentage points compared to the baseline model, while the calibrated ECE was reduced by more than 75%. Case studies have confirmed that HCAA effectively corrects misjudgments caused by complex feature patterns or knowledge gaps in traditional models, providing novel and practical engineering solutions for building high-trust, explainable AI diagnostic systems for industrial applications.
comment: 1tables,6 figs,11pages
On the Duality Between Quantized Time and States in Dynamic Simulation
This letter introduces a formal duality between discrete-time and quantized-state numerical methods. We interpret quantized state system (QSS) methods as integration schemes applied to a dual form of the system model, where time is seen as a state-dependent variable. This perspective enables the definition of novel QSS-based schemes inspired by classical time-integration techniques. As a proof of concept, we illustrate the idea by introducing a QSS Adams-Bashforth method applied to a test equation. We then move to demonstrate how the proposed approach can achieve notable performance improvements in realistic power system simulations.
Dissecting Larval Zebrafish Hunting using Deep Reinforcement Learning Trained RNN Agents
Larval zebrafish hunting provides a tractable setting to study how ecological and energetic constraints shape adaptive behavior in both biological brains and artificial agents. Here we develop a minimal agent-based model, training recurrent policies with deep reinforcement learning in a bout-based zebrafish simulator. Despite its simplicity, the model reproduces hallmark hunting behaviors -- including eye vergence-linked pursuit, speed modulation, and stereotyped approach trajectories -- that closely match real larval zebrafish. Quantitative trajectory analyses show that pursuit bouts systematically reduce prey angle by roughly half before strike, consistent with measurements. Virtual experiments and parameter sweeps vary ecological and energetic constraints, bout kinematics (coupled vs. uncoupled turns and forward motion), and environmental factors such as food density, food speed, and vergence limits. These manipulations reveal how constraints and environments shape pursuit dynamics, strike success, and abort rates, yielding falsifiable predictions for neuroscience experiments. These sweeps identify a compact set of constraints -- binocular sensing, the coupling of forward speed and turning in bout kinematics, and modest energetic costs on locomotion and vergence -- that are sufficient for zebrafish-like hunting to emerge. Strikingly, these behaviors arise in minimal agents without detailed biomechanics, fluid dynamics, circuit realism, or imitation learning from real zebrafish data. Taken together, this work provides a normative account of zebrafish hunting as the optimal balance between energetic cost and sensory benefit, highlighting the trade-offs that structure vergence and trajectory dynamics. We establish a virtual lab that narrows the experimental search space and generates falsifiable predictions about behavior and neural coding.
Optimal Energy Management in Indoor Farming Using Lighting Flexibility and Intelligent Model Predictive Control
Indoor farming enables year-round food production but its reliance on artificial lighting significantly increases energy consumption, peak load charges, and energy costs for growers. Recent studies indicate that plants are able to tolerate interruptions in light, enabling the design of 24-hour lighting schedules (or "recipes") with strategic light modulation in alignment with day-ahead pricing. Thus, we propose an optimal lighting control strategy for indoor farming that modulates light intensity and photoperiod to reduce energy costs. The control strategy is implemented within a model predictive control framework and augmented with transformer-based neural networks to forecast 24-hour ahead solar radiation and electricity prices to improve energy cost reduction. The control strategy is informed by real-world experimentation on lettuce crops to discover minimum light exposure and appropriate dark-light intervals, which are mathematically formulated as constraints to maintain plant health. Simulations for a one-hectare greenhouse, based on real electricity market data from Ontario, demonstrate an annual cost reduction of $318,400 (20.9%), a peak load decrease of 1.6 MW (33.32%), and total energy savings of 1890 MWh (20.2%) against a baseline recipe. These findings highlight the potential of intelligent lighting control to improve the sustainability and economic feasibility of indoor farming.
Optimising Battery Energy Storage System Trading via Energy Market Operator Price Forecast
In electricity markets around the world, the ability to anticipate price movements with precision can be the difference between profit and loss, especially for fast-acting assets like battery energy storage systems (BESS). As grid volatility increases due to renewables and market decentralisation, operators and forecasters alike face growing pressure to transform prediction into strategy. Yet while forecast data is abundant, especially in advanced markets like Australia's National Electricity Market (NEM), its practical value in driving real-world BESS trading decisions remains largely unexplored. This thesis dives into that gap. This work addresses a key research question: Can the accuracy of the Australian Energy Market Operator (AEMO) energy price forecasts be systematically leveraged to develop a reliable and profitable battery energy storage system trading algorithm? Despite the availability of AEMO price forecasts, no existing framework evaluates their reliability or incorporates them into practical BESS trading strategies. By analysing patterns in forecast accuracy based on time of day, forecast horizon, and regional variations, this project creates a novel, forecast-informed BESS trading model to optimise arbitrage financial returns. The performance of this forecast-driven algorithm is benchmarked against a basic trading algorithm with no knowledge of forecast data. The study further explores the potential of machine learning techniques to predict future energy prices by enhancing AEMO forecasts to govern a more advanced trading strategy. The research outcomes will inform future improvements in energy market trading models and promote more efficient BESS integration into market operations.
Safety-Oriented Dynamic Path Planning for Automated Vehicles
Ensuring safety in autonomous vehicles necessitates advanced path planning and obstacle avoidance capabilities, particularly in dynamic environments. This paper introduces a bi-level control framework that efficiently augments road boundaries by incorporating time-dependent grid projections of obstacle movements, thus enabling precise and adaptive path planning. The main control loop utilizes Nonlinear Model Predictive Control (NMPC) for real-time path optimization, wherein homotopy-based constraint relaxation is employed to improve the solvability of the optimal control problem (OCP). Furthermore, an independent backup loop runs concurrently to provide safe fallback trajectories when an optimal trajectory cannot be computed by the main loop within a critical time frame, thus enhancing safety and real-time performance. Our evaluation showcases the benefits of the proposed methods in various driving scenarios, highlighting the real-time applicability and robustness of our approach. Overall, the framework represents a significant step towards safer and more reliable autonomous driving in complex and dynamic environments.
comment: Published in 2025 IEEE 101st Vehicular Technology Conference (VTC2025-Spring), Oslo, Norway, June 17-20, 2025. Received Best Conference Paper Award
Cyber Resilience of Three-phase Unbalanced Distribution System Restoration under Sparse Adversarial Attack on Load Forecasting
System restoration is critical for power system resilience, nonetheless, its growing reliance on artificial intelligence (AI)-based load forecasting introduces significant cybersecurity risks. Inaccurate forecasts can lead to infeasible planning, voltage and frequency violations, and unsuccessful recovery of de-energized segments, yet the resilience of restoration processes to such attacks remains largely unexplored. This paper addresses this gap by quantifying how adversarially manipulated forecasts impact restoration feasibility and grid security. We develop a gradient-based sparse adversarial attack that strategically perturbs the most influential spatiotemporal inputs, exposing vulnerabilities in forecasting models while maintaining stealth. We further create a restoration-aware validation framework that embeds these compromised forecasts into a sequential restoration model and evaluates operational feasibility using an unbalanced three-phase optimal power flow formulation. Simulation results show that the proposed approach is more efficient and stealthier than baseline attacks. It reveals system-level failures, such as voltage and power ramping violations that prevent the restoration of critical loads. These findings provide actionable insights for designing cybersecurity-aware restoration planning frameworks.
comment: 10 pages, 7 figures
Learning Safety-Compatible Observers for Unknown Systems
This paper presents a data-driven approach for jointly learning a robust full-state observer and its robustness certificate for systems with unknown dynamics. Leveraging incremental input-to-state stability (delta ISS) notions, we jointly learn a delta ISS Lyapunov function that serves as the robustness certificate and prove practical convergence of the estimation error under standard fidelity assumptions on the learned models. This renders the observer safety-compatible: they can be consumed by certificate-based safe controllers so that, when the controller tolerates bounded estimation error, the controller's certificate remains valid under output feedback. We further extend the approach to interconnected systems via the small-gain theorem, yielding a distributed observer design framework. We validate the approach on a variety of nonlinear systems.
comment: Submitted to American Control Conference (ACC)
Learning Passive Continuous-Time Dynamics with Multistep Port-Hamiltonian Gaussian Processes
We propose the multistep port-Hamiltonian Gaussian process (MS-PHS GP) to learn physically consistent continuous-time dynamics and a posterior over the Hamiltonian from noisy, irregularly-sampled trajectories. By placing a GP prior on the Hamiltonian surface $H$ and encoding variable-step multistep integrator constraints as finite linear functionals, MS-PHS GP enables closed-form conditioning of both the vector field and the Hamiltonian surface without latent states, while enforcing energy balance and passivity by design. We state a finite-sample vector-field bound that separates the estimation and variable-step discretization terms. Lastly, we demonstrate improved vector-field recovery and well-calibrated Hamiltonian uncertainty on mass-spring, Van der Pol, and Duffing benchmarks.
Spectral Flow Learning Theory: Finite-Sample Guarantees for Vector-Field Identification
We study the identification of continuous-time vector fields from irregularly sampled trajectories. We introduce Spectral Flow Learning (SFL), which learns in a windowed flow space using a lag-linear label operator that aggregates lagged Koopman actions. We provide finite-sample high-probability (FS-HP) guarantees for the class of variable-step linear multistep methods (vLLM). The FS-HP rates are constructed using spectral regularization with qualification-controlled filters for flow predictors under standard source and filter assumptions. A multistep observability inequality links flow error to vector-field error and yields two-term bounds that combine a statistical rate with an explicit discretization bias from vLMM theory. This preliminary preprint states the results and sketches proofs, with full proofs and extensions deferred to a journal version.
Nonparametric adaptive payload tracking for an offshore crane
A nonparametric adaptive controller is proposed for crane control where the payload tracks a desired trajectory with feedback from the payload position. The controller is based on a novel version of partial feedback linearization where the unactuated crane load dynamics are controlled with the position of the actuated crane dynamics instead of the acceleration. This is made possible by taking advantage of the gravity terms in a new Cartesian model that we propose for the load dynamics. This Cartesian model structure makes it possible to implement a nonparametric adaptive controller which cancels disturbances on the crane load by approximating the effects of unknown disturbance forces and structurally unknown dynamics in a reproducing kernel Hilbert space (RKHS). It is shown that the nonparametric adaptive controller leads to uniformly ultimately bounded errors in the presence of unknown forces and unmodeled dynamics. In addition, it is shown that the proposed partial feedback linearization based on the Cartesian model has certain advantages in payload tracking control also in the non-adaptive case. The performance of the nonparametric adaptive controller is validated in simulation and experiments with good results.
Sample Complexity of Linear Quadratic Regulator Without Initial Stability
Inspired by REINFORCE, we introduce a novel receding-horizon algorithm for the Linear Quadratic Regulator (LQR) problem with unknown dynamics. Unlike prior methods, our algorithm avoids reliance on two-point gradient estimates while maintaining the same order of sample complexity. Furthermore, it eliminates the restrictive requirement of starting with a stable initial policy, broadening its applicability. Beyond these improvements, we introduce a refined analysis of error propagation through the contraction of the Riccati operator under the Riemannian distance. This refinement leads to a better sample complexity and ensures improved convergence guarantees.
Contrastive-KAN: A Semi-Supervised Intrusion Detection Framework for Cybersecurity with scarce Labeled Data
In the era of the Fourth Industrial Revolution, cybersecurity and intrusion detection systems are vital for the secure and reliable operation of IoT and IIoT environments. A key challenge in this domain is the scarcity of labeled cyberattack data, as most industrial systems operate under normal conditions. This data imbalance, combined with the high cost of annotation, hinders the effective training of machine learning models. Moreover, the rapid detection of attacks is essential, especially in critical infrastructure, to prevent large-scale disruptions. To address these challenges, we propose a real-time intrusion detection system based on a semi-supervised contrastive learning framework using the Kolmogorov-Arnold Network (KAN). Our method leverages abundant unlabeled data to effectively distinguish between normal and attack behaviors. We validate our approach on three benchmark datasets, UNSW-NB15, BoT-IoT, and Gas Pipeline, using only 2.20%, 1.28%, and 8% of labeled samples, respectively, to simulate real-world conditions. Experimental results show that our method outperforms existing contrastive learning-based approaches. We further compare KAN with a traditional multilayer perceptron (MLP), demonstrating KAN's superior performance in both detection accuracy and robustness under limited supervision. KAN's ability to model complex relationships, along with its learnable activation functions, is also explored and visualized, offering interpretability and the potential for rule extraction. The method supports multi-class classification and proves effective in safety, critical environments where reliability is paramount.
A Corrector-aided Look-ahead Distance-based Guidance for Online Reference Path Following with an Efficient Mid-course Guidance Strategy
Efficient path-following is crucial in most of the applications of autonomous vehicles (UxV). Among various guidance strategies presented in literature, the look-ahead distance ($L_1$)-based nonlinear guidance has received significant attention due to its ease in implementation and ability to maintain a low cross-track error while following simpler reference paths and generating bounded lateral acceleration commands. However, the constant value of $L_1$ becomes problematic when the UxV is far away from the reference path and also produces higher cross-track error while following complex reference paths having high variation in radius of curvature. To address these challenges, the notion of look-ahead distance is leveraged in a novel way to develop a two-phase guidance strategy. Initially, when the UxV is far from the reference path, an optimized $L_1$ selection strategy is developed to guide the UxV towards the vicinity of the start point of the reference path, while maintaining minimal lateral acceleration command. Once the vehicle reaches a close neighborhood of the reference path, a novel notion of corrector point is incorporated in the constant $L_1$-based guidance scheme to generate the guidance command that effectively reduces the root mean square of the cross-track error and lateral acceleration requirement thereafter. Simulation results validate satisfactory performance of this proposed corrector point and look-ahead point pair-based guidance strategy, along with the developed mid-course guidance scheme. Also, its superiority over the conventional constant $L_1$ guidance scheme is established by simulation studies over different initial condition scenarios.
comment: This paper is currently under review for publication in ACC 2026
Distributionally robust LMI synthesis for LTI systems
This article shows that distributionally robust controller synthesis as investigated in \cite{taskesen2024distributionally} can be formulated as a convex linear matrix inequality (LMI) synthesis problem. To this end, we rely on well-established convexification techniques from robust control. The LMI synthesis problem we propose has the advantage that it can be solved efficiently using off-the-shelf semi-definite programming (SDP) solvers. In addition, our formulation exposes the studied distributionally robust controller synthesis problem as an instance of robust $H_2$ synthesis.
comment: Submitted to Transactions of Automatic Control
A Hybrid Strategy for Probabilistic Forecasting and Trading of Aggregated Wind-Solar Power: Design and Analysis in HEFTCom2024
Obtaining accurate probabilistic energy forecasts and making effective decisions amid diverse uncertainties are routine challenges in future energy systems. This paper presents the winning solution of team GEB, which ranked 3rd in trading, 4th in forecasting, and 1st among student teams in the IEEE Hybrid Energy Forecasting and Trading Competition 2024 (HEFTCom2024). The solution provides accurate probabilistic forecasts for a wind-solar hybrid system, and achieves substantial trading revenue in the day-ahead electricity market. Key components include: (1) a stacking-based approach combining sister forecasts from various Numerical Weather Predictions (NWPs) to provide wind power forecasts, (2) an online solar post-processing model to address the distribution shift in the online test set caused by increased solar capacity, (3) a probabilistic aggregation method for accurate quantile forecasts of hybrid generation, and (4) a stochastic trading strategy to maximize expected trading revenue considering uncertainties in electricity prices. This paper also explores the potential of end-to-end learning to further enhance the trading revenue by shifting the distribution of forecast errors. Detailed case studies are provided to validate the effectiveness of these proposed methods. Code for all mentioned methods is available for reproduction and further research in both industry and academia.
comment: Solution description of IEEE Hybrid Energy Forecasting and Trading Competition (HEFTCom), preprint for International Journal of Forecasting
Hierarchical Multi-Agent MCTS for Safety-Critical Coordination in Mixed-Autonomy Roundabouts
Navigating unsignalized roundabouts in mixed-autonomy traffic presents significant challenges due to dense vehicle interactions, lane-changing complexities, and behavioral uncertainties of human-driven vehicles (HDVs). This paper proposes a safety-critical decision-making framework for connected and automated vehicles (CAVs) navigating dual-lane roundabouts alongside HDVs. We formulate the problem as a multi-agent Markov Decision Process and develop a hierarchical safety assessment mechanism that evaluates three critical interaction types: CAV-to-CAV (C2C), CAV-to-HDV (C2H), and CAV-to-Boundary (C2B). A key contribution is our lane-specific uncertainty model for HDVs, which captures distinct behavioral patterns between inner and outer lanes, with outer-lane vehicles exhibiting $2.3\times$ higher uncertainty due to less constrained movements. We integrate this safety framework with a multi-agent Monte Carlo Tree Search (MCTS) algorithm that employs safety-aware pruning to eliminate high-risk trajectories while maintaining computational efficiency. The reward function incorporates Shapley value-based credit assignment to balance individual performance with group coordination. Extensive simulation results validate the effectiveness of the proposed approach under both fully autonomous (100% AVs) and mixed traffic (50% AVs + 50% HDVs) conditions. Compared to benchmark methods, our framework consistently reduces trajectory deviations across all AVs and significantly lowers the rate of Post-Encroachment Time (PET) violations, achieving only 1.0% in the fully autonomous scenario and 3.2% in the mixed traffic setting.
comment: 11 pages, 10 figures
Robotics
A Real-Time Framework for Intermediate Map Construction and Kinematically Feasible Off-Road Planning Without OSM
Off-road environments present unique challenges for autonomous navigation due to their complex and unstructured nature. Traditional global path-planning methods, which typically aim to minimize path length and travel time, perform poorly on large-scale maps and fail to account for critical factors such as real-time performance, kinematic feasibility, and memory efficiency. This paper introduces a novel global path-planning method specifically designed for off-road environments, addressing these essential factors. The method begins by constructing an intermediate map within the pixel coordinate system, incorporating geographical features like off-road trails, waterways, restricted and passable areas, and trees. The planning problem is then divided into three sub-problems: graph-based path planning, kinematic feasibility checking, and path smoothing. This approach effectively meets real-time performance requirements while ensuring kinematic feasibility and efficient memory use. The method was tested in various off-road environments with large-scale maps up to several square kilometers in size, successfully identifying feasible paths in an average of 1.5 seconds and utilizing approximately 1.5GB of memory under extreme conditions. The proposed framework is versatile and applicable to a wide range of off-road autonomous navigation tasks, including search and rescue missions and agricultural operations.
TCB-VIO: Tightly-Coupled Focal-Plane Binary-Enhanced Visual Inertial Odometry
Vision algorithms can be executed directly on the image sensor when implemented on the next-generation sensors known as focal-plane sensor-processor arrays (FPSP)s, where every pixel has a processor. FPSPs greatly improve latency, reducing the problems associated with the bottleneck of data transfer from a vision sensor to a processor. FPSPs accelerate vision-based algorithms such as visual-inertial odometry (VIO). However, VIO frameworks suffer from spatial drift due to the vision-based pose estimation, whilst temporal drift arises from the inertial measurements. FPSPs circumvent the spatial drift by operating at a high frame rate to match the high-frequency output of the inertial measurements. In this paper, we present TCB-VIO, a tightly-coupled 6 degrees-of-freedom VIO by a Multi-State Constraint Kalman Filter (MSCKF), operating at a high frame-rate of 250 FPS and from IMU measurements obtained at 400 Hz. TCB-VIO outperforms state-of-the-art methods: ROVIO, VINS-Mono, and ORB-SLAM3.
comment: Accepted at IEEE Robotics and Automation Letters
OpenFLAME: Federated Visual Positioning System to Enable Large-Scale Augmented Reality Applications
World-scale augmented reality (AR) applications need a ubiquitous 6DoF localization backend to anchor content to the real world consistently across devices. Large organizations such as Google and Niantic are 3D scanning outdoor public spaces in order to build their own Visual Positioning Systems (VPS). These centralized VPS solutions fail to meet the needs of many future AR applications -- they do not cover private indoor spaces because of privacy concerns, regulations, and the labor bottleneck of updating and maintaining 3D scans. In this paper, we present OpenFLAME, a federated VPS backend that allows independent organizations to 3D scan and maintain a separate VPS service for their own spaces. This enables access control of indoor 3D scans, distributed maintenance of the VPS backend, and encourages larger coverage. Sharding of VPS services introduces several unique challenges -- coherency of localization results across spaces, quality control of VPS services, selection of the right VPS service for a location, and many others. We introduce the concept of federated image-based localization and provide reference solutions for managing and merging data across maps without sharing private data.
WAFFLE: A Wearable Approach to Bite Timing Estimation in Robot-Assisted Feeding
Millions of people around the world need assistance with feeding. Robotic feeding systems offer the potential to enhance autonomy and quality of life for individuals with impairments and reduce caregiver workload. However, their widespread adoption has been limited by technical challenges such as estimating bite timing, the appropriate moment for the robot to transfer food to a user's mouth. In this work, we introduce WAFFLE: Wearable Approach For Feeding with LEarned bite timing, a system that accurately predicts bite timing by leveraging wearable sensor data to be highly reactive to natural user cues such as head movements, chewing, and talking. We train a supervised regression model on bite timing data from 14 participants and incorporate a user-adjustable assertiveness threshold to convert predictions into proceed or stop commands. In a study with 15 participants without motor impairments with the Obi feeding robot, WAFFLE performs statistically on par with or better than baseline methods across measures of feeling of control, robot understanding, and workload, and is preferred by the majority of participants for both individual and social dining. We further demonstrate WAFFLE's generalizability in a study with 2 participants with motor impairments in their home environments using a Kinova 7DOF robot. Our findings support WAFFLE's effectiveness in enabling natural, reactive bite timing that generalizes across users, robot hardware, robot positioning, feeding trajectories, foods, and both individual and social dining contexts.
Bridge Thinking and Acting: Unleashing Physical Potential of VLM with Generalizable Action Expert
Although Vision-Language Models (VLM) have demonstrated impressive planning and reasoning capabilities, translating these abilities into the physical world introduces significant challenges. Conventional Vision-Language-Action (VLA) models, which integrate reasoning and action into a monolithic architecture, generalize poorly because they are constrained by scarce, narrow-domain data. While recent dual-system approaches attempt to decouple "thinking" from "acting", they are often constrained by semantic ambiguities within the action module. This ambiguity makes large-scale, cross-task training infeasible. Consequently, these systems typically necessitate fine-tuning on newly collected data when deployed to novel environments, and the cooperation mechanism between the two systems remains ill-defined. To address these limitations, we introduce, for the first time, a framework centered around a generalizable action expert. Our approach utilizes sparse 3D trajectories as an intermediate representation, effectively bridging the high-level planning capabilities of the VLM with the low-level physical action module. During the planning phase, the VLM is only required to generate coarse 3D waypoints. These waypoints are then processed by our generalizable action expert, which refines them into dense, executable action sequences by sampling real-time point cloud observations of the environment. To promote training efficiency and robust generalization, we introduce a novel "Action Pre-training, Pointcloud Fine-tuning" paradigm. Our method combines the broad generalization capabilities of VLMs in visual understanding and planning with the fine-grained, action-level generalization of action expert.
NoTVLA: Narrowing of Dense Action Trajectories for Generalizable Robot Manipulation
Vision-Language-Action (VLA) models represent a pivotal advance in embodied intelligence, yet they confront critical barriers to real-world deployment, most notably catastrophic forgetting. This issue stems from their overreliance on continuous action sequences or action chunks, which inadvertently create isolated data silos that disrupt knowledge retention across tasks. To tackle these challenges, we propose the Narrowing of Trajectory VLA (NoTVLA) framework: a novel approach that narrows its focus to sparse trajectories, thereby avoiding the catastrophic forgetting associated with dense trajectory fine-tuning. A key innovation of NoTVLA lies in its trajectory planning strategy: instead of centering on the target object's trajectory, it leverages temporal compression and spatial reasoning pruning specifically for the robot end effector's trajectory. Furthermore, training is conducted using these sparse trajectories rather than dense action trajectories, an optimization that delivers remarkable practical advantages with better performance in zero-shot. In multi-task evaluation scenarios, NoTVLA achieves superior performance and generalization compared to pi0 while operating under two critical constraints: it uses over an order of magnitude less computing power than pi0 and requires no wrist-mounted camera. This design ensures that NoTVLA's operational accuracy closely approximates that of single-task expert models. Crucially, it also preserves the model's inherent language capabilities, enabling zero-shot generalization in specific scenarios, supporting unified model deployment across multiple robot platforms, and fostering a degree of generalization even when perceiving tasks from novel perspectives.
Seeing the Bigger Picture: 3D Latent Mapping for Mobile Manipulation Policy Learning
In this paper, we demonstrate that mobile manipulation policies utilizing a 3D latent map achieve stronger spatial and temporal reasoning than policies relying solely on images. We introduce Seeing the Bigger Picture (SBP), an end-to-end policy learning approach that operates directly on a 3D map of latent features. In SBP, the map extends perception beyond the robot's current field of view and aggregates observations over long horizons. Our mapping approach incrementally fuses multiview observations into a grid of scene-specific latent features. A pre-trained, scene-agnostic decoder reconstructs target embeddings from these features and enables online optimization of the map features during task execution. A policy, trainable with behavior cloning or reinforcement learning, treats the latent map as a state variable and uses global context from the map obtained via a 3D feature aggregator. We evaluate SBP on scene-level mobile manipulation and sequential tabletop manipulation tasks. Our experiments demonstrate that SBP (i) reasons globally over the scene, (ii) leverages the map as long-horizon memory, and (iii) outperforms image-based policies in both in-distribution and novel scenes, e.g., improving the success rate by 25% for the sequential manipulation task.
comment: Project website can be found at https://existentialrobotics.org/sbp_page/
COVER:COverage-VErified Roadmaps for Fixed-time Motion Planning in Continuous Semi-Static Environments
Having the ability to answer motion-planning queries within a fixed time budget is critical for the widespread deployment of robotic systems. Semi-static environments, where most obstacles remain static but a limited set can vary across queries, exhibit structured variability that can be systematically exploited to provide stronger guarantees than in general motion-planning problems. However, prior approaches in this setting either lack formal guarantees or rely on restrictive discretizations of obstacle configurations, limiting their applicability in realistic domains. This paper introduces COVER, a novel framework that incrementally constructs a coverage-verified roadmap in semi-static environments. By partitioning the obstacle configuration space and solving for feasible paths within each partition, COVER systematically verifies feasibility of the roadmap in each partition and guarantees fixed-time motion planning queries within the verified regions. We validate COVER with a 7-DOF simulated Panda robot performing table and shelf tasks, demonstrating that COVER achieves broader coverage with higher query success rates than prior works.
LIBERO-PRO: Towards Robust and Fair Evaluation of Vision-Language-Action Models Beyond Memorization
LIBERO has emerged as a widely adopted benchmark for evaluating Vision-Language-Action (VLA) models; however, its current training and evaluation settings are problematic, often leading to inflated performance estimates and preventing fair model comparison. To address these issues, we introduce LIBERO-PRO, an extended LIBERO benchmark that systematically evaluates model performance under reasonable perturbations across four dimensions: manipulated objects, initial states, task instructions, and environments. Experimental results reveal that, although existing models achieve over 90% accuracy under the standard LIBERO evaluation, their performance collapses to 0.0% under our generalized setting. Crucially, this discrepancy exposes the models' reliance on rote memorization of action sequences and environment layouts from the training set, rather than genuine task understanding or environmental perception. For instance, models persist in executing grasping actions when the target object is replaced with irrelevant items, and their outputs remain unchanged even when given corrupted instructions or even messy tokens. These findings expose the severe flaws in current evaluation practices, and we call on the community to abandon misleading methodologies in favor of robust assessments of model generalization and comprehension. Our code is available at: https://github.com/Zxy-MLlab/LIBERO-PRO.
comment: 12 pages,7 figures, 5 tables
Distributed Area Coverage with High Altitude Balloons Using Multi-Agent Reinforcement Learning
High Altitude Balloons (HABs) can leverage stratospheric wind layers for limited horizontal control, enabling applications in reconnaissance, environmental monitoring, and communications networks. Existing multi-agent HAB coordination approaches use deterministic methods like Voronoi partitioning and extremum seeking control for large global constellations, which perform poorly for smaller teams and localized missions. While single-agent HAB control using reinforcement learning has been demonstrated on HABs, coordinated multi-agent reinforcement learning (MARL) has not yet been investigated. This work presents the first systematic application of multi-agent reinforcement learning (MARL) to HAB coordination for distributed area coverage. We extend our previously developed reinforcement learning simulation environment (RLHAB) to support cooperative multi-agent learning, enabling multiple agents to operate simultaneously in realistic atmospheric conditions. We adapt QMIX for HAB area coverage coordination, leveraging Centralized Training with Decentralized Execution to address atmospheric vehicle coordination challenges. Our approach employs specialized observation spaces providing individual state, environmental context, and teammate data, with hierarchical rewards prioritizing coverage while encouraging spatial distribution. We demonstrate that QMIX achieves similar performance to the theoretically optimal geometric deterministic method for distributed area coverage, validating the MARL approach and providing a foundation for more complex autonomous multi-HAB missions where deterministic methods become intractable.
Trajectory prediction for heterogeneous agents: A performance analysis on small and imbalanced datasets
Robots and other intelligent systems navigating in complex dynamic environments should predict future actions and intentions of surrounding agents to reach their goals efficiently and avoid collisions. The dynamics of those agents strongly depends on their tasks, roles, or observable labels. Class-conditioned motion prediction is thus an appealing way to reduce forecast uncertainty and get more accurate predictions for heterogeneous agents. However, this is hardly explored in the prior art, especially for mobile robots and in limited data applications. In this paper, we analyse different class-conditioned trajectory prediction methods on two datasets. We propose a set of conditional pattern-based and efficient deep learning-based baselines, and evaluate their performance on robotics and outdoors datasets (TH\"OR-MAGNI and Stanford Drone Dataset). Our experiments show that all methods improve accuracy in most of the settings when considering class labels. More importantly, we observe that there are significant differences when learning from imbalanced datasets, or in new environments where sufficient data is not available. In particular, we find that deep learning methods perform better on balanced datasets, but in applications with limited data, e.g., cold start of a robot in a new environment, or imbalanced classes, pattern-based methods may be preferable.
comment: This paper has been accepted to the IEEE Robotics and Automation Letters journal and presented at the 40th Anniversary of the IEEE International Conference on Robotics and Automation, which was held in Rotterdam, Netherlands on 23-26 September, 2024
Model-Based Adaptive Precision Control for Tabletop Planar Pushing Under Uncertain Dynamics
Data-driven planar pushing methods have recently gained attention as they reduce manual engineering effort and improve generalization compared to analytical approaches. However, most prior work targets narrow capabilities (e.g., side switching, precision, or single-task training), limiting broader applicability. We present a model-based framework for non-prehensile tabletop pushing that uses a single learned model to address multiple tasks without retraining. Our approach employs a recurrent GRU-based architecture with additional non-linear layers to capture object-environment dynamics while ensuring stability. A tailored state-action representation enables the model to generalize across uncertain dynamics, variable push lengths, and diverse tasks. For control, we integrate the learned dynamics with a sampling-based Model Predictive Path Integral (MPPI) controller, which generates adaptive, task-oriented actions. This framework supports side switching, variable-length pushes, and objectives such as precise positioning, trajectory following, and obstacle avoidance. Training is performed in simulation with domain randomization to support sim-to-real transfer. We first evaluate the architecture through ablation studies, showing improved prediction accuracy and stable rollouts. We then validate the full system in simulation and real-world experiments using a Franka Panda robot with markerless tracking. Results demonstrate high success rates in precise positioning under strict thresholds and strong performance in trajectory tracking and obstacle avoidance. Moreover, multiple tasks are solved simply by changing the controller's objective function, without retraining. While our current focus is on a single object type, we extend the framework by training on wider push lengths and designing a balanced controller that reduces the number of steps for longer-horizon goals.
EmbodiSwap for Zero-Shot Robot Imitation Learning
We introduce EmbodiSwap - a method for producing photorealistic synthetic robot overlays over human video. We employ EmbodiSwap for zero-shot imitation learning, bridging the embodiment gap between in-the-wild ego-centric human video and a target robot embodiment. We train a closed-loop robot manipulation policy over the data produced by EmbodiSwap. We make novel use of V-JEPA as a visual backbone, repurposing V-JEPA from the domain of video understanding to imitation learning over synthetic robot videos. Adoption of V-JEPA outperforms alternative vision backbones more conventionally used within robotics. In real-world tests, our zero-shot trained V-JEPA model achieves an $82\%$ success rate, outperforming a few-shot trained $\pi_0$ network as well as $\pi_0$ trained over data produced by EmbodiSwap. We release (i) code for generating the synthetic robot overlays which takes as input human videos and an arbitrary robot URDF and generates a robot dataset, (ii) the robot dataset we synthesize over EPIC-Kitchens, HOI4D and Ego4D, and (iii) model checkpoints and inference code, to facilitate reproducible research and broader adoption.
comment: Video link: https://drive.google.com/file/d/1UccngwgPqUwPMhBja7JrXfZoTquCx_Qe/view?usp=sharing
Robust Visual Embodiment: How Robots Discover Their Bodies in Real Environments
Robots with internal visual self-models promise unprecedented adaptability, yet existing autonomous modeling pipelines remain fragile under realistic sensing conditions such as noisy imagery and cluttered backgrounds. This paper presents the first systematic study quantifying how visual degradations--including blur, salt-and-pepper noise, and Gaussian noise--affect robotic self-modeling. Through both simulation and physical experiments, we demonstrate their impact on morphology prediction, trajectory planning, and damage recovery in state-of-the-art pipelines. To overcome these challenges, we introduce a task-aware denoising framework that couples classical restoration with morphology-preserving constraints, ensuring retention of structural cues critical for self-modeling. In addition, we integrate semantic segmentation to robustly isolate robots from cluttered and colorful scenes. Extensive experiments show that our approach restores near-baseline performance across simulated and physical platforms, while existing pipelines degrade significantly. These contributions advance the robustness of visual self-modeling and establish practical foundations for deploying self-aware robots in unpredictable real-world environments.
An Amphibious Untethered Inchworm Soft Robot for Fast Crawling Locomotion
Untethered soft robots are essential for advancing the real-world deployment of soft robotic systems in diverse and multitasking environments. Inspired by soft-bodied inchworm, we present a fully untethered soft robot with a curved, flexible structure actuated by magnetic forces. The robot has a total mass of 102.63 g and demonstrates multimodal locomotion, achieving a maximum walking speed of 3.74 cm/s and a swimming speed of 0.82 cm/s. A compact and lightweight onboard control circuit enables wireless command transmission, while an integrated camera provides environmental perception. Through structural optimization and system-level integration, the robot successfully performs walking, steering, swimming, and payload transport without reliance on external infrastructure. The robot's dynamic performance and locomotion capabilities are systematically validated through experimental characterization.
Geometrically Exact Hard Magneto-Elastic Cosserat Shells: Static Formulation for Shape Morphing
Cosserat rod theory is the popular approach to modeling ferromagnetic soft robots as 1-Dimensional (1D) slender structures in most applications, such as biomedical. However, recent soft robots designed for locomotion and manipulation often exhibit a large width-to-length ratio that categorizes them as 2D shells. For analysis and shape-morphing control purposes, we develop an efficient coordinate-free static model of hard-magnetic shells found in soft magnetic grippers and walking soft robots. The approach is based on a novel formulation of Cosserat shell theory on the Special Euclidean group ($\mathbf{SE}(3)$). The shell is assumed to be a 2D manifold of material points with six degrees of freedom (position & rotation) suitable for capturing the behavior of a uniformly distributed array of spheroidal hard magnetic particles embedded in the rheological elastomer. The shell's configuration manifold is the space of all smooth embeddings $\mathbb{R}^2\rightarrow\mathbf{SE}(3)$. According to a novel definition of local deformation gradient based on the Lie group structure of $\mathbf{SE}(3)$, we derive the strong and weak forms of equilibrium equations, following the principle of virtual work. We extract the linearized version of the weak form for numerical implementations. The resulting finite element approach can avoid well-known challenges such as singularity and locking phenomenon in modeling shell structures. The proposed model is analytically and experimentally validated through a series of test cases that demonstrate its superior efficacy, particularly when the shell undergoes severe rotations and displacements.
Safety-Oriented Dynamic Path Planning for Automated Vehicles
Ensuring safety in autonomous vehicles necessitates advanced path planning and obstacle avoidance capabilities, particularly in dynamic environments. This paper introduces a bi-level control framework that efficiently augments road boundaries by incorporating time-dependent grid projections of obstacle movements, thus enabling precise and adaptive path planning. The main control loop utilizes Nonlinear Model Predictive Control (NMPC) for real-time path optimization, wherein homotopy-based constraint relaxation is employed to improve the solvability of the optimal control problem (OCP). Furthermore, an independent backup loop runs concurrently to provide safe fallback trajectories when an optimal trajectory cannot be computed by the main loop within a critical time frame, thus enhancing safety and real-time performance. Our evaluation showcases the benefits of the proposed methods in various driving scenarios, highlighting the real-time applicability and robustness of our approach. Overall, the framework represents a significant step towards safer and more reliable autonomous driving in complex and dynamic environments.
comment: Published in 2025 IEEE 101st Vehicular Technology Conference (VTC2025-Spring), Oslo, Norway, June 17-20, 2025. Received Best Conference Paper Award
Learning to Act Through Contact: A Unified View of Multi-Task Robot Learning
We present a unified framework for multi-task locomotion and manipulation policy learning grounded in a contact-explicit representation. Instead of designing different policies for different tasks, our approach unifies the definition of a task through a sequence of contact goals-desired contact positions, timings, and active end-effectors. This enables leveraging the shared structure across diverse contact-rich tasks, leading to a single policy that can perform a wide range of tasks. In particular, we train a goal-conditioned reinforcement learning (RL) policy to realise given contact plans. We validate our framework on multiple robotic embodiments and tasks: a quadruped performing multiple gaits, a humanoid performing multiple biped and quadrupedal gaits, and a humanoid executing different bimanual object manipulation tasks. Each of these scenarios is controlled by a single policy trained to execute different tasks grounded in contacts, demonstrating versatile and robust behaviours across morphologically distinct systems. Our results show that explicit contact reasoning significantly improves generalisation to unseen scenarios, positioning contact-explicit policy learning as a promising foundation for scalable loco-manipulation.
Deep Reinforcement Learning for Multi-Agent Coordination
We address the challenge of coordinating multiple robots in narrow and confined environments, where congestion and interference often hinder collective task performance. Drawing inspiration from insect colonies, which achieve robust coordination through stigmergy -- modifying and interpreting environmental traces -- we propose a Stigmergic Multi-Agent Deep Reinforcement Learning (S-MADRL) framework that leverages virtual pheromones to model local and social interactions, enabling decentralized emergent coordination without explicit communication. To overcome the convergence and scalability limitations of existing algorithms such as MADQN, MADDPG, and MAPPO, we leverage curriculum learning, which decomposes complex tasks into progressively harder sub-problems. Simulation results show that our framework achieves the most effective coordination of up to eight agents, where robots self-organize into asymmetric workload distributions that reduce congestion and modulate group performance. This emergent behavior, analogous to strategies observed in nature, demonstrates a scalable solution for decentralized multi-agent coordination in crowded environments with communication constraints.
comment: 11 pages, 8 figures, 1 table, presented at SWARM 2022, to be published in Journal of Artificial Life and Robotics
LLM-MCoX: Large Language Model-based Multi-robot Coordinated Exploration and Search
Autonomous exploration and object search in unknown indoor environments remain challenging for multi-robot systems (MRS). Traditional approaches often rely on greedy frontier assignment strategies with limited inter-robot coordination. In this work, we introduce LLM-MCoX (LLM-based Multi-robot Coordinated Exploration and Search), a novel framework that leverages Large Language Models (LLMs) for intelligent coordination of both homogeneous and heterogeneous robot teams tasked with efficient exploration and target object search. Our approach combines real-time LiDAR scan processing for frontier cluster extraction and doorway detection with multimodal LLM reasoning (e.g., GPT-4o) to generate coordinated waypoint assignments based on shared environment maps and robot states. LLM-MCoX demonstrates superior performance compared to existing methods, including greedy and Voronoi-based planners, achieving 22.7% faster exploration times and 50% improved search efficiency in large environments with 6 robots. Notably, LLM-MCoX enables natural language-based object search capabilities, allowing human operators to provide high-level semantic guidance that traditional algorithms cannot interpret.
AutoDrive-QA: A Multiple-Choice Benchmark for Vision-Language Evaluation in Urban Autonomous Driving
Evaluating vision-language models (VLMs) in urban driving contexts remains challenging, as existing benchmarks rely on open-ended responses that are ambiguous, annotation-intensive, and inconsistent to score. This lack of standardized evaluation slows progress toward safe and reliable AI for urban mobility. We introduce AutoDrive-QA, the first benchmark that systematically converts open-ended driving QA datasets (DriveLM, NuScenes-QA, LingoQA) into structured multiple-choice questions (MCQs) with distractors grounded in five realistic error categories: Driving Domain Misconceptions, Logical Inconsistencies, Misinterpreted Sensor Inputs, Computational Oversights, and Question Ambiguity. This framework enables reproducible and interpretable evaluation of VLMs across perception, prediction, and planning tasks in complex urban scenes. Experiments show that fine-tuning LLaVA-1.5-7B improves accuracy by about six percentage points across tasks, GPT-4V achieves the strongest zero-shot performance with up to 69.8% accuracy, and Qwen2-VL models also perform competitively, particularly in multi-view settings. Moreover, traditional metrics such as BLEU and CIDEr fail to distinguish strong from weak models. By providing an objective, domain-grounded evaluation protocol, AutoDrive-QA contributes to more transparent benchmarking of urban AI systems, supporting the development of safer and more trustworthy autonomous driving technologies for smart cities.
comment: Updated results with larger dataset experiments and expanded discussion
Nonparametric adaptive payload tracking for an offshore crane
A nonparametric adaptive controller is proposed for crane control where the payload tracks a desired trajectory with feedback from the payload position. The controller is based on a novel version of partial feedback linearization where the unactuated crane load dynamics are controlled with the position of the actuated crane dynamics instead of the acceleration. This is made possible by taking advantage of the gravity terms in a new Cartesian model that we propose for the load dynamics. This Cartesian model structure makes it possible to implement a nonparametric adaptive controller which cancels disturbances on the crane load by approximating the effects of unknown disturbance forces and structurally unknown dynamics in a reproducing kernel Hilbert space (RKHS). It is shown that the nonparametric adaptive controller leads to uniformly ultimately bounded errors in the presence of unknown forces and unmodeled dynamics. In addition, it is shown that the proposed partial feedback linearization based on the Cartesian model has certain advantages in payload tracking control also in the non-adaptive case. The performance of the nonparametric adaptive controller is validated in simulation and experiments with good results.
RT-GuIDE: Real-Time Gaussian Splatting for Information-Driven Exploration
We propose a framework for active mapping and exploration that leverages Gaussian splatting for constructing dense maps. Further, we develop a GPU-accelerated motion planning algorithm that can exploit the Gaussian map for real-time navigation. The Gaussian map constructed onboard the robot is optimized for both photometric and geometric quality while enabling real-time situational awareness for autonomy. We show through viewpoint selection experiments that our method yields comparable Peak Signal-to-Noise Ratio (PSNR) and similar reconstruction error to state-of-the-art approaches, while being orders of magnitude faster to compute. In closed-loop physics-based simulation and real-world experiments, our algorithm achieves better map quality (at least 0.8dB higher PSNR and more than 16% higher geometric reconstruction accuracy) than maps constructed by a state-of-the-art method, enabling semantic segmentation using off-the-shelf open-set models. Experiment videos and more details can be found on our project page: https://tyuezhan.github.io/RT GuIDE/
Cyber Racing Coach: A Haptic Shared Control Framework for Teaching Advanced Driving Skills
This study introduces a haptic shared control framework designed to teach human drivers advanced driving skills. In this context, shared control refers to a driving mode where the human driver collaborates with an autonomous driving system to control the steering of a vehicle simultaneously. Advanced driving skills are those necessary to safely push the vehicle to its handling limits in high-performance driving such as racing and emergency obstacle avoidance. Previous research has demonstrated the performance and safety benefits of shared control schemes using both subjective and objective evaluations. However, these schemes have not been assessed for their impact on skill acquisition on complex and demanding tasks. Prior research on long-term skill acquisition either applies haptic shared control to simple tasks or employs other feedback methods like visual and auditory aids. To bridge this gap, this study creates a cyber racing coach framework based on the haptic shared control paradigm and evaluates its performance in helping human drivers acquire high-performance driving skills. The framework introduces (1) an autonomous driving system that is capable of cooperating with humans in a highly performant driving scenario; and (2) a haptic shared control mechanism along with a fading scheme to gradually reduce the steering assistance from autonomy based on the human driver's performance during training. Two benchmarks are considered: self-learning (no assistance) and full assistance during training. Results from a human subject study indicate that the proposed framework helps human drivers develop superior racing skills compared to the benchmarks, resulting in better performance and consistency.
comment: Added the statement written as red text for the IEEE preprint policy
Dual-arm Motion Generation for Repositioning Care based on Deep Predictive Learning with Somatosensory Attention Mechanism IROS 2025
Caregiving is a vital role for domestic robots, especially the repositioning care has immense societal value, critically improving the health and quality of life of individuals with limited mobility. However, repositioning task is a challenging area of research, as it requires robots to adapt their motions while interacting flexibly with patients. The task involves several key challenges: (1) applying appropriate force to specific target areas; (2) performing multiple actions seamlessly, each requiring different force application policies; and (3) motion adaptation under uncertain positional conditions. To address these, we propose a deep neural network (DNN)-based architecture utilizing proprioceptive and visual attention mechanisms, along with impedance control to regulate the robot's movements. Using the dual-arm humanoid robot Dry-AIREC, the proposed model successfully generated motions to insert the robot's hand between the bed and a mannequin's back without applying excessive force, and it supported the transition from a supine to a lifted-up position. The project page is here: https://sites.google.com/view/caregiving-robot-airec/repositioning
comment: Accepted at IROS 2025
LERa: Replanning with Visual Feedback in Instruction Following IROS 2025
Large Language Models are increasingly used in robotics for task planning, but their reliance on textual inputs limits their adaptability to real-world changes and failures. To address these challenges, we propose LERa - Look, Explain, Replan - a Visual Language Model-based replanning approach that utilizes visual feedback. Unlike existing methods, LERa requires only a raw RGB image, a natural language instruction, an initial task plan, and failure detection - without additional information such as object detection or predefined conditions that may be unavailable in a given scenario. The replanning process consists of three steps: (i) Look - where LERa generates a scene description and identifies errors; (ii) Explain - where it provides corrective guidance; and (iii) Replan - where it modifies the plan accordingly. LERa is adaptable to various agent architectures and can handle errors from both dynamic scene changes and task execution failures. We evaluate LERa on the newly introduced ALFRED-ChaOS and VirtualHome-ChaOS datasets, achieving a 40% improvement over baselines in dynamic environments. In tabletop manipulation tasks with a predefined probability of task failure within the PyBullet simulator, LERa improves success rates by up to 67%. Further experiments, including real-world trials with a tabletop manipulator robot, confirm LERa's effectiveness in replanning. We demonstrate that LERa is a robust and adaptable solution for error-aware task execution in robotics. The project page is available at https://lera-robo.github.io.
comment: Accepted to IROS 2025
A Corrector-aided Look-ahead Distance-based Guidance for Online Reference Path Following with an Efficient Mid-course Guidance Strategy
Efficient path-following is crucial in most of the applications of autonomous vehicles (UxV). Among various guidance strategies presented in literature, the look-ahead distance ($L_1$)-based nonlinear guidance has received significant attention due to its ease in implementation and ability to maintain a low cross-track error while following simpler reference paths and generating bounded lateral acceleration commands. However, the constant value of $L_1$ becomes problematic when the UxV is far away from the reference path and also produces higher cross-track error while following complex reference paths having high variation in radius of curvature. To address these challenges, the notion of look-ahead distance is leveraged in a novel way to develop a two-phase guidance strategy. Initially, when the UxV is far from the reference path, an optimized $L_1$ selection strategy is developed to guide the UxV towards the vicinity of the start point of the reference path, while maintaining minimal lateral acceleration command. Once the vehicle reaches a close neighborhood of the reference path, a novel notion of corrector point is incorporated in the constant $L_1$-based guidance scheme to generate the guidance command that effectively reduces the root mean square of the cross-track error and lateral acceleration requirement thereafter. Simulation results validate satisfactory performance of this proposed corrector point and look-ahead point pair-based guidance strategy, along with the developed mid-course guidance scheme. Also, its superiority over the conventional constant $L_1$ guidance scheme is established by simulation studies over different initial condition scenarios.
comment: This paper is currently under review for publication in ACC 2026
Learning a Unified Policy for Position and Force Control in Legged Loco-Manipulation
Robotic loco-manipulation tasks often involve contact-rich interactions with the environment, requiring the joint modeling of contact force and robot position. However, recent visuomotor policies often focus solely on learning position or force control, overlooking their co-learning. In this work, we propose the first unified policy for legged robots that jointly models force and position control learned without reliance on force sensors. By simulating diverse combinations of position and force commands alongside external disturbance forces, we use reinforcement learning to learn a policy that estimates forces from historical robot states and compensates for them through position and velocity adjustments. This policy enables a wide range of manipulation behaviors under varying force and position inputs, including position tracking, force application, force tracking, and compliant interactions. Furthermore, we demonstrate that the learned policy enhances trajectory-based imitation learning pipelines by incorporating essential contact information through its force estimation module, achieving approximately 39.5% higher success rates across four challenging contact-rich manipulation tasks compared to position-control policies. Extensive experiments on both a quadrupedal manipulator and a humanoid robot validate the versatility and robustness of the proposed policy across diverse scenarios.
comment: website: https://unified-force.github.io/
Online Hybrid-Belief POMDP with Coupled Semantic-Geometric Models
Robots operating in complex and unknown environments frequently require geometric-semantic representations of the environment to safely perform their tasks. While inferring the environment, they must account for many possible scenarios when planning future actions. Since objects' class types are discrete and the robot's self-pose and the objects' poses are continuous, the environment can be represented by a hybrid discrete-continuous belief which is updated according to models and incoming data. Prior probabilities and observation models representing the environment can be learned from data using deep learning algorithms. Such models often couple environmental semantic and geometric properties. As a result, semantic variables are interconnected, causing semantic state space dimensionality to increase exponentially. In this paper, we consider planning under uncertainty using partially observable Markov decision processes (POMDPs) with hybrid semantic-geometric beliefs. The models and priors consider the coupling between semantic and geometric variables. Within POMDP, we introduce the concept of semantically aware safety. Obtaining representative samples of the theoretical hybrid belief, required for estimating the value function, is very challenging. As a key contribution, we develop a novel form of the hybrid belief and leverage it to sample representative samples. We show that under certain conditions, the value function and probability of safety can be calculated efficiently with an explicit expectation over all possible semantic mappings. Our simulations show that our estimates of the objective function and probability of safety achieve similar levels of accuracy compared to estimators that run exhaustively on the entire semantic state-space using samples from the theoretical hybrid belief. Nevertheless, the complexity of our estimators is polynomial rather than exponential.
comment: 20 pages, 9 figures
Dynamic Neural Potential Field: Online Trajectory Optimization in the Presence of Moving Obstacles
Generalist robot policies must operate safely and reliably in everyday human environments such as homes, offices, and warehouses, where people and objects move unpredictably. We present Dynamic Neural Potential Field (NPField-GPT), a learning-enhanced model predictive control (MPC) framework that couples classical optimization with a Transformer-based predictor of footprint-aware repulsive potentials. Given an occupancy sub-map, robot footprint, and optional dynamic-obstacle cues, our autoregressive NPField-GPT head forecasts a horizon of differentiable potentials that are injected into a sequential quadratic MPC program via L4CasADi, yielding real-time, constraint-aware trajectory optimization. We additionally study two baselines: (NPField-D1) static-frame decomposition and (NPField-D2) parallel MLP heads for all steps. In dynamic indoor scenarios from BenchMR and on a Husky UGV in office corridors, NPField-GPT produces safer, more conservative trajectories under motion changes, while D1/D2 offer lower latency. We also compare with the CIAO* and MPPI baselines. Across methods, the Transformer+MPC synergy preserves the transparency and stability of model-based planning while learning only the part that benefits from data: spatiotemporal collision risk. Code and trained models are available at https://github.com/CognitiveAISystems/Dynamic-Neural-Potential-Field
ATK: Automatic Task-driven Keypoint Selection for Robust Policy Learning
Visuomotor policies often suffer from perceptual challenges, where visual differences between training and evaluation environments degrade policy performance. Policies relying on state estimations, like 6D pose, require task-specific tracking and are difficult to scale, while raw sensor-based policies may lack robustness to small visual disturbances. In this work, we leverage 2D keypoints--spatially consistent features in the image frame--as a flexible state representation for robust policy learning and apply it to both sim-to-real transfer and real-world imitation learning. However, the choice of which keypoints to use can vary across objects and tasks. We propose a novel method, ATK, to automatically select keypoints in a task-driven manner so that the chosen keypoints are predictive of optimal behavior for the given task. Our proposal optimizes for a minimal set of keypoints that focus on task-relevant parts while preserving policy performance and robustness. We distill expert data (either from an expert policy in simulation or a human expert) into a policy that operates on RGB images while tracking the selected keypoints. By leveraging pre-trained visual modules, our system effectively encodes states and transfers policies to the real-world evaluation scenario despite wide scene variations and perceptual challenges such as transparent objects, fine-grained tasks, and deformable objects manipulation. We validate ATK on various robotic tasks, demonstrating that these minimal keypoint representations significantly improve robustness to visual disturbances and environmental variations. See all experiments and more details at https://yunchuzhang.github.io/ATK/.
FreqPolicy: Frequency Autoregressive Visuomotor Policy with Continuous Tokens NeurIPS
Learning effective visuomotor policies for robotic manipulation is challenging, as it requires generating precise actions while maintaining computational efficiency. Existing methods remain unsatisfactory due to inherent limitations in the essential action representation and the basic network architectures. We observe that representing actions in the frequency domain captures the structured nature of motion more effectively: low-frequency components reflect global movement patterns, while high-frequency components encode fine local details. Additionally, robotic manipulation tasks of varying complexity demand different levels of modeling precision across these frequency bands. Motivated by this, we propose a novel paradigm for visuomotor policy learning that progressively models hierarchical frequency components. To further enhance precision, we introduce continuous latent representations that maintain smoothness and continuity in the action space. Extensive experiments across diverse 2D and 3D robotic manipulation benchmarks demonstrate that our approach outperforms existing methods in both accuracy and efficiency, showcasing the potential of a frequency-domain autoregressive framework with continuous tokens for generalized robotic manipulation.Code is available at https://github.com/4DVLab/Freqpolicy
comment: Comments: Published at Neural Information Processing Systems (NeurIPS) 2025. Project page and code: https://freq-policy.github.io/
Physics-Based Motion Imitation with Adversarial Differential Discriminators SIGGRAPH
Multi-objective optimization problems, which require the simultaneous optimization of multiple objectives, are prevalent across numerous applications. Existing multi-objective optimization methods often rely on manually-tuned aggregation functions to formulate a joint optimization objective. The performance of such hand-tuned methods is heavily dependent on careful weight selection, a time-consuming and laborious process. These limitations also arise in the setting of reinforcement-learning-based motion tracking methods for physically simulated characters, where intricately crafted reward functions are typically used to achieve high-fidelity results. Such solutions not only require domain expertise and significant manual tuning, but also limit the applicability of the resulting reward function across diverse skills. To bridge this gap, we present a novel adversarial multi-objective optimization technique that is broadly applicable to a range of multi-objective reinforcement-learning tasks, including motion tracking. Our proposed Adversarial Differential Discriminator (ADD) receives a single positive sample, yet is still effective at guiding the optimization process. We demonstrate that our technique can enable characters to closely replicate a variety of acrobatic and agile behaviors, achieving comparable quality to state-of-the-art motion-tracking methods, without relying on manually-designed reward functions. Code and results are available at https://add-moo.github.io/.
comment: SIGGRAPH Asia 2025 Conference Papers
Motion Blender Gaussian Splatting for Dynamic Scene Reconstruction
Gaussian splatting has emerged as a powerful tool for high-fidelity reconstruction of dynamic scenes. However, existing methods primarily rely on implicit motion representations, such as encoding motions into neural networks or per-Gaussian parameters, which makes it difficult to further manipulate the reconstructed motions. This lack of explicit controllability limits existing methods to replaying recorded motions only, which hinders a wider application in robotics. To address this, we propose Motion Blender Gaussian Splatting (MBGS), a novel framework that uses motion graphs as an explicit and sparse motion representation. The motion of a graph's links is propagated to individual Gaussians via dual quaternion skinning, with learnable weight painting functions that determine the influence of each link. The motion graphs and 3D Gaussians are jointly optimized from input videos via differentiable rendering. Experiments show that MBGS achieves state-of-the-art performance on the highly challenging iPhone dataset while being competitive on HyperNeRF. We demonstrate the application potential of our method in animating novel object poses, synthesizing real robot demonstrations, and predicting robot actions through visual planning. The source code, models, video demonstrations can be found at http://mlzxy.github.io/motion-blender-gs.
comment: CoRL 2025
Multiagent Systems
Improving Cooperation in Collaborative Embodied AI
The integration of Large Language Models (LLMs) into multiagent systems has opened new possibilities for collaborative reasoning and cooperation with AI agents. This paper explores different prompting methods and evaluates their effectiveness in enhancing agent collaborative behaviour and decision-making. We enhance CoELA, a framework designed for building Collaborative Embodied Agents that leverage LLMs for multi-agent communication, reasoning, and task coordination in shared virtual spaces. Through systematic experimentation, we examine different LLMs and prompt engineering strategies to identify optimised combinations that maximise collaboration performance. Furthermore, we extend our research by integrating speech capabilities, enabling seamless collaborative voice-based interactions. Our findings highlight the effectiveness of prompt optimisation in enhancing collaborative agent performance; for example, our best combination improved the efficiency of the system running with Gemma3 by 22% compared to the original CoELA system. In addition, the speech integration provides a more engaging user interface for iterative system development and demonstrations.
comment: In proceedings of UKCI 2025
Axiomatisation for an asynchronous epistemic logic with sending and receiving messages
We investigate a public announcement logic for asynchronous public announcements wherein the sending of the announcements by the environment is separated from the reception of the announcements by the individual agents. Both come with different modalities. In the logical semantics, formulas are interpreted in a world of a Kripke model but given a history of prior announcements and receptions of announcements that already happened. An axiomatisation AA for such a logic has been given in prior work, for the formulas that are valid when interpreted in the Kripke model before any such announcements have taken place. This axiomatisation is a reduction system wherein one can show that every formula is equivalent to a purely epistemic formula without dynamic modalities for announcements and receptions. We propose a generalisation AA* of this axiomatisation, for the formulas that are valid when interpreted in the Kripke model given any history of prior announcements and receptions of announcements. It does not extend the axiomatisation AA, for example it is no longer valid that nobody has received any announcement. Unlike AA, this axiomatisation AA* is infinitary and it is not a reduction system.
Delay-Tolerant Augmented-Consensus-based Distributed Directed Optimization
Distributed optimization finds applications in large-scale machine learning, data processing and classification over multi-agent networks. In real-world scenarios, the communication network of agents may encounter latency that may affect the convergence of the optimization protocol. This paper addresses the case where the information exchange among the agents (computing nodes) over data-transmission channels (links) might be subject to communication time-delays, which is not well addressed in the existing literature. Our proposed algorithm improves the state-of-the-art by handling heterogeneous and arbitrary but bounded and fixed (time-invariant) delays over general strongly-connected directed networks. Arguments from matrix theory, algebraic graph theory, and augmented consensus formulation are applied to prove the convergence to the optimal value. Simulations are provided to verify the results and compare the performance with some existing delay-free algorithms.
comment: Systems & Control Letters
Long-Term Mapping of the Douro River Plume with Multi-Agent Reinforcement Learning
We study the problem of long-term (multiple days) mapping of a river plume using multiple autonomous underwater vehicles (AUVs), focusing on the Douro river representative use-case. We propose an energy - and communication - efficient multi-agent reinforcement learning approach in which a central coordinator intermittently communicates with the AUVs, collecting measurements and issuing commands. Our approach integrates spatiotemporal Gaussian process regression (GPR) with a multi-head Q-network controller that regulates direction and speed for each AUV. Simulations using the Delft3D ocean model demonstrate that our method consistently outperforms both single- and multi-agent benchmarks, with scaling the number of agents both improving mean squared error (MSE) and operational endurance. In some instances, our algorithm demonstrates that doubling the number of AUVs can more than double endurance while maintaining or improving accuracy, underscoring the benefits of multi-agent coordination. Our learned policies generalize across unseen seasonal regimes over different months and years, demonstrating promise for future developments of data-driven long-term monitoring of dynamic plume environments.
Destination-to-Chutes Task Mapping Optimization for Multi-Robot Coordination in Robotic Sorting Systems
We study optimizing a destination-to-chutes task mapping to improve throughput in Robotic Sorting Systems (RSS), where a team of robots sort packages on a sortation floor by transporting them from induct workstations to eject chutes based on their shipping destinations (e.g. Los Angeles or Pittsburgh). The destination-to-chutes task mapping is used to determine which chutes a robot can drop its package. Finding a high-quality task mapping is challenging because of the complexity of a real-world RSS. First, optimizing task mapping is interdependent with robot target assignment and path planning. Second, chutes will be CLOSED for a period of time once they receive sufficient packages to allow for downstream processing. Third, task mapping quality directly impacts the downstream processing, as scattered chutes for the same destination increase package handling time. In this paper, we first formally define task mappings and the problem of Task Mapping Optimization (TMO). We then present a simulator of RSS to evaluate task mappings. We then present a simple TMO method based on the Evolutionary Algorithm and Mixed Integer Linear Programming, demonstrating the advantage of our optimized task mappings over the greedily generated ones in various RSS setups with different map sizes, numbers of chutes, and destinations. Finally, we use Quality Diversity algorithms to analyze the throughput of a diverse set of task mappings. Our code is available online at https://github.com/lunjohnzhang/tmo_public.
comment: Accepted to IEEE International Symposium on Multi-Robot and Multi-Agent Systems (MRS) 2025
Downside Risk-Aware Equilibria for Strategic Decision-Making ECAI 2024
Game theory has traditionally had a relatively limited view of risk based on how a player's expected reward is impacted by the uncertainty of the actions of other players. Recently, a new game-theoretic approach provides a more holistic view of risk also considering the reward-variance. However, these variance-based approaches measure variance of the reward on both the upside and downside. In many domains, such as finance, downside risk only is of key importance, as this represents the potential losses associated with a decision. In contrast, large upside "risk" (e.g. profits) are not an issue. To address this restrictive view of risk, we propose a novel solution concept, downside risk aware equilibria (DRAE) based on lower partial moments. DRAE restricts downside risk, while placing no restrictions on upside risk, and additionally, models higher-order risk preferences. We demonstrate the applicability of DRAE on several games, successfully finding equilibria which balance downside risk with expected reward, and prove the existence and optimality of this equilibria.
comment: Accepted at ECAI 2024 Workshop on AI In Finance
The Argument is the Explanation: Structured Argumentation for Trust in Agents
Humans are black boxes -- we cannot observe their neural processes, yet society functions by evaluating verifiable arguments. AI explainability should follow this principle: stakeholders need verifiable reasoning chains, not mechanistic transparency. We propose using structured argumentation to provide a level of explanation and verification neither interpretability nor LLM-generated explanation is able to offer. Our pipeline achieves state-of-the-art 94.44 macro F1 on the AAEC published train/test split (5.7 points above prior work) and $0.81$ macro F1, $\sim$0.07 above previous published results with comparable data setups, for Argumentative MicroTexts relation classification, converting LLM text into argument graphs and enabling verification at each inferential step. We demonstrate this idea on multi-agent risk assessment using the Structured What-If Technique, where specialized agents collaborate transparently to carry out risk assessment otherwise achieved by humans alone. Using Bipolar Assumption-Based Argumentation, we capture support/attack relationships, thereby enabling automatic hallucination detection via fact nodes attacking arguments. We also provide a verification mechanism that enables iterative refinement through test-time feedback without retraining. For easy deployment, we provide a Docker container for the fine-tuned AMT model, and the rest of the code with the Bipolar ABA Python package on GitHub.
comment: 8 pages, 4 figures, 6 tables, submitted to IAAI-26
ContraGen: A Multi-Agent Generation Framework for Enterprise Contradictions Detection
Retrieval-Augmented Generation (RAG) integrates LLMs with external sources, offering advanced capabilities for information access and decision-making. However, contradictions in retrieved evidence can result in inconsistent or untrustworthy outputs, which is especially problematic in enterprise settings where compliance, governance, and accountability are critical. Existing benchmarks for contradiction detection are limited to sentence-level analysis and do not capture the complexity of enterprise documents such as contracts, financial filings, compliance reports, or policy manuals. To address this limitation, we propose ContraGen, a contradiction-aware benchmark framework tailored to enterprise domain. The framework generates synthetic enterprise-style documents with embedded contradictions, enabling systematic evaluation of both intra-document and cross-document consistency. Automated contradiction mining is combined with human-in-the-loop validation to ensure high accuracy. Our contributions include generating realistic enterprise documents, modeling a taxonomy of contradiction types common in business processes, enabling controlled creation of self- and pairwise contradictions, developing a contradiction-aware retrieval evaluation pipeline and embedding human oversight to reflect domain-specific judgment complexity. This work establishes a foundation for more trustworthy and accountable RAG systems in enterprise information-seeking applications, where detecting and resolving contradictions is essential for reducing risk and ensuring compliance.
LegalSim: Multi-Agent Simulation of Legal Systems for Discovering Procedural Exploits EMNLP 2025
We present LegalSim, a modular multi-agent simulation of adversarial legal proceedings that explores how AI systems can exploit procedural weaknesses in codified rules. Plaintiff and defendant agents choose from a constrained action space (for example, discovery requests, motions, meet-and-confer, sanctions) governed by a JSON rules engine, while a stochastic judge model with calibrated grant rates, cost allocations, and sanction tendencies resolves outcomes. We compare four policies: PPO, a contextual bandit with an LLM, a direct LLM policy, and a hand-crafted heuristic; Instead of optimizing binary case outcomes, agents are trained and evaluated using effective win rate and a composite exploit score that combines opponent-cost inflation, calendar pressure, settlement pressure at low merit, and a rule-compliance margin. Across configurable regimes (e.g., bankruptcy stays, inter partes review, tax procedures) and heterogeneous judges, we observe emergent ``exploit chains'', such as cost-inflating discovery sequences and calendar-pressure tactics that remain procedurally valid yet systemically harmful. Evaluation via cross-play and Bradley-Terry ratings shows, PPO wins more often, the bandit is the most consistently competitive across opponents, the LLM trails them, and the heuristic is weakest. The results are stable in judge settings, and the simulation reveals emergent exploit chains, motivating red-teaming of legal rule systems in addition to model-level testing.
comment: 12 pages with 2 figures, accepted at the NLLP workshop at EMNLP 2025
Lang-PINN: From Language to Physics-Informed Neural Networks via a Multi-Agent Framework
Physics-informed neural networks (PINNs) provide a powerful approach for solving partial differential equations (PDEs), but constructing a usable PINN remains labor-intensive and error-prone. Scientists must interpret problems as PDE formulations, design architectures and loss functions, and implement stable training pipelines. Existing large language model (LLM) based approaches address isolated steps such as code generation or architecture suggestion, but typically assume a formal PDE is already specified and therefore lack an end-to-end perspective. We present Lang-PINN, an LLM-driven multi-agent system that builds trainable PINNs directly from natural language task descriptions. Lang-PINN coordinates four complementary agents: a PDE Agent that parses task descriptions into symbolic PDEs, a PINN Agent that selects architectures, a Code Agent that generates modular implementations, and a Feedback Agent that executes and diagnoses errors for iterative refinement. This design transforms informal task statements into executable and verifiable PINN code. Experiments show that Lang-PINN achieves substantially lower errors and greater robustness than competitive baselines: mean squared error (MSE) is reduced by up to 3--5 orders of magnitude, end-to-end execution success improves by more than 50\%, and reduces time overhead by up to 74\%.
comment: PINN, PDE, Agent, LLM
An Architecture for Spatial Networking
Physical spaces are increasingly dense with networked devices, promising seamless coordination and ambient intelligence. Yet today, cloud-first architectures force all communication through wide-area networks regardless of physical proximity. We lack an abstraction for spatial networking: using physical spaces to create boundaries for private, robust, and low-latency communication. We introduce $\textit{Bifr\"ost}$, a programming model that realizes spatial networking using bigraphs to express both containment and connectivity, enabling policies to be scoped by physical boundaries, devices to be named by location, the instantiation of spatial services, and the composition of spaces while maintaining local autonomy. Bifr\"ost enables a new class of spatially-aware applications, where co-located devices communicate directly, physical barriers require explicit gateways, and local control bridges to global coordination.
MarketSenseAI 2.0: Enhancing Stock Analysis through LLM Agents
MarketSenseAI is a novel framework for holistic stock analysis which leverages Large Language Models (LLMs) to process financial news, historical prices, company fundamentals and the macroeconomic environment to support decision making in stock analysis and selection. In this paper, we present the latest advancements on MarketSenseAI, driven by rapid technological expansion in LLMs. Through a novel architecture combining Retrieval-Augmented Generation and LLM agents, the framework processes SEC filings and earnings calls, while enriching macroeconomic analysis through systematic processing of diverse institutional reports. We demonstrate a significant improvement in fundamental analysis accuracy over the previous version. Empirical evaluation on S\&P 100 stocks over two years (2023-2024) shows MarketSenseAI achieving cumulative returns of 125.9% compared to the index return of 73.5%, while maintaining comparable risk profiles. Further validation on S\&P 500 stocks during 2024 demonstrates the framework's scalability, delivering a 33.8% higher Sortino ratio than the market. This work marks a significant advancement in applying LLM technology to financial analysis, offering insights into the robustness of LLM-driven investment strategies.
comment: 25 pages, 7 figures, Under review at Financial Innovation (FIN)
Robotics
Simulation to Rules: A Dual-VLM Framework for Formal Visual Planning
Vision Language Models (VLMs) show strong potential for visual planning but struggle with precise spatial and long-horizon reasoning. In contrast, Planning Domain Definition Language (PDDL) planners excel at long-horizon formal planning, but cannot interpret visual inputs. Recent works combine these complementary advantages by enabling VLMs to turn visual planning problems into PDDL files for formal planning. However, while VLMs can generate PDDL problem files satisfactorily, they struggle to accurately generate the PDDL domain files, which describe all the planning rules. As a result, prior methods rely on human experts to predefine domain files or on constant environment access for refinement. We propose VLMFP, a Dual-VLM-guided framework that can autonomously generate both PDDL problem and domain files for formal visual planning. VLMFP introduces two VLMs to ensure reliable PDDL file generation: A SimVLM that simulates action consequences based on input rule descriptions, and a GenVLM that generates and iteratively refines PDDL files by comparing the PDDL and SimVLM execution results. VLMFP unleashes multiple levels of generalizability: The same generated PDDL domain file works for all the different instances under the same problem, and VLMs generalize to different problems with varied appearances and rules. We evaluate VLMFP with 6 grid-world domains and test its generalization to unseen instances, appearance, and game rules. On average, SimVLM accurately describes 95.5%, 82.6% of scenarios, simulates 85.5%, 87.8% of action sequence, and judges 82.4%, 85.6% goal reaching for seen and unseen appearances, respectively. With the guidance of SimVLM, VLMFP can generate PDDL files to reach 70.0%, 54.1% valid plans for unseen instances in seen and unseen appearances, respectively. Project page: https://sites.google.com/view/vlmfp.
comment: 30 pages, 5 figures, 5 tables
Optimal Smooth Coverage Trajectory Planning for Quadrotors in Cluttered Environment
For typical applications of UAVs in power grid scenarios, we construct the problem as planning UAV trajectories for coverage in cluttered environments. In this paper, we propose an optimal smooth coverage trajectory planning algorithm. The algorithm consists of two stages. In the front-end, a Genetic Algorithm (GA) is employed to solve the Traveling Salesman Problem (TSP) for Points of Interest (POIs), generating an initial sequence of optimized visiting points. In the back-end, the sequence is further optimized by considering trajectory smoothness, time consumption, and obstacle avoidance. This is formulated as a nonlinear least squares problem and solved to produce a smooth coverage trajectory that satisfies these constraints. Numerical simulations validate the effectiveness of the proposed algorithm, ensuring UAVs can smoothly cover all POIs in cluttered environments.
comment: This paper has been accepted for publication in the 44th Chinese Control Conference, 2025. Please cite the paper using appropriate formats
Improving Cooperation in Collaborative Embodied AI
The integration of Large Language Models (LLMs) into multiagent systems has opened new possibilities for collaborative reasoning and cooperation with AI agents. This paper explores different prompting methods and evaluates their effectiveness in enhancing agent collaborative behaviour and decision-making. We enhance CoELA, a framework designed for building Collaborative Embodied Agents that leverage LLMs for multi-agent communication, reasoning, and task coordination in shared virtual spaces. Through systematic experimentation, we examine different LLMs and prompt engineering strategies to identify optimised combinations that maximise collaboration performance. Furthermore, we extend our research by integrating speech capabilities, enabling seamless collaborative voice-based interactions. Our findings highlight the effectiveness of prompt optimisation in enhancing collaborative agent performance; for example, our best combination improved the efficiency of the system running with Gemma3 by 22% compared to the original CoELA system. In addition, the speech integration provides a more engaging user interface for iterative system development and demonstrations.
comment: In proceedings of UKCI 2025
MM-Nav: Multi-View VLA Model for Robust Visual Navigation via Multi-Expert Learning
Visual navigation policy is widely regarded as a promising direction, as it mimics humans by using egocentric visual observations for navigation. However, optical information of visual observations is difficult to be explicitly modeled like LiDAR point clouds or depth maps, which subsequently requires intelligent models and large-scale data. To this end, we propose to leverage the intelligence of the Vision-Language-Action (VLA) model to learn diverse navigation capabilities from synthetic expert data in a teacher-student manner. Specifically, we implement the VLA model, MM-Nav, as a multi-view VLA (with 360 observations) based on pretrained large language models and visual foundation models. For large-scale navigation data, we collect expert data from three reinforcement learning (RL) experts trained with privileged depth information in three challenging tailor-made environments for different navigation capabilities: reaching, squeezing, and avoiding. We iteratively train our VLA model using data collected online from RL experts, where the training ratio is dynamically balanced based on performance on individual capabilities. Through extensive experiments in synthetic environments, we demonstrate that our model achieves strong generalization capability. Moreover, we find that our student VLA model outperforms the RL teachers, demonstrating the synergistic effect of integrating multiple capabilities. Extensive real-world experiments further confirm the effectiveness of our method.
comment: Project page: https://pku-epic.github.io/MM-Nav-Web/
Mask2IV: Interaction-Centric Video Generation via Mask Trajectories
Generating interaction-centric videos, such as those depicting humans or robots interacting with objects, is crucial for embodied intelligence, as they provide rich and diverse visual priors for robot learning, manipulation policy training, and affordance reasoning. However, existing methods often struggle to model such complex and dynamic interactions. While recent studies show that masks can serve as effective control signals and enhance generation quality, obtaining dense and precise mask annotations remains a major challenge for real-world use. To overcome this limitation, we introduce Mask2IV, a novel framework specifically designed for interaction-centric video generation. It adopts a decoupled two-stage pipeline that first predicts plausible motion trajectories for both actor and object, then generates a video conditioned on these trajectories. This design eliminates the need for dense mask inputs from users while preserving the flexibility to manipulate the interaction process. Furthermore, Mask2IV supports versatile and intuitive control, allowing users to specify the target object of interaction and guide the motion trajectory through action descriptions or spatial position cues. To support systematic training and evaluation, we curate two benchmarks covering diverse action and object categories across both human-object interaction and robotic manipulation scenarios. Extensive experiments demonstrate that our method achieves superior visual realism and controllability compared to existing baselines.
comment: Project page: https://reagan1311.github.io/mask2iv
Learning Stability Certificate for Robotics in Real-World Environments
Stability certificates play a critical role in ensuring the safety and reliability of robotic systems. However, deriving these certificates for complex, unknown systems has traditionally required explicit knowledge of system dynamics, often making it a daunting task. This work introduces a novel framework that learns a Lyapunov function directly from trajectory data, enabling the certification of stability for autonomous systems without needing detailed system models. By parameterizing the Lyapunov candidate using a neural network and ensuring positive definiteness through Cholesky factorization, our approach automatically identifies whether the system is stable under the given trajectory. To address the challenges posed by noisy, real-world data, we allow for controlled violations of the stability condition, focusing on maintaining high confidence in the stability certification process. Our results demonstrate that this framework can provide data-driven stability guarantees, offering a robust method for certifying the safety of robotic systems in dynamic, real-world environments. This approach works without access to the internal control algorithms, making it applicable even in situations where system behavior is opaque or proprietary. The tool for learning the stability proof is open-sourced by this research: https://github.com/HansOersted/stability.
Whisker-based Tactile Flight for Tiny Drones
Tiny flying robots hold great potential for search-and-rescue, safety inspections, and environmental monitoring, but their small size limits conventional sensing-especially with poor-lighting, smoke, dust or reflective obstacles. Inspired by nature, we propose a lightweight, 3.2-gram, whisker-based tactile sensing apparatus for tiny drones, enabling them to navigate and explore through gentle physical interaction. Just as rats and moles use whiskers to perceive surroundings, our system equips drones with tactile perception in flight, allowing obstacle sensing even in pitch-dark conditions. The apparatus uses barometer-based whisker sensors to detect obstacle locations while minimising destabilisation. To address sensor noise and drift, we develop a tactile depth estimation method achieving sub-6 mm accuracy. This enables drones to navigate, contour obstacles, and explore confined spaces solely through touch-even in total darkness along both soft and rigid surfaces. Running fully onboard a 192-KB RAM microcontroller, the system supports autonomous tactile flight and is validated in both simulation and real-world tests. Our bio-inspired approach redefines vision-free navigation, opening new possibilities for micro aerial vehicles in extreme environments.
Geometry Meets Vision: Revisiting Pretrained Semantics in Distilled Fields
Semantic distillation in radiance fields has spurred significant advances in open-vocabulary robot policies, e.g., in manipulation and navigation, founded on pretrained semantics from large vision models. While prior work has demonstrated the effectiveness of visual-only semantic features (e.g., DINO and CLIP) in Gaussian Splatting and neural radiance fields, the potential benefit of geometry-grounding in distilled fields remains an open question. In principle, visual-geometry features seem very promising for spatial tasks such as pose estimation, prompting the question: Do geometry-grounded semantic features offer an edge in distilled fields? Specifically, we ask three critical questions: First, does spatial-grounding produce higher-fidelity geometry-aware semantic features? We find that image features from geometry-grounded backbones contain finer structural details compared to their counterparts. Secondly, does geometry-grounding improve semantic object localization? We observe no significant difference in this task. Thirdly, does geometry-grounding enable higher-accuracy radiance field inversion? Given the limitations of prior work and their lack of semantics integration, we propose a novel framework SPINE for inverting radiance fields without an initial guess, consisting of two core components: coarse inversion using distilled semantics, and fine inversion using photometric-based optimization. Surprisingly, we find that the pose estimation accuracy decreases with geometry-grounded features. Our results suggest that visual-only features offer greater versatility for a broader range of downstream tasks, although geometry-grounded features contain more geometric detail. Notably, our findings underscore the necessity of future research on effective strategies for geometry-grounding that augment the versatility and performance of pretrained semantic features.
A Dimension-Decomposed Learning Framework for Online Disturbance Identification in Quadrotor SE(3) Control
Quadrotor stability under complex dynamic disturbances and model uncertainties poses significant challenges. One of them remains the underfitting problem in high-dimensional features, which limits the identification capability of current learning-based methods. To address this, we introduce a new perspective: Dimension-Decomposed Learning (DiD-L), from which we develop the Sliced Adaptive-Neuro Mapping (SANM) approach for geometric control. Specifically, the high-dimensional mapping for identification is axially ``sliced" into multiple low-dimensional submappings (``slices"). In this way, the complex high-dimensional problem is decomposed into a set of simple low-dimensional tasks addressed by shallow neural networks and adaptive laws. These neural networks and adaptive laws are updated online via Lyapunov-based adaptation without any pre-training or persistent excitation (PE) condition. To enhance the interpretability of the proposed approach, we prove that the full-state closed-loop system exhibits arbitrarily close to exponential stability despite multi-dimensional time-varying disturbances and model uncertainties. This result is novel as it demonstrates exponential convergence without requiring pre-training for unknown disturbances and specific knowledge of the model.
Embracing Evolution: A Call for Body-Control Co-Design in Embodied Humanoid Robot
Humanoid robots, as general-purpose physical agents, must integrate both intelligent control and adaptive morphology to operate effectively in diverse real-world environments. While recent research has focused primarily on optimizing control policies for fixed robot structures, this position paper argues for evolving both control strategies and humanoid robots' physical structure under a co-design mechanism. Inspired by biological evolution, this approach enables robots to iteratively adapt both their form and behavior to optimize performance within task-specific and resource-constrained contexts. Despite its promise, co-design in humanoid robotics remains a relatively underexplored domain, raising fundamental questions about its feasibility and necessity in achieving true embodied intelligence. To address these challenges, we propose practical co-design methodologies grounded in strategic exploration, Sim2Real transfer, and meta-policy learning. We further argue for the essential role of co-design by analyzing it from methodological, application-driven, and community-oriented perspectives. Striving to guide and inspire future studies, we present open research questions, spanning from short-term innovations to long-term goals. This work positions co-design as a cornerstone for developing the next generation of intelligent and adaptable humanoid agents.
Long-Term Human Motion Prediction Using Spatio-Temporal Maps of Dynamics
Long-term human motion prediction (LHMP) is important for the safe and efficient operation of autonomous robots and vehicles in environments shared with humans. Accurate predictions are important for applications including motion planning, tracking, human-robot interaction, and safety monitoring. In this paper, we exploit Maps of Dynamics (MoDs), which encode spatial or spatio-temporal motion patterns as environment features, to achieve LHMP for horizons of up to 60 seconds. We propose an MoD-informed LHMP framework that supports various types of MoDs and includes a ranking method to output the most likely predicted trajectory, improving practical utility in robotics. Further, a time-conditioned MoD is introduced to capture motion patterns that vary across different times of day. We evaluate MoD-LHMP instantiated with three types of MoDs. Experiments on two real-world datasets show that MoD-informed method outperforms learning-based ones, with up to 50\% improvement in average displacement error, and the time-conditioned variant achieves the highest accuracy overall. Project code is available at https://github.com/test-bai-cpu/LHMP-with-MoDs.git
comment: IEEE Robotics and Automation Letters
HumanoidExo: Scalable Whole-Body Humanoid Manipulation via Wearable Exoskeleton
A significant bottleneck in humanoid policy learning is the acquisition of large-scale, diverse datasets, as collecting reliable real-world data remains both difficult and cost-prohibitive. To address this limitation, we introduce HumanoidExo, a novel system that transfers human motion to whole-body humanoid data. HumanoidExo offers a high-efficiency solution that minimizes the embodiment gap between the human demonstrator and the robot, thereby tackling the scarcity of whole-body humanoid data. By facilitating the collection of more voluminous and diverse datasets, our approach significantly enhances the performance of humanoid robots in dynamic, real-world scenarios. We evaluated our method across three challenging real-world tasks: table-top manipulation, manipulation integrated with stand-squat motions, and whole-body manipulation. Our results empirically demonstrate that HumanoidExo is a crucial addition to real-robot data, as it enables the humanoid policy to generalize to novel environments, learn complex whole-body control from only five real-robot demonstrations, and even acquire new skills (i.e., walking) solely from HumanoidExo data.
3D-CovDiffusion: 3D-Aware Diffusion Policy for Coverage Path Planning
Diffusion models, as a class of deep generative models, have recently emerged as powerful tools for robot skills by enabling stable training with reliable convergence. In this paper, we present an end-to-end framework for generating long, smooth trajectories that explicitly target high surface coverage across various industrial tasks, including polishing, robotic painting, and spray coating. The conventional methods are always fundamentally constrained by their predefined functional forms, which limit the shapes of the trajectories they can represent and make it difficult to handle complex and diverse tasks. Moreover, their generalization is poor, often requiring manual redesign or extensive parameter tuning when applied to new scenarios. These limitations highlight the need for more expressive generative models, making diffusion-based approaches a compelling choice for trajectory generation. By iteratively denoising trajectories with carefully learned noise schedules and conditioning mechanisms, diffusion models not only ensure smooth and consistent motion but also flexibly adapt to the task context. In experiments, our method improves trajectory continuity, maintains high coverage, and generalizes to unseen shapes, paving the way for unified end-to-end trajectory learning across industrial surface-processing tasks without category-specific models. On average, our approach improves Point-wise Chamfer Distance by 98.2\% and smoothness by 97.0\%, while increasing surface coverage by 61\% compared to prior methods. The link to our code can be found \href{https://anonymous.4open.science/r/spraydiffusion_ral-2FCE/README.md}{here}.
Real-Time Nonlinear Model Predictive Control of Heavy-Duty Skid-Steered Mobile Platform for Trajectory Tracking Tasks
This paper presents a framework for real-time optimal controlling of a heavy-duty skid-steered mobile platform for trajectory tracking. The importance of accurate real-time performance of the controller lies in safety considerations of situations where the dynamic system under control is affected by uncertainties and disturbances, and the controller should compensate for such phenomena in order to provide stable performance. A multiple-shooting nonlinear model-predictive control framework is proposed in this paper. This framework benefits from suitable algorithm along with readings from various sensors for genuine real-time performance with extremely high accuracy. The controller is then tested for tracking different trajectories where it demonstrates highly desirable performance in terms of both speed and accuracy. This controller shows remarkable improvement when compared to existing nonlinear model-predictive controllers in the literature that were implemented on skid-steered mobile platforms.
AI-Enhanced Kinematic Modeling of Flexible Manipulators Using Multi-IMU Sensor Fusion
This paper presents a novel framework for estimating the position and orientation of flexible manipulators undergoing vertical motion using multiple inertial measurement units (IMUs), optimized and calibrated with ground truth data. The flexible links are modeled as a series of rigid segments, with joint angles estimated from accelerometer and gyroscope measurements acquired by cost-effective IMUs. A complementary filter is employed to fuse the measurements, with its parameters optimized through particle swarm optimization (PSO) to mitigate noise and delay. To further improve estimation accuracy, residual errors in position and orientation are compensated using radial basis function neural networks (RBFNN). Experimental results validate the effectiveness of the proposed intelligent multi-IMU kinematic estimation method, achieving root mean square errors (RMSE) of 0.00021~m, 0.00041~m, and 0.00024~rad for $y$, $z$, and $\theta$, respectively.
YawSitter: Modeling and Controlling a Tail-Sitter UAV with Enhanced Yaw Control
Achieving precise lateral motion modeling and decoupled control in hover remains a significant challenge for tail-sitter Unmanned Aerial Vehicles (UAVs), primarily due to complex aerodynamic couplings and the absence of welldefined lateral dynamics. This paper presents a novel modeling and control strategy that enhances yaw authority and lateral motion by introducing a sideslip force model derived from differential propeller slipstream effects acting on the fuselage under differential thrust. The resulting lateral force along the body y-axis enables yaw-based lateral position control without inducing roll coupling. The control framework employs a YXZ Euler rotation formulation to accurately represent attitude and incorporate gravitational components while directly controlling yaw in the yaxis, thereby improving lateral dynamic behavior and avoiding singularities. The proposed approach is validated through trajectory-tracking simulations conducted in a Unity-based environment. Tests on both rectangular and circular paths in hover mode demonstrate stable performance, with low mean absolute position errors and yaw deviations constrained within 5.688 degrees. These results confirm the effectiveness of the proposed lateral force generation model and provide a foundation for the development of agile, hover-capable tail-sitter UAVs.
Single-Rod Brachiation Robot: Mechatronic Control Design and Validation of Prejump Phases
A complete mechatronic design of a minimal configuration brachiation robot is presented. The robot consists of a single rigid rod with gripper mechanisms attached to both ends. The grippers are used to hang the robot on a horizontal bar on which it swings or rotates. The motion is imposed by repositioning the robot's center of mass, which is performed using a crank-slide mechanism. Based on a non-linear model, an optimal control strategy is proposed, for repositioning the center of mass in a bang-bang manner. Consequently, utilizing the concept of input-output linearization, a continuous control strategy is proposed that takes into account the limited torque of the crank-slide mechanism and its geometry. An increased attention is paid to energy accumulation towards the subsequent jump stage of the brachiation. These two strategies are validated and compared in simulations. The continuous control strategy is then also implemented within a low-cost STM32-based control system, and both the swing and rotation stages of the brachiation motion are experimentally validated.
comment: 11 pages, 13 figures, 1 table, Accepted 27 July 2025, Available online 16 Sept 2025, Version of Record 28 Sept 2025
Metrics vs Surveys: Can Quantitative Measures Replace Human Surveys in Social Robot Navigation? A Correlation Analysis
Social, also called human-aware, navigation is a key challenge for the integration of mobile robots into human environments. The evaluation of such systems is complex, as factors such as comfort, safety, and legibility must be considered. Human-centered assessments, typically conducted through surveys, provide reliable insights but are costly, resource-intensive, and difficult to reproduce or compare across systems. Alternatively, numerical social navigation metrics are easy to compute and facilitate comparisons, yet the community lacks consensus on a standard set of metrics. This work explores the relationship between numerical metrics and human-centered evaluations to identify potential correlations. If specific quantitative measures align with human perceptions, they could serve as standardized evaluation tools, reducing the dependency on surveys. Our results indicate that while current metrics capture some aspects of robot navigation behavior, important subjective factors remain insufficiently represented and new metrics are necessary.
Point Cloud-Based Control Barrier Functions for Model Predictive Control in Safety-Critical Navigation of Autonomous Mobile Robots IROS2025
In this work, we propose a novel motion planning algorithm to facilitate safety-critical navigation for autonomous mobile robots. The proposed algorithm integrates a real-time dynamic obstacle tracking and mapping system that categorizes point clouds into dynamic and static components. For dynamic point clouds, the Kalman filter is employed to estimate and predict their motion states. Based on these predictions, we extrapolate the future states of dynamic point clouds, which are subsequently merged with static point clouds to construct the forward-time-domain (FTD) map. By combining control barrier functions (CBFs) with nonlinear model predictive control, the proposed algorithm enables the robot to effectively avoid both static and dynamic obstacles. The CBF constraints are formulated based on risk points identified through collision detection between the predicted future states and the FTD map. Experimental results from both simulated and real-world scenarios demonstrate the efficacy of the proposed algorithm in complex environments. In simulation experiments, the proposed algorithm is compared with two baseline approaches, showing superior performance in terms of safety and robustness in obstacle avoidance. The source code is released for the reference of the robotics community.
comment: 8 pages, 8 figures, accepted to IROS2025
Novel UWB Synthetic Aperture Radar Imaging for Mobile Robot Mapping
Traditional exteroceptive sensors in mobile robots, such as LiDARs and cameras often struggle to perceive the environment in poor visibility conditions. Recently, radar technologies, such as ultra-wideband (UWB) have emerged as potential alternatives due to their ability to see through adverse environmental conditions (e.g. dust, smoke and rain). However, due to the small apertures with low directivity, the UWB radars cannot reconstruct a detailed image of its field of view (FOV) using a single scan. Hence, a virtual large aperture is synthesized by moving the radar along a mobile robot path. The resulting synthetic aperture radar (SAR) image is a high-definition representation of the surrounding environment. Hence, this paper proposes a pipeline for mobile robots to incorporate UWB radar-based SAR imaging to map an unknown environment. Finally, we evaluated the performance of classical feature detectors: SIFT, SURF, BRISK, AKAZE and ORB to identify loop closures using UWB SAR images. The experiments were conducted emulating adverse environmental conditions. The results demonstrate the viability and effectiveness of UWB SAR imaging for high-resolution environmental mapping and loop closure detection toward more robust and reliable robotic perception systems.
comment: Accepted and presented at the 15th International Conference on Indoor Positioning and Indoor Navigation (IPIN) 2025, see https://ipin-conference.org/2025/
Action Deviation-Aware Inference for Low-Latency Wireless Robots
To support latency-sensitive AI applications ranging from autonomous driving to industrial robot manipulation, 6G envisions distributed ML, connecting distributed computational resources in edge and cloud over hyper-reliable low-latency communication (HRLLC). In this setting, speculative decoding can facilitate collaborative inference of models distributively deployed: an on-device draft model locally generates drafts and a remote server-based target model verifies and corrects them, resulting lower latency. However, unlike autoregressive text generation, behavior cloning policies, typically used for embodied AI applications like robot manipulation and autonomous driving, cannot parallelize verification and correction for multiple drafts as each action depends on observation which needs to be updated by a previous action. To this end, we propose Action Deviation-Aware Hybrid Inference, wherein the draft model estimates an action's need for verification and correction by the target model and selectively skips communication and computation for server operations. Action deviation shows a strong correlation with action's rejection probability by the target model, enabling selective skipping. We derive the path deviation threshold that balances the transmission rate and the inference performance, and we empirically show that action deviation-aware hybrid inference reduces uplink transmission and server operation by 40%, while lowering end-to-end latency by 33.32% relative to hybrid inference without skipping and achieving task success rate up to 97.03% of that of target model only inference.
Assist-as-needed Control for FES in Foot Drop Management
Foot drop is commonly managed using Functional Electrical Stimulation (FES), typically delivered via open-loop controllers with fixed stimulation intensities. While users may manually adjust the intensity through external controls, this approach risks overstimulation, leading to muscle fatigue and discomfort, or understimulation, which compromises dorsiflexion and increases fall risk. In this study, we propose a novel closed-loop FES controller that dynamically adjusts the stimulation intensity based on real-time toe clearance, providing "assistance as needed". We evaluate this system by inducing foot drop in healthy participants and comparing the effects of the closed-loop controller with a traditional open-loop controller across various walking conditions, including different speeds and surface inclinations. Kinematic data reveal that our closed-loop controller maintains adequate toe clearance without significantly affecting the joint angles of the hips, the knees, and the ankles, and while using significantly lower stimulation intensities compared to the open-loop controller. These findings suggest that the proposed method not only matches the effectiveness of existing systems but also offers the potential for reduced muscle fatigue and improved long-term user comfort and adherence.
Work Zones challenge VLM Trajectory Planning: Toward Mitigation and Robust Autonomous Driving
Visual Language Models (VLMs), with powerful multimodal reasoning capabilities, are gradually integrated into autonomous driving by several automobile manufacturers to enhance planning capability in challenging environments. However, the trajectory planning capability of VLMs in work zones, which often include irregular layouts, temporary traffic control, and dynamically changing geometric structures, is still unexplored. To bridge this gap, we conduct the \textit{first} systematic study of VLMs for work zone trajectory planning, revealing that mainstream VLMs fail to generate correct trajectories in $68.0%$ of cases. To better understand these failures, we first identify candidate patterns via subgraph mining and clustering analysis, and then confirm the validity of $8$ common failure patterns through human verification. Building on these findings, we propose REACT-Drive, a trajectory planning framework that integrates VLMs with Retrieval-Augmented Generation (RAG). Specifically, REACT-Drive leverages VLMs to convert prior failure cases into constraint rules and executable trajectory planning code, while RAG retrieves similar patterns in new scenarios to guide trajectory generation. Experimental results on the ROADWork dataset show that REACT-Drive yields a reduction of around $3\times$ in average displacement error relative to VLM baselines under evaluation with Qwen2.5-VL. In addition, REACT-Drive yields the lowest inference time ($0.58$s) compared with other methods such as fine-tuning ($17.90$s). We further conduct experiments using a real vehicle in 15 work zone scenarios in the physical world, demonstrating the strong practicality of REACT-Drive.
comment: 13 pages,5 figures
VERNIER: an open-source software pushing marker pose estimation down to the micrometer and nanometer scales
Pose estimation is still a challenge at the small scales. Few solutions exist to capture the 6 degrees of freedom of an object with nanometric and microradians resolutions over relatively large ranges. Over the years, we have proposed several fiducial marker and pattern designs to achieve reliable performance for various microscopy applications. Centimeter ranges are possible using pattern encoding methods, while nanometer resolutions can be achieved using phase processing of the periodic frames. This paper presents VERNIER, an open source phase processing software designed to provide fast and reliable pose measurement based on pseudo-periodic patterns. Thanks to a phase-based local thresholding algorithm, the software has proven to be particularly robust to noise, defocus and occlusion. The successive steps of the phase processing are presented, as well as the different types of patterns that address different application needs. The implementation procedure is illustrated with synthetic and experimental images. Finally, guidelines are given for selecting the appropriate pattern design and microscope magnification lenses as a function of the desired performance.
Periodic Event-Triggered Prescribed Time Control of Euler-Lagrange Systems under State and Input Constraints
This article proposes a periodic event-triggered adaptive barrier control policy for the trajectory tracking problem of perturbed Euler-Lagrangian systems with state, input, and temporal (SIT) constraints. In particular, an approximation-free adaptive-barrier control architecture is designed to ensure prescribed-time convergence of the tracking error to a prescribed bound while rejecting exogenous disturbances. In contrast to existing approaches that necessitate continuous real-time control action, the proposed controller generates event-based updates through periodic evaluation of the triggering condition. Additionally, we derive an upper bound on the monitoring period by analysing the performance degradation of the filtered tracking error to facilitate periodic evaluation of the event-triggered strategy. To this end, a time-varying threshold function is considered in the triggering mechanism to reduce the number of triggers during the transient phase of system behaviour. Notably, the proposed design avoids Zeno behaviour and precludes the need for continuous monitoring of the triggering condition. A simulation and experimental study is undertaken to demonstrate the efficacy of the proposed control scheme.
Flow with the Force Field: Learning 3D Compliant Flow Matching Policies from Force and Demonstration-Guided Simulation Data
While visuomotor policy has made advancements in recent years, contact-rich tasks still remain a challenge. Robotic manipulation tasks that require continuous contact demand explicit handling of compliance and force. However, most visuomotor policies ignore compliance, overlooking the importance of physical interaction with the real world, often leading to excessive contact forces or fragile behavior under uncertainty. Introducing force information into vision-based imitation learning could help improve awareness of contacts, but could also require a lot of data to perform well. One remedy for data scarcity is to generate data in simulation, yet computationally taxing processes are required to generate data good enough not to suffer from the Sim2Real gap. In this work, we introduce a framework for generating force-informed data in simulation, instantiated by a single human demonstration, and show how coupling with a compliant policy improves the performance of a visuomotor policy learned from synthetic data. We validate our approach on real-robot tasks, including non-prehensile block flipping and a bi-manual object moving, where the learned policy exhibits reliable contact maintenance and adaptation to novel conditions. Project Website: https://flow-with-the-force-field.github.io/webpage/
Team Xiaomi EV-AD VLA: Caption-Guided Retrieval System for Cross-Modal Drone Navigation -- Technical Report for IROS 2025 RoboSense Challenge Track 4
Cross-modal drone navigation remains a challenging task in robotics, requiring efficient retrieval of relevant images from large-scale databases based on natural language descriptions. The RoboSense 2025 Track 4 challenge addresses this challenge, focusing on robust, natural language-guided cross-view image retrieval across multiple platforms (drones, satellites, and ground cameras). Current baseline methods, while effective for initial retrieval, often struggle to achieve fine-grained semantic matching between text queries and visual content, especially in complex aerial scenes. To address this challenge, we propose a two-stage retrieval refinement method: Caption-Guided Retrieval System (CGRS) that enhances the baseline coarse ranking through intelligent reranking. Our method first leverages a baseline model to obtain an initial coarse ranking of the top 20 most relevant images for each query. We then use Vision-Language-Model (VLM) to generate detailed captions for these candidate images, capturing rich semantic descriptions of their visual content. These generated captions are then used in a multimodal similarity computation framework to perform fine-grained reranking of the original text query, effectively building a semantic bridge between the visual content and natural language descriptions. Our approach significantly improves upon the baseline, achieving a consistent 5\% improvement across all key metrics (Recall@1, Recall@5, and Recall@10). Our approach win TOP-2 in the challenge, demonstrating the practical value of our semantic refinement strategy in real-world robotic navigation scenarios.
A $1000\times$ Faster LLM-enhanced Algorithm For Path Planning in Large-scale Grid Maps
Path planning in grid maps, arising from various applications, has garnered significant attention. Existing methods, such as A*, Dijkstra, and their variants, work well for small-scale maps but fail to address large-scale ones due to high search time and memory consumption. Recently, Large Language Models (LLMs) have shown remarkable performance in path planning but still suffer from spatial illusion and poor planning performance. Among all the works, LLM-A* \cite{meng2024llm} leverages LLM to generate a series of waypoints and then uses A* to plan the paths between the neighboring waypoints. In this way, the complete path is constructed. However, LLM-A* still suffers from high computational time for large-scale maps. To fill this gap, we conducted a deep investigation into LLM-A* and found its bottleneck, resulting in limited performance. Accordingly, we design an innovative LLM-enhanced algorithm, abbr. as iLLM-A*. iLLM-A* includes 3 carefully designed mechanisms, including the optimization of A*, an incremental learning method for LLM to generate high-quality waypoints, and the selection of the appropriate waypoints for A* for path planning. Finally, a comprehensive evaluation on various grid maps shows that, compared with LLM-A*, iLLM-A* \textbf{1) achieves more than $1000\times$ speedup on average, and up to $2349.5\times$ speedup in the extreme case, 2) saves up to $58.6\%$ of the memory cost, 3) achieves both obviously shorter path length and lower path length standard deviation.}
A Trajectory Generator for High-Density Traffic and Diverse Agent-Interaction Scenarios
Accurate trajectory prediction is fundamental to autonomous driving, as it underpins safe motion planning and collision avoidance in complex environments. However, existing benchmark datasets suffer from a pronounced long-tail distribution problem, with most samples drawn from low-density scenarios and simple straight-driving behaviors. This underrepresentation of high-density scenarios and safety critical maneuvers such as lane changes, overtaking and turning is an obstacle to model generalization and leads to overly optimistic evaluations. To address these challenges, we propose a novel trajectory generation framework that simultaneously enhances scenarios density and enriches behavioral diversity. Specifically, our approach converts continuous road environments into a structured grid representation that supports fine-grained path planning, explicit conflict detection, and multi-agent coordination. Built upon this representation, we introduce behavior-aware generation mechanisms that combine rule-based decision triggers with Frenet-based trajectory smoothing and dynamic feasibility constraints. This design allows us to synthesize realistic high-density scenarios and rare behaviors with complex interactions that are often missing in real data. Extensive experiments on the large-scale Argoverse 1 and Argoverse 2 datasets demonstrate that our method significantly improves both agent density and behavior diversity, while preserving motion realism and scenario-level safety. Our synthetic data also benefits downstream trajectory prediction models and enhances performance in challenging high-density scenarios.
Multi-robot Rigid Formation Navigation via Synchronous Motion and Discrete-time Communication-Control Optimization
Rigid-formation navigation of multiple robots is essential for applications such as cooperative transportation. This process involves a team of collaborative robots maintaining a predefined geometric configuration, such as a square, while in motion. For untethered collaborative motion, inter-robot communication must be conducted through a wireless network. Notably, few existing works offer a comprehensive solution for multi-robot formation navigation executable on microprocessor platforms via wireless networks, particularly for formations that must traverse complex curvilinear paths. To address this gap, we introduce a novel "hold-and-hit" communication-control framework designed to work seamlessly with the widely-used Robotic Operating System (ROS) platform. The hold-and-hit framework synchronizes robot movements in a manner robust against wireless network delays and packet loss. It operates over discrete-time communication-control cycles, making it suitable for implementation on contemporary microprocessors. Complementary to hold-and-hit, we propose an intra-cycle optimization approach that enables rigid formations to closely follow desired curvilinear paths, even under the nonholonomic movement constraints inherent to most vehicular robots. The combination of hold-and-hit and intra-cycle optimization ensures precise and reliable navigation even in challenging scenarios. Simulations in a virtual environment demonstrate the superiority of our method in maintaining a four-robot square formation along an S-shaped path, outperforming two existing approaches. Furthermore, real-world experiments validate the effectiveness of our framework: the robots maintained an inter-distance error within $\pm 0.069m$ and an inter-angular orientation error within $\pm19.15^{\circ}$ while navigating along an S-shaped path at a fixed linear velocity of $0.1 m/s$.
Reachable Predictive Control: A Novel Control Algorithm for Nonlinear Systems with Unknown Dynamics and its Practical Applications
This paper proposes an algorithm capable of driving a system to follow a piecewise linear trajectory without prior knowledge of the system dynamics. Motivated by a critical failure scenario in which a system can experience an abrupt change in its dynamics, we demonstrate that it is possible to follow a set of waypoints comprised of states analytically proven to be reachable despite not knowing the system dynamics. The proposed algorithm first applies small perturbations to locally learn the system dynamics around the current state, then computes the set of states that are provably reachable using the locally learned dynamics and their corresponding maximum growth-rate bounds, and finally synthesizes a control action that navigates the system to a guaranteed reachable state.
Shape-Space Graphs: Fast and Collision-Free Path Planning for Soft Robots
Soft robots, inspired by elephant trunks or octopus arms, offer extraordinary flexibility to bend, twist, and elongate in ways that rigid robots cannot. However, their motion planning remains a challenge, especially in cluttered environments with obstacles, due to their highly nonlinear and infinite-dimensional kinematics. Here, we present a graph-based path planning tool for an elephant-trunk-inspired soft robotic arm designed with three artificial muscle fibers that allow for multimodal continuous deformation through contraction. Using a biomechanical model inspired by morphoelasticity and active filament theory, we precompute a shape library and construct a $k$-nearest neighbor graph in \emph{shape space}, ensuring that each node corresponds to a mechanically accurate and physically valid robot shape. For the graph, we use signed distance functions to prune nodes and edges colliding with obstacles, and define multi-objective edge costs based on geometric distance and actuation effort, enabling energy-efficient planning with collision avoidance. We demonstrate that our algorithm reliably avoids obstacles and generates feasible paths within milliseconds from precomputed graphs using Dijkstra's algorithm. We show that including energy costs can drastically reduce the actuation effort compared to geometry-only planning, at the expense of longer tip trajectories. Our results highlight the potential of shape-space graph search for fast and reliable path planning in the field of soft robotics, paving the way for real-time applications in surgical, industrial, and assistive settings.
SketchPlan: Diffusion Based Drone Planning From Human Sketches
We propose SketchPlan, a diffusion-based planner that interprets 2D hand-drawn sketches over depth images to generate 3D flight paths for drone navigation. SketchPlan comprises two components: a SketchAdapter that learns to map the human sketches to projected 2D paths, and DiffPath, a diffusion model that infers 3D trajectories from 2D projections and a first person view depth image. Our model achieves zero-shot sim-to-real transfer, generating accurate and safe flight paths in previously unseen real-world environments. To train the model, we build a synthetic dataset of 32k flight paths using a diverse set of photorealistic 3D Gaussian Splatting scenes. We automatically label the data by computing 2D projections of the 3D flight paths onto the camera plane, and use this to train the DiffPath diffusion model. However, since real human 2D sketches differ significantly from ideal 2D projections, we additionally label 872 of the 3D flight paths with real human sketches and use this to train the SketchAdapter to infer the 2D projection from the human sketch. We demonstrate SketchPlan's effectiveness in both simulated and real-world experiments, and show through ablations that training on a mix of human labeled and auto-labeled data together with a modular design significantly boosts its capabilities to correctly interpret human intent and infer 3D paths. In real-world drone tests, SketchPlan achieved 100\% success in low/medium clutter and 40\% in unseen high-clutter environments, outperforming key ablations by 20-60\% in task completion.
comment: Code available at https://github.com/sixnor/SketchPlan
Agile Tradespace Exploration for Space Rendezvous Mission Design via Transformers
Spacecraft rendezvous enables on-orbit servicing, debris removal, and crewed docking, forming the foundation for a scalable space economy. Designing such missions requires rapid exploration of the tradespace between control cost and flight time across multiple candidate targets. However, multi-objective optimization in this setting is challenging, as the underlying constraints are often highly nonconvex, and mission designers must balance accuracy (e.g., solving the full problem) with efficiency (e.g., convex relaxations), slowing iteration and limiting design agility. To address these challenges, this paper proposes an AI-powered framework that enables agile mission design for a wide range of Earth orbit rendezvous scenarios. Given the orbital information of the target spacecraft, boundary conditions, and a range of flight times, this work proposes a Transformer-based architecture that generates, in a single parallelized inference step, a set of near-Pareto optimal trajectories across varying flight times, thereby enabling rapid mission trade studies. The model is further extended to accommodate variable flight times and perturbed orbital dynamics, supporting realistic multi-objective trade-offs. Validation on chance-constrained rendezvous problems with passive safety constraints demonstrates that the model generalizes across both flight times and dynamics, consistently providing high-quality initial guesses that converge to superior solutions in fewer iterations. Moreover, the framework efficiently approximates the Pareto front, achieving runtimes comparable to convex relaxation by exploiting parallelized inference. Together, these results position the proposed framework as a practical surrogate for nonconvex trajectory generation and mark an important step toward AI-driven trajectory design for accelerating preliminary mission planning in real-world rendezvous applications.
comment: 14 pages, 7 figures
Efficient Surgical Robotic Instrument Pose Reconstruction in Real World Conditions Using Unified Feature Detection
Accurate camera-to-robot calibration is essential for any vision-based robotic control system and especially critical in minimally invasive surgical robots, where instruments conduct precise micro-manipulations. However, MIS robots have long kinematic chains and partial visibility of their degrees of freedom in the camera, which introduces challenges for conventional camera-to-robot calibration methods that assume stiff robots with good visibility. Previous works have investigated both keypoint-based and rendering-based approaches to address this challenge in real-world conditions; however, they often struggle with consistent feature detection or have long inference times, neither of which are ideal for online robot control. In this work, we propose a novel framework that unifies the detection of geometric primitives (keypoints and shaft edges) through a shared encoding, enabling efficient pose estimation via projection geometry. This architecture detects both keypoints and edges in a single inference and is trained on large-scale synthetic data with projective labeling. This method is evaluated across both feature detection and pose estimation, with qualitative and quantitative results demonstrating fast performance and state-of-the-art accuracy in challenging surgical environments.
LapSurgie: Humanoid Robots Performing Surgery via Teleoperated Handheld Laparoscopy
Robotic laparoscopic surgery has gained increasing attention in recent years for its potential to deliver more efficient and precise minimally invasive procedures. However, adoption of surgical robotic platforms remains largely confined to high-resource medical centers, exacerbating healthcare disparities in rural and low-resource regions. To close this gap, a range of solutions has been explored, from remote mentorship to fully remote telesurgery. Yet, the practical deployment of surgical robotic systems to underserved communities remains an unsolved challenge. Humanoid systems offer a promising path toward deployability, as they can directly operate in environments designed for humans without extensive infrastructure modifications -- including operating rooms. In this work, we introduce LapSurgie, the first humanoid-robot-based laparoscopic teleoperation framework. The system leverages an inverse-mapping strategy for manual-wristed laparoscopic instruments that abides to remote center-of-motion constraints, enabling precise hand-to-tool control of off-the-shelf surgical laparoscopic tools without additional setup requirements. A control console equipped with a stereo vision system provides real-time visual feedback. Finally, a comprehensive user study across platforms demonstrates the effectiveness of the proposed framework and provides initial evidence for the feasibility of deploying humanoid robots in laparoscopic procedures.
Distributed Connectivity Maintenance and Recovery for Quadrotor Motion Planning
Maintaining connectivity is crucial in many multi-robot applications, yet fragile to obstacles and visual occlusions. We present a real-time distributed framework for multi-robot navigation certified by high-order control barrier functions (HOCBFs) that controls inter-robot proximity to maintain connectivity while avoiding collisions. We incorporate control Lyapunov functions to enable connectivity recovery from initial disconnected configurations and temporary losses, providing robust connectivity during navigation in obstacle-rich environments. Our trajectory generation framework concurrently produces planning and control through a Bezier-parameterized trajectory, which naturally provides smooth curves with arbitrary degree of derivatives. The main contribution is the unified MPC-CLF-CBF framework, a continuous-time trajectory generation and control method for connectivity maintenance and recovery of multi-robot systems. We validate the framework through extensive simulations and a physical experiment with 4 Crazyflie nano-quadrotors.
Real-Time Threaded Houbara Detection and Segmentation for Wildlife Conservation using Mobile Platforms
Real-time animal detection and segmentation in natural environments are vital for wildlife conservation, enabling non-invasive monitoring through remote camera streams. However, these tasks remain challenging due to limited computational resources and the cryptic appearance of many species. We propose a mobile-optimized two-stage deep learning framework that integrates a Threading Detection Model (TDM) to parallelize YOLOv10-based detection and MobileSAM-based segmentation. Unlike prior YOLO+SAM pipelines, our approach improves real-time performance by reducing latency through threading. YOLOv10 handles detection while MobileSAM performs lightweight segmentation, both executed concurrently for efficient resource use. On the cryptic Houbara Bustard, a conservation-priority species, our model achieves mAP50 of 0.9627, mAP75 of 0.7731, mAP95 of 0.7178, and a MobileSAM mIoU of 0.7421. YOLOv10 operates at 43.7 ms per frame, confirming real-time readiness. We introduce a curated Houbara dataset of 40,000 annotated images to support model training and evaluation across diverse conditions. The code and dataset used in this study are publicly available on GitHub at https://github.com/LyesSaadSaoud/mobile-houbara-detseg. For interactive demos and additional resources, visit https://lyessaadsaoud.github.io/LyesSaadSaoud-Threaded-YOLO-SAM-Houbara.
Digital-Twin Evaluation for Proactive Human-Robot Collision Avoidance via Prediction-Guided A-RRT*
Human-robot collaboration requires precise prediction of human motion over extended horizons to enable proactive collision avoidance. Unlike existing planners that rely solely on kinodynamic models, we present a prediction-driven safe planning framework that leverages granular, joint-by-joint human motion forecasting validated in a physics-based digital twin. A capsule-based artificial potential field (APF) converts these granular predictions into collision risk metrics, triggering an Adaptive RRT* (A-RRT*) planner when thresholds are exceeded. The depth camera is used to extract 3D skeletal poses and a convolutional neural network-bidirectional long short-term memory (CNN-BiLSTM) model to predict individual joint trajectories ahead of time. A digital twin model integrates real-time human posture prediction placed in front of a simulated robot to evaluate motions and physical contacts. The proposed method enables validation of planned trajectories ahead of time and bridging potential latency gaps in updating planned trajectories in real-time. In 50 trials, our method achieved 100% proactive avoidance with > 250 mm clearance and sub-2 s replanning, demonstrating superior precision and reliability compared to existing kinematic-only planners through the integration of predictive human modeling with digital twin validation.
Robust Permissive Controller Synthesis for Interval MDPs
We address the problem of robust permissive controller synthesis for robots operating under uncertain dynamics, modeled as Interval Markov Decision Processes (IMDPs). IMDPs generalize standard MDPs by allowing transition probabilities to vary within intervals, capturing epistemic uncertainty from sensing noise, actuation imprecision, and coarse system abstractions-common in robotics. Traditional controller synthesis typically yields a single deterministic strategy, limiting adaptability. In contrast, permissive controllers (multi-strategies) allow multiple actions per state, enabling runtime flexibility and resilience. However, prior work on permissive controller synthesis generally assumes exact transition probabilities, which is unrealistic in many robotic applications. We present the first framework for robust permissive controller synthesis on IMDPs, guaranteeing that all strategies compliant with the synthesized multi-strategy satisfy reachability or reward-based specifications under all admissible transitions. We formulate the problem as mixed-integer linear programs (MILPs) and propose two encodings: a baseline vertex-enumeration method and a scalable duality-based method that avoids explicit enumeration. Experiments on four benchmark domains show that both methods synthesize robust, maximally permissive controllers and scale to large IMDPs with up to hundreds of thousands of states.
Destination-to-Chutes Task Mapping Optimization for Multi-Robot Coordination in Robotic Sorting Systems
We study optimizing a destination-to-chutes task mapping to improve throughput in Robotic Sorting Systems (RSS), where a team of robots sort packages on a sortation floor by transporting them from induct workstations to eject chutes based on their shipping destinations (e.g. Los Angeles or Pittsburgh). The destination-to-chutes task mapping is used to determine which chutes a robot can drop its package. Finding a high-quality task mapping is challenging because of the complexity of a real-world RSS. First, optimizing task mapping is interdependent with robot target assignment and path planning. Second, chutes will be CLOSED for a period of time once they receive sufficient packages to allow for downstream processing. Third, task mapping quality directly impacts the downstream processing, as scattered chutes for the same destination increase package handling time. In this paper, we first formally define task mappings and the problem of Task Mapping Optimization (TMO). We then present a simulator of RSS to evaluate task mappings. We then present a simple TMO method based on the Evolutionary Algorithm and Mixed Integer Linear Programming, demonstrating the advantage of our optimized task mappings over the greedily generated ones in various RSS setups with different map sizes, numbers of chutes, and destinations. Finally, we use Quality Diversity algorithms to analyze the throughput of a diverse set of task mappings. Our code is available online at https://github.com/lunjohnzhang/tmo_public.
comment: Accepted to IEEE International Symposium on Multi-Robot and Multi-Agent Systems (MRS) 2025
A Simulation Evaluation Suite for Robust Adaptive Quadcopter Control
Robust adaptive control methods are essential for maintaining quadcopter performance under external disturbances and model uncertainties. However, fragmented evaluations across tasks, simulators, and implementations hinder systematic comparison of these methods. This paper introduces an easy-to-deploy, modular simulation testbed for quadcopter control, built on RotorPy, that enables evaluation under a wide range of disturbances such as wind, payload shifts, rotor faults, and control latency. The framework includes a library of representative adaptive and non-adaptive controllers and provides task-relevant metrics to assess tracking accuracy and robustness. The unified modular environment enables reproducible evaluation across control methods and eliminates redundant reimplementation of components such as disturbance models, trajectory generators, and analysis tools. We illustrate the testbed's versatility through examples spanning multiple disturbance scenarios and trajectory types, including automated stress testing, to demonstrate its utility for systematic analysis. Code is available at https://github.com/Dz298/AdaptiveQuadBench.
Warm-Starting Optimization-Based Motion Planning for Robotic Manipulators via Point Cloud-Conditioned Flow Matching
Rapid robot motion generation is critical in Human-Robot Collaboration (HRC) systems, as robots need to respond to dynamic environments in real time by continuously observing their surroundings and replanning their motions to ensure both safe interactions and efficient task execution. Current sampling-based motion planners face challenges in scaling to high-dimensional configuration spaces and often require post-processing to interpolate and smooth the generated paths, resulting in time inefficiency in complex environments. Optimization-based planners, on the other hand, can incorporate multiple constraints and generate smooth trajectories directly, making them potentially more time-efficient. However, optimization-based planners are sensitive to initialization and may get stuck in local minima. In this work, we present a novel learning-based method that utilizes a Flow Matching model conditioned on a single-view point cloud to learn near-optimal solutions for optimization initialization. Our method does not require prior knowledge of the environment, such as obstacle locations and geometries, and can generate feasible trajectories directly from single-view depth camera input. Simulation studies on a UR5e robotic manipulator in cluttered workspaces demonstrate that the proposed generative initializer achieves a high success rate on its own, significantly improves the success rate of trajectory optimization compared with traditional and learning-based benchmark initializers, requires fewer optimization iterations, and exhibits strong generalization to unseen environments.
Optimal swimming with body compliance in an overdamped medium
Elongate animals and robots use undulatory body waves to locomote through diverse environments. Geometric mechanics provides a framework to model and optimize such systems in highly damped environments, connecting a prescribed shape change pattern (gait) with locomotion displacement. However, existing approaches assume precise execution of prescribed gaits, whereas in practice environmental interactions with compliant bodies of animals or robots frequently perturb the realized trajectories. In this work, we extend geometric mechanics to predict locomotor performance and search for optimal swimming strategy of compliant undulators. We introduce a compliant extension of Purcell's three-link swimmer by incorporating series-connected springs at the joints. Body dynamics are derived with resistive force theory. Geometric mechanics is incorporated into movement prediction and into an optimization framework that identifies strategies for controlling compliant swimmers to achieve maximal displacement. We validate our framework on a physical cable-driven three-link limbless robot, and demonstrate accurate prediction and optimization of locomotor performance under varied programmed, state-dependent compliance in a granular medium. Our results establish a systematic physics-based approach for modeling and controlling compliant swimming locomotion, highlighting compliance as a design feature that can be exploited for robust movement in homogeneous and heterogeneous environments.
Less is More: Lean yet Powerful Vision-Language Model for Autonomous Driving
In this work, we reconceptualize autonomous driving as a generalized language and formulate the trajectory planning task as next waypoint prediction. We introduce Max-V1, a novel framework for one-stage end-to-end autonomous driving. Our framework presents a single-pass generation paradigm that aligns with the inherent sequentiality of driving. This approach leverages the generative capacity of the VLM (Vision-Language Model) to enable end-to-end trajectory prediction directly from front-view camera input. The efficacy of this method is underpinned by a principled supervision strategy derived from statistical modeling. This provides a well-defined learning objective, which makes the framework highly amenable to master complex driving policies through imitation learning from large-scale expert demonstrations. Empirically, our method achieves the state-of-the-art performance on the nuScenes dataset, delivers an overall improvement of over 30% compared to prior baselines. Furthermore, it exhibits superior generalization performance on cross-domain datasets acquired from diverse vehicles, demonstrating notable potential for cross-vehicle robustness and adaptability. Due to these empirical strengths, this work introduces a model enabling fundamental driving behaviors, laying the foundation for the development of more capable self-driving agents. Code will be available upon publication.
Learned IMU Bias Prediction for Invariant Visual Inertial Odometry
Autonomous mobile robots operating in novel environments depend critically on accurate state estimation, often utilizing visual and inertial measurements. Recent work has shown that an invariant formulation of the extended Kalman filter improves the convergence and robustness of visual-inertial odometry by utilizing the Lie group structure of a robot's position, velocity, and orientation states. However, inertial sensors also require measurement bias estimation, yet introducing the bias in the filter state breaks the Lie group symmetry. In this paper, we design a neural network to predict the bias of an inertial measurement unit (IMU) from a sequence of previous IMU measurements. This allows us to use an invariant filter for visual inertial odometry, relying on the learned bias prediction rather than introducing the bias in the filter state. We demonstrate that an invariant multi-state constraint Kalman filter (MSCKF) with learned bias predictions achieves robust visual-inertial odometry in real experiments, even when visual information is unavailable for extended periods and the system needs to rely solely on IMU measurements.
comment: This work has been submitted to the IEEE for possible publication
HopaDIFF: Holistic-Partial Aware Fourier Conditioned Diffusion for Referring Human Action Segmentation in Multi-Person Scenarios NeurIPS 2025
Action segmentation is a core challenge in high-level video understanding, aiming to partition untrimmed videos into segments and assign each a label from a predefined action set. Existing methods primarily address single-person activities with fixed action sequences, overlooking multi-person scenarios. In this work, we pioneer textual reference-guided human action segmentation in multi-person settings, where a textual description specifies the target person for segmentation. We introduce the first dataset for Referring Human Action Segmentation, i.e., RHAS133, built from 133 movies and annotated with 137 fine-grained actions with 33h video data, together with textual descriptions for this new task. Benchmarking existing action segmentation methods on RHAS133 using VLM-based feature extractors reveals limited performance and poor aggregation of visual cues for the target person. To address this, we propose a holistic-partial aware Fourier-conditioned diffusion framework, i.e., HopaDIFF, leveraging a novel cross-input gate attentional xLSTM to enhance holistic-partial long-range reasoning and a novel Fourier condition to introduce more fine-grained control to improve the action segmentation generation. HopaDIFF achieves state-of-the-art results on RHAS133 in diverse evaluation settings. The dataset and code are available at https://github.com/KPeng9510/HopaDIFF.
comment: Accepted to NeurIPS 2025. The dataset and code are available at https://github.com/KPeng9510/HopaDIFF
Decoupling Geometry from Optimization in 2D Irregular Cutting and Packing Problems: an Open-Source Collision Detection Engine
Addressing irregular cutting and packing (C&P) optimization problems poses two distinct challenges: the geometric challenge of determining whether or not an item can be placed feasibly at a certain position, and the optimization challenge of finding a good solution according to some objective function. Until now, those tackling such problems have had to address both challenges simultaneously, requiring two distinct sets of expertise and a lot of research & development effort. One way to lower this barrier is to decouple the two challenges. In this paper we introduce a powerful collision detection engine (CDE) for 2D irregular C&P problems which assumes full responsibility for the geometric challenge. The CDE (i) allows users to focus with full confidence on their optimization challenge by abstracting geometry away and (ii) enables independent advances to propagate to all optimization algorithms built atop it. We present a set of core principles and design philosophies to model a general and adaptable CDE focused on maximizing performance, accuracy and robustness. These principles are accompanied by a concrete open-source implementation called $\texttt{jagua-rs}$. This paper together with its implementation serves as a catalyst for future advances in irregular C&P problems by providing a solid foundation which can either be used as it currently exists or be further improved upon.
comment: 25 pages, 16 figures
Beyond Anthropomorphism: Enhancing Grasping and Eliminating a Degree of Freedom by Fusing the Abduction of Digits Four and Five IROS
This paper presents the SABD hand, a 16-degree-of-freedom (DoF) robotic hand that departs from purely anthropomorphic designs to achieve an expanded grasp envelope, enable manipulation poses beyond human capability, and reduce the required number of actuators. This is achieved by combining the adduction/abduction (Add/Abd) joint of digits four and five into a single joint with a large range of motion. The combined joint increases the workspace of the digits by 400% and reduces the required DoFs while retaining dexterity. Experimental results demonstrate that the combined Add/Abd joint enables the hand to grasp objects with a side distance of up to 200 mm. Reinforcement learning-based investigations show that the design enables grasping policies that are effective not only for handling larger objects but also for achieving enhanced grasp stability. In teleoperated trials, the hand successfully performed 86% of attempted grasps on suitable YCB objects, including challenging non-anthropomorphic configurations. These findings validate the design's ability to enhance grasp stability, flexibility, and dexterous manipulation without added complexity, making it well-suited for a wide range of applications. A supplementary video is available at https://youtu.be/P3jRts46o4s .
comment: The first five listed authors have equal contribution. This work has been accepted to the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2025
Learning High-Fidelity Robot Self-Model with Articulated 3D Gaussian Splatting
Self-modeling enables robots to build task-agnostic models of their morphology and kinematics based on data that can be automatically collected, with minimal human intervention and prior information, thereby enhancing machine intelligence. Recent research has highlighted the potential of data-driven technology in modeling the morphology and kinematics of robots. However, existing self-modeling methods suffer from either low modeling quality or excessive data acquisition costs. Beyond morphology and kinematics, texture is also a crucial component of robots, which is challenging to model and remains unexplored. In this work, a high-quality, texture-aware, and link-level method is proposed for robot self-modeling. We utilize three-dimensional (3D) Gaussians to represent the static morphology and texture of robots, and cluster the 3D Gaussians to construct neural ellipsoid bones, whose deformations are controlled by the transformation matrices generated by a kinematic neural network. The 3D Gaussians and kinematic neural network are trained using data pairs composed of joint angles, camera parameters and multi-view images without depth information. By feeding the kinematic neural network with joint angles, we can utilize the well-trained model to describe the corresponding morphology, kinematics and texture of robots at the link level, and render robot images from different perspectives with the aid of 3D Gaussian splatting. Furthermore, we demonstrate that the established model can be exploited to perform downstream tasks such as motion planning and inverse kinematics.
comment: This paper is accepted by IJRR. The code will be open-sourced on GitHub as soon as possible after the paper is officially published
S-Graphs 2.0 -- A Hierarchical-Semantic Optimization and Loop Closure for SLAM
The hierarchical structure of 3D scene graphs shows a high relevance for representations purposes, as it fits common patterns from man-made environments. But, additionally, the semantic and geometric information in such hierarchical representations could be leveraged to speed up the optimization and management of map elements and robot poses. In this direction, we present our work Situational Graphs 2.0 (S-Graphs 2.0), which leverages the hierarchical structure of indoor scenes for efficient data management and optimization. Our algorithm begins by constructing a situational graph that represents the environment into four layers: Keyframes, Walls, Rooms, and Floors. Our first novelty lies in the front-end, which includes a floor detection module capable of identifying stairways and assigning floor-level semantic relations to the underlying layers. Floor-level semantics allows us to propose a floor-based loop closure strategy, that effectively rejects false positive closures that typically appear due to aliasing between different floors of a building. Our second novelty lies in leveraging our representation hierarchy in the optimization. Our proposal consists of: (1) local optimization over a window of recent keyframes and their connected components across the four representation layers, (2) floor-level global optimization, which focuses only on keyframes and their connections within the current floor during loop closures, and (3) room-level local optimization, marginalizing redundant keyframes that share observations within the room, which reduces the computational footprint. We validate our algorithm extensively in different real multi-floor environments. Our approach shows state-of-art-art accuracy metrics in large-scale multi-floor environments, estimating hierarchical representations up to 10x faster, in average, than competing baselines
comment: 8 pages, 9 figures, Accepted in IEEE RA-L September 2025
An Intention-driven Lane Change Framework Considering Heterogeneous Dynamic Cooperation in Mixed-traffic Environment
In mixed-traffic environments, where autonomous vehicles (AVs) interact with diverse human-driven vehicles (HVs), unpredictable intentions and heterogeneous behaviors make safe and efficient lane change maneuvers highly challenging. Existing methods often oversimplify these interactions by assuming uniform patterns. We propose an intention-driven lane change framework that integrates driving-style recognition, cooperation-aware decision-making, and coordinated motion planning. A deep learning classifier trained on the NGSIM dataset identifies human driving styles in real time. A cooperation score with intrinsic and interactive components estimates surrounding drivers' intentions and quantifies their willingness to cooperate with the ego vehicle. Decision-making combines behavior cloning with inverse reinforcement learning to determine whether a lane change should be initiated. For trajectory generation, model predictive control is integrated with IRL-based intention inference to produce collision-free and socially compliant maneuvers. Experiments show that the proposed model achieves 94.2\% accuracy and 94.3\% F1-score, outperforming rule-based and learning-based baselines by 4-15\% in lane change recognition. These results highlight the benefit of modeling inter-driver heterogeneity and demonstrate the potential of the framework to advance context-aware and human-like autonomous driving in complex traffic environments.
SlideSLAM: Sparse, Lightweight, Decentralized Metric-Semantic SLAM for Multi-Robot Navigation
This paper develops a real-time decentralized metric-semantic SLAM algorithm that enables a heterogeneous robot team to collaboratively construct object-based metric-semantic maps. The proposed framework integrates a data-driven front-end for instance segmentation from either RGBD cameras or LiDARs and a custom back-end for optimizing robot trajectories and object landmarks in the map. To allow multiple robots to merge their information, we design semantics-driven place recognition algorithms that leverage the informativeness and viewpoint invariance of the object-level metric-semantic map for inter-robot loop closure detection. A communication module is designed to track each robot's observations and those of other robots whenever communication links are available. The framework supports real-time, decentralized operation onboard the robots and has been integrated with three types of aerial and ground platforms. We validate its effectiveness through experiments in both indoor and outdoor environments, as well as benchmarks on public datasets and comparisons with existing methods. The framework is open-sourced and suitable for both single-agent and multi-robot real-time metric-semantic SLAM applications. The code is available at: https://github.com/KumarRobotics/SLIDE_SLAM.
comment: Xu Liu, Jiuzhou Lei, and Ankit Prabhu contributed equally to this work
Latent Action Diffusion for Cross-Embodiment Manipulation
End-to-end learning is emerging as a powerful paradigm for robotic manipulation, but its effectiveness is limited by data scarcity and the heterogeneity of action spaces across robot embodiments. In particular, diverse action spaces across different end-effectors create barriers for cross-embodiment learning and skill transfer. We address this challenge through diffusion policies learned in a latent action space that unifies diverse end-effector actions. We first show that we can learn a semantically aligned latent action space for anthropomorphic robotic hands, a human hand, and a parallel jaw gripper using encoders trained with a contrastive loss. Second, we show that by using our proposed latent action space for co-training on manipulation data from different end-effectors, we can utilize a single policy for multi-robot control and obtain up to 25.3% improved manipulation success rates, indicating successful skill transfer despite a significant embodiment gap. Our approach using latent cross-embodiment policies presents a new method to unify different action spaces across embodiments, enabling efficient multi-robot control and data sharing across robot setups. This unified representation significantly reduces the need for extensive data collection for each new robot morphology, accelerates generalization across embodiments, and ultimately facilitates more scalable and efficient robotic learning.
comment: 15 pages, 7 figures, website: https://mimicrobotics.github.io/lad/
MCGS-SLAM: A Multi-Camera SLAM Framework Using Gaussian Splatting for High-Fidelity Mapping
Recent progress in dense SLAM has primarily targeted monocular setups, often at the expense of robustness and geometric coverage. We present MCGS-SLAM, the first purely RGB-based multi-camera SLAM system built on 3D Gaussian Splatting (3DGS). Unlike prior methods relying on sparse maps or inertial data, MCGS-SLAM fuses dense RGB inputs from multiple viewpoints into a unified, continuously optimized Gaussian map. A multi-camera bundle adjustment (MCBA) jointly refines poses and depths via dense photometric and geometric residuals, while a scale consistency module enforces metric alignment across views using low-rank priors. The system supports RGB input and maintains real-time performance at large scale. Experiments on synthetic and real-world datasets show that MCGS-SLAM consistently yields accurate trajectories and photorealistic reconstructions, usually outperforming monocular baselines. Notably, the wide field of view from multi-camera input enables reconstruction of side-view regions that monocular setups miss, critical for safe autonomous operation. These results highlight the promise of multi-camera Gaussian Splatting SLAM for high-fidelity mapping in robotics and autonomous driving.
From Real World to Logic and Back: Learning Generalizable Relational Concepts For Long Horizon Robot Planning
Robots still lag behind humans in their ability to generalize from limited experience, particularly when transferring learned behaviors to long-horizon tasks in unseen environments. We present the first method that enables robots to autonomously invent symbolic, relational concepts directly from a small number of raw, unsegmented, and unannotated demonstrations. From these, the robot learns logic-based world models that support zero-shot generalization to tasks of far greater complexity than those in training. Our framework achieves performance on par with hand-engineered symbolic models, while scaling to execution horizons far beyond training and handling up to 18$\times$ more objects than seen during learning. The results demonstrate a framework for autonomously acquiring transferable symbolic abstractions from raw robot experience, contributing toward the development of interpretable, scalable, and generalizable robot planning systems. Project website and code: https://aair-lab.github.io/r2l-lamp.
Systems and Control (CS)
A Dimension-Decomposed Learning Framework for Online Disturbance Identification in Quadrotor SE(3) Control
Quadrotor stability under complex dynamic disturbances and model uncertainties poses significant challenges. One of them remains the underfitting problem in high-dimensional features, which limits the identification capability of current learning-based methods. To address this, we introduce a new perspective: Dimension-Decomposed Learning (DiD-L), from which we develop the Sliced Adaptive-Neuro Mapping (SANM) approach for geometric control. Specifically, the high-dimensional mapping for identification is axially ``sliced" into multiple low-dimensional submappings (``slices"). In this way, the complex high-dimensional problem is decomposed into a set of simple low-dimensional tasks addressed by shallow neural networks and adaptive laws. These neural networks and adaptive laws are updated online via Lyapunov-based adaptation without any pre-training or persistent excitation (PE) condition. To enhance the interpretability of the proposed approach, we prove that the full-state closed-loop system exhibits arbitrarily close to exponential stability despite multi-dimensional time-varying disturbances and model uncertainties. This result is novel as it demonstrates exponential convergence without requiring pre-training for unknown disturbances and specific knowledge of the model.
Eigenvalue Tracking of Large-Scale Systems Impacted by Time Delays
The paper focuses on tracking eigenvalue trajectories in power system models with time delays. We formulate a continuation-based approach that employs numerical integration to follow eigenvalues as system parameters vary, in the presence of one or multiple delayed variables. The formulation preserves the sparsity of the delay differential-algebraic equation (DDAE) system model and allows treating the delay magnitude itself as a varying parameter with implementation aspects discussed in detail. Accuracy is demonstrated on a modified IEEE 39-bus system with distributed energy resources. Scalability is discussed using a realistic dynamic model of the Irish transmission network.
Economic zone data-enabled predictive control for connected open water systems
Real-time regulation of water distribution in connected open water systems is critical for ensuring system safety and meeting operational requirements. In this work, we consider a connected open water system that includes linkage hydraulic structures such as weirs, pumps and sluice gates. We propose a mixed-integer economic zone data-enabled predictive control (DeePC) approach, which is used to maintain the water levels of the branches within desired zones to avoid floods and reduce the energy consumption of the pumps in the considered water system. The proposed DeePC-based approach predicts the future dynamics of the system water levels, and generates optimal control actions based on system input and output data, thereby eliminating the need for both first-principles modeling and explicit data-driven modeling. To achieve multiple control objectives in order of priority, we utilize lexicographic optimization and adapt traditional DeePC cost function for zone tracking and energy consumption minimization. Additionally, Bayesian optimization is utilized to determine the control target zone, which effectively balances zone tracking and energy consumption in the presence of external disturbances. Comprehensive simulations and comparative analyses demonstrate the effectiveness of the proposed method. The proposed method maintains water levels within the desired zone for 97.04% of the operating time, with an average energy consumption of 33.5 kWh per 0.5 h. Compared to baseline methods, the proposed approach reduces the zone-tracking mean square error by 98.82% relative to economic zone DeePC without Bayesian optimization, and lowers energy consumption by 44.08% relative to economic set-point tracking DeePC. As compared to passive pump/gate control, the proposed method lowers the frequency of zone violations by 86.94% and the average energy consumption by 4.69%.
Real-Time Peer-to-Peer Energy Trading for Multi-Microgrids: Improved Double Auction Mechanism and Prediction-Free Online Trading Approach
Peer-to-peer energy trading offers a promising solution for enhancing renewable energy utilization and economic benefits within interconnected microgrids. However, existing real-time P2P markets face two key challenges: high computational complexity in trading mechanisms, and suboptimal participant decision-making under diverse uncertainties. Existing prediction-based decision-making methods rely heavily on accurate forecasts, which are typically unavailable for microgrids, while prediction-free methods suffer from myopic behaviors. To address these challenges, this paper proposes an improved double auction mechanism combined with an adaptive step-size search algorithm to reduce computational burden, and a data-driven dual-reference online optimization (DDOO) framework to enhance participant decision-making. The improved mechanism simplifies bidding procedures, significantly reducing computational burden and ensuring rapid convergence to the market equilibrium. Additionally, the prediction-free DDOO framework mitigates myopic decision-making by introducing two informative reference signals. Case studies on a 20-microgrid system demonstrate the effectiveness and scalability of the proposed mechanism and approach. The improved mechanism significantly decreases the computational time while increasing local energy self-sufficiency periods from 0.01% to 29.86%, reducing reverse power flow periods from 24.51% to 3.96%, and lowering average operating costs by 19.20%. Compared with conventional approaches such as Lyapunov optimization and model predictive control, the DDOO framework achieves a 10%-13% reduction in operating costs with an optimality gap of only 5.76%.
Corrosion Risk Estimation for Heritage Preservation: An Internet of Things and Machine Learning Approach Using Temperature and Humidity
Proactive preservation of steel structures at culturally significant heritage sites like the San Sebastian Basilica in the Philippines requires accurate corrosion forecasting. This study developed an Internet of Things hardware system connected with LoRa wireless communications to monitor heritage buildings with steel structures. From a three year dataset generated by the IoT system, we built a machine learning framework for predicting atmospheric corrosion rates using only temperature and relative humidity data. Deployed via a Streamlit dashboard with ngrok tunneling for public access, the framework provides real-time corrosion monitoring and actionable preservation recommendations. This minimal-data approach is scalable and cost effective for heritage sites with limited monitoring resources, showing that advanced regression can extract accurate corrosion predictions from basic meteorological data enabling proactive preservation of culturally significant structures worldwide without requiring extensive sensor networks
comment: 17 pages
Single-Rod Brachiation Robot: Mechatronic Control Design and Validation of Prejump Phases
A complete mechatronic design of a minimal configuration brachiation robot is presented. The robot consists of a single rigid rod with gripper mechanisms attached to both ends. The grippers are used to hang the robot on a horizontal bar on which it swings or rotates. The motion is imposed by repositioning the robot's center of mass, which is performed using a crank-slide mechanism. Based on a non-linear model, an optimal control strategy is proposed, for repositioning the center of mass in a bang-bang manner. Consequently, utilizing the concept of input-output linearization, a continuous control strategy is proposed that takes into account the limited torque of the crank-slide mechanism and its geometry. An increased attention is paid to energy accumulation towards the subsequent jump stage of the brachiation. These two strategies are validated and compared in simulations. The continuous control strategy is then also implemented within a low-cost STM32-based control system, and both the swing and rotation stages of the brachiation motion are experimentally validated.
comment: 11 pages, 13 figures, 1 table, Accepted 27 July 2025, Available online 16 Sept 2025, Version of Record 28 Sept 2025
Incomplete Air Mixing Reduces the Efficiency of Commercial Buildings Behaving as Virtual Batteries
Commercial building Heating, Ventilation, and Air Conditioning (HVAC) systems can provide flexibility to the electricity grid. Some researchers have found it convenient to model HVAC systems as virtual batteries. These models also better align with models used by grid planners and operators. However, experiments have shown that HVAC load shifting can be inefficient, and virtual battery models do not capture this inefficiency well. While the models typically use the average room temperature as the system's ``state of charge," they do not capture other factors that affect HVAC power/energy such as airflow and mixing. Here, we develop a new analytical building model to explore how incomplete mixing of supply air into a conditioned space leads to inefficiency in a virtual battery capturing the dynamics of HVAC fan power load shifting. The model qualitatively matches experimental results better than previous models, and shows that, as mixing becomes worse, the virtual battery becomes less efficient. Unfortunately, air mixing is unmeasured/unmeasurable. However, we show that, by closing the loop around measurements of fan power, we can improve the virtual battery's performance without the need for air mixing measurements. For example, in one case, we show a roundtrip efficiency improvement from 0.75 to 0.99.
Global Convergence of Policy Gradient for Entropy Regularized Linear-Quadratic Control with multiplicative noise
Reinforcement Learning (RL) has emerged as a powerful framework for sequential decision-making in dynamic environments, particularly when system parameters are unknown. This paper investigates RL-based control for entropy-regularized Linear Quadratic control (LQC) problems with multiplicative noises over an infinite time horizon. First, we adapt the Regularized Policy Gradient (RPG) algorithm to stochastic optimal control settings, proving that despite the non-convexity of the problem, RPG converges globally under conditions of gradient domination and near-smoothness. Second, based on zero-order optimization approach, we introduce a novel model free RL algorithm: Sample-Based Regularized Policy Gradient (SB-RPG). SB-RPG operates without knowledge of system parameters yet still retains strong theoretical guarantees of global convergence. Our model leverages entropy regularization to accelerate convergence and address the exploration versus exploitation trade-off inherent in RL. Numerical simulations validate the theoretical results and demonstrate the efficacy of SB-RPG in unknown-parameters environments.
comment: 33 pages, 4 figures
Delay-Tolerant Augmented-Consensus-based Distributed Directed Optimization
Distributed optimization finds applications in large-scale machine learning, data processing and classification over multi-agent networks. In real-world scenarios, the communication network of agents may encounter latency that may affect the convergence of the optimization protocol. This paper addresses the case where the information exchange among the agents (computing nodes) over data-transmission channels (links) might be subject to communication time-delays, which is not well addressed in the existing literature. Our proposed algorithm improves the state-of-the-art by handling heterogeneous and arbitrary but bounded and fixed (time-invariant) delays over general strongly-connected directed networks. Arguments from matrix theory, algebraic graph theory, and augmented consensus formulation are applied to prove the convergence to the optimal value. Simulations are provided to verify the results and compare the performance with some existing delay-free algorithms.
comment: Systems & Control Letters
Life Estimation of HVDC Cable Insulation under Load Cycles: from Macroscopic to Microscopic Charge Conduction Modelling
This paper goes one step forward in the life estimation of HVDC cable insulation under load cycles by introducing for the first time a microscopic model of charge conduction and transport i.e., Bipolar Charge Transport BCT model for electric field calculation inside the insulation thickness. The paper firstly includes the development and the validation of BCT model with that found in literature. Then, the parameters of the developed BCT model are optimized using Pulsed Electro-Acoustic PEA space charge measurements. Followed by the integration of the developed, validated and optimized model into the electric field calculation for life estimation of a 500 kV DC-XLPE insulated cable subjected to Type Test load cycles according to Cigre Techical Brochure 852. The developed microscopic model is compared to the macroscopic models already found in the literature. The microscopic model shows a comparable electric field inversion similarly to macroscopic models. However, the behavior of the microscopic model is noticed to be different under heating and cooling load cycles. In hot cable, the maximum electric field stabilizes at different amplitude and position inside the insulation thickness in both models. This investigation has been carried out in the framework of the HEU-NEWGEN research project.
Assist-as-needed Control for FES in Foot Drop Management
Foot drop is commonly managed using Functional Electrical Stimulation (FES), typically delivered via open-loop controllers with fixed stimulation intensities. While users may manually adjust the intensity through external controls, this approach risks overstimulation, leading to muscle fatigue and discomfort, or understimulation, which compromises dorsiflexion and increases fall risk. In this study, we propose a novel closed-loop FES controller that dynamically adjusts the stimulation intensity based on real-time toe clearance, providing "assistance as needed". We evaluate this system by inducing foot drop in healthy participants and comparing the effects of the closed-loop controller with a traditional open-loop controller across various walking conditions, including different speeds and surface inclinations. Kinematic data reveal that our closed-loop controller maintains adequate toe clearance without significantly affecting the joint angles of the hips, the knees, and the ankles, and while using significantly lower stimulation intensities compared to the open-loop controller. These findings suggest that the proposed method not only matches the effectiveness of existing systems but also offers the potential for reduced muscle fatigue and improved long-term user comfort and adherence.
Periodic Event-Triggered Prescribed Time Control of Euler-Lagrange Systems under State and Input Constraints
This article proposes a periodic event-triggered adaptive barrier control policy for the trajectory tracking problem of perturbed Euler-Lagrangian systems with state, input, and temporal (SIT) constraints. In particular, an approximation-free adaptive-barrier control architecture is designed to ensure prescribed-time convergence of the tracking error to a prescribed bound while rejecting exogenous disturbances. In contrast to existing approaches that necessitate continuous real-time control action, the proposed controller generates event-based updates through periodic evaluation of the triggering condition. Additionally, we derive an upper bound on the monitoring period by analysing the performance degradation of the filtered tracking error to facilitate periodic evaluation of the event-triggered strategy. To this end, a time-varying threshold function is considered in the triggering mechanism to reduce the number of triggers during the transient phase of system behaviour. Notably, the proposed design avoids Zeno behaviour and precludes the need for continuous monitoring of the triggering condition. A simulation and experimental study is undertaken to demonstrate the efficacy of the proposed control scheme.
A Control-Barrier-Function-Based Algorithm for Policy Adaptation in Reinforcement Learning
This paper considers the problem of adapting a predesigned policy, represented by a parameterized function class, from a solution that minimizes a given original cost function to a trade-off solution between minimizing the original objective and an additional cost function. The problem is formulated as a constrained optimization problem, where deviations from the optimal value of the original cost are explicitly constrained. To solve it, we develop a closed-loop system that governs the evolution of the policy parameters, with a closed-loop controller designed to adjust the additional cost gradient to ensure the satisfaction of the constraint. The resulting closed-loop system, termed control-barrier-function-based policy adaptation, exploits the set-invariance property of control barrier functions to guarantee constraint satisfaction. The effectiveness of the proposed method is demonstrated through numerical experiments on the Cartpole and Lunar Lander benchmarks from OpenAI Gym, as well as a quadruped robot, thereby illustrating both its practicality and potential for real-world policy adaptation.
Guaranteed Time Control using Linear Matrix Inequalities
This paper presents a synthesis approach aiming to guarantee a minimum upper-bound for the time taken to reach a target set of non-zero measure that encompasses the origin, while taking into account uncertainties and input and state constraints. This approach is based on a harmonic transformation of the Lyapunov function and a novel piecewise quadratic representation of this transformed Lyapunov function over a simplicial partition of the state space. The problem is solved in a policy iteration fashion, whereas the evaluation and improvement steps are formulated as linear matrix inequalities employing the structural relaxation approach. Though initially formulated for uncertain polytopic systems, extensions to piecewise and nonlinear systems are discussed. Three examples illustrate the effectiveness of the proposed approach in different scenarios.
comment: Preprint - Initial submission submitted to Automatica
Generalization of Graph Neural Network Models for Distribution Grid Fault Detection
Fault detection in power distribution grids is critical for ensuring system reliability and preventing costly outages. Moreover, fault detection methodologies should remain robust to evolving grid topologies caused by factors such as reconfigurations, equipment failures, and Distributed Energy Resource (DER) integration. Current data-driven state-of-the-art methods use Recurrent Neural Networks (RNNs) for temporal modeling and Graph Neural Networks (GNNs) for spatial learning, in an RNN+GNN pipeline setting (RGNN in short). Specifically, for power system fault diagnosis, Graph Convolutional Networks (GCNs) have been adopted. Yet, various more advanced GNN architectures have been proposed and adopted in domains outside of power systems. In this paper, we set out to systematically and consistently benchmark various GNN architectures in an RNN+GNN pipeline model. Specifically, to the best of our knowledge, we are the first to (i) propose to use GraphSAGE and Graph Attention (GAT, GATv2) in an RGNN for fault diagnosis, and (ii) provide a comprehensive benchmark against earlier proposed RGNN solutions (RGCN) as well as pure RNN models (especially Gated Recurrent Unit (GRU)), particularly (iii) exploring their generalization potential for deployment in different settings than those used for training them. Our experimental results on the IEEE 123-node distribution network show that RGATv2 has superior generalization capabilities, maintaining high performance with an F1-score reduction of $\sim$12% across different topology settings. In contrast, pure RNN models largely fail, experiencing an F1-score reduction of up to $\sim$60%, while other RGNN variants also exhibit significant performance degradation, i.e., up to $\sim$25% lower F1-scores.
comment: This paper has been submitted and accepted for IEEE SmartGridComm 2025
Long-Term Mapping of the Douro River Plume with Multi-Agent Reinforcement Learning
We study the problem of long-term (multiple days) mapping of a river plume using multiple autonomous underwater vehicles (AUVs), focusing on the Douro river representative use-case. We propose an energy - and communication - efficient multi-agent reinforcement learning approach in which a central coordinator intermittently communicates with the AUVs, collecting measurements and issuing commands. Our approach integrates spatiotemporal Gaussian process regression (GPR) with a multi-head Q-network controller that regulates direction and speed for each AUV. Simulations using the Delft3D ocean model demonstrate that our method consistently outperforms both single- and multi-agent benchmarks, with scaling the number of agents both improving mean squared error (MSE) and operational endurance. In some instances, our algorithm demonstrates that doubling the number of AUVs can more than double endurance while maintaining or improving accuracy, underscoring the benefits of multi-agent coordination. Our learned policies generalize across unseen seasonal regimes over different months and years, demonstrating promise for future developments of data-driven long-term monitoring of dynamic plume environments.
Certifiable Safe RLHF: Fixed-Penalty Constraint Optimization for Safer Language Models
Ensuring safety is a foundational requirement for large language models (LLMs). Achieving an appropriate balance between enhancing the utility of model outputs and mitigating their potential for harm is a complex and persistent challenge. Contemporary approaches frequently formalize this problem within the framework of Constrained Markov Decision Processes (CMDPs) and employ established CMDP optimization techniques. However, these methods exhibit two notable limitations. First, their reliance on reward and cost functions renders performance highly sensitive to the underlying scoring mechanism, which must capture semantic meaning rather than being triggered by superficial keywords. Second, CMDP-based training entails tuning dual-variable, a process that is both computationally expensive and does not provide any provable safety guarantee for a fixed dual variable that can be exploitable through adversarial jailbreaks. To overcome these limitations, we introduce Certifiable Safe-RLHF (CS-RLHF) that introduces a cost model trained on a large-scale corpus to assign semantically grounded safety scores. In contrast to the lagrangian-based approach, CS-RLHF adopts a rectified penalty-based formulation. This design draws on the theory of exact penalty functions in constrained optimization, wherein constraint satisfaction is enforced directly through a suitably chosen penalty term. With an appropriately scaled penalty, feasibility of the safety constraints can be guaranteed at the optimizer, eliminating the need for dual-variable updates. Empirical evaluation demonstrates that CS-RLHF outperforms state-of-the-art LLM model responses rendering at-least 5 times efficient against nominal and jail-breaking prompts
Machine Learning-Driven Prediction of Lithium-Ion Battery Power Capability for eVTOL Aircraft
Electric vertical take-off and landing (eVTOL) aircraft have emerged as a promising solution to transform urban transportation. They present a few technical challenges for battery management, a prominent one of which is the prediction of the power capability of their lithium-ion battery systems. The challenge originates from the high C-rate discharging conditions required during eVTOL flights as well as the complexity of lithium-ion batteries' electro-thermal dynamics. This paper, for the first time, formulates a power limit prediction problem for eVTOL which explicitly considers long prediction horizons and the possible occurrence of emergency landings. We then harness machine learning to solve this problem in two intertwined ways. First, we adopt a dynamic model that integrates physics with machine learning to predict a lithium-ion battery's voltage and temperature behaviors with high accuracy. Second, while performing search for the maximum power, we leverage machine learning to predict the remaining discharge time and use the prediction to accelerate the search with fast computation. Our validation results show the effectiveness of the proposed study for eVTOL operations.
comment: 2025 American Control Conference (ACC)
CANOPI: Contingency-Aware Nodal Optimal Power Investments with High Temporal Resolution
We present CANOPI, a novel algorithmic framework, for solving the Contingency-Aware Nodal Power Investments problem, a large-scale nonlinear optimization problem that jointly optimizes generation, storage, and transmission expansion. The underlying problem is nonlinear due to the impact of transmission upgrades on impedances, and the problem's large scale arises from the confluence of spatial and temporal resolutions. We propose algorithmic approaches to address these computational challenges. We pose a linear approximation of the overall nonlinear model, and develop a fixed-point algorithm to adjust for the nonlinear impedance feedback effect. We solve the large-scale linear expansion model with a specialized level-bundle method leveraging a novel interleaved approach to contingency constraint generation. We introduce a minimal cycle basis algorithm that improves the numerical sparsity of cycle-based DC power flow formulations, accelerating solve times for the operational subproblems. CANOPI is demonstrated on a 1493-bus Western Interconnection test system built from realistic-geography network data, with hourly operations spanning 52 week-long scenarios and a total possible set of 20 billion individual transmission contingency constraints. Numerical results quantify the reliability and economic benefits of fully incorporating transmission contingencies in integrated planning models and highlight the computational advantages of the proposed methods.
comment: This work has been submitted to the IEEE for possible publication
Robust Permissive Controller Synthesis for Interval MDPs
We address the problem of robust permissive controller synthesis for robots operating under uncertain dynamics, modeled as Interval Markov Decision Processes (IMDPs). IMDPs generalize standard MDPs by allowing transition probabilities to vary within intervals, capturing epistemic uncertainty from sensing noise, actuation imprecision, and coarse system abstractions-common in robotics. Traditional controller synthesis typically yields a single deterministic strategy, limiting adaptability. In contrast, permissive controllers (multi-strategies) allow multiple actions per state, enabling runtime flexibility and resilience. However, prior work on permissive controller synthesis generally assumes exact transition probabilities, which is unrealistic in many robotic applications. We present the first framework for robust permissive controller synthesis on IMDPs, guaranteeing that all strategies compliant with the synthesized multi-strategy satisfy reachability or reward-based specifications under all admissible transitions. We formulate the problem as mixed-integer linear programs (MILPs) and propose two encodings: a baseline vertex-enumeration method and a scalable duality-based method that avoids explicit enumeration. Experiments on four benchmark domains show that both methods synthesize robust, maximally permissive controllers and scale to large IMDPs with up to hundreds of thousands of states.
A Sequential Quadratic Programming Perspective on Optimal Control
This paper offers a unified perspective on different approaches to the solution of optimal control problems through the lens of constrained sequential quadratic programming. In particular, it allows us to find the relationships between Newton's method, the iterative LQR (iLQR), and Differential Dynamic Programming (DDP) approaches to solve the problem. It is shown that the iLQR is a principled SQP approach, rather than simply an approximation of DDP by neglecting the Hessian terms, to solve optimal control problems that can be guaranteed to always produce a cost-descent direction and converge to an optimum; while Newton's approach or DDP do not have similar guarantees, especially far from an optimum. Our empirical evaluations on the pendulum and cart-pole swing-up tasks serve to corroborate the SQP-based analysis proposed in this paper.
Cooling Under Convexity: An Inventory Control Perspective on Industrial Refrigeration
Industrial refrigeration systems have substantial energy needs, but optimizing their operation remains challenging due to the tension between minimizing energy costs and meeting strict cooling requirements. Load shifting--strategic overcooling in anticipation of future demands--offers substantial efficiency gains. This work seeks to rigorously quantify these potential savings through the derivation of optimal load shifting policies. Our first contribution establishes a novel connection between industrial refrigeration and inventory control problems with convex ordering costs, where the convexity arises from the relationship between energy consumption and cooling capacity. Leveraging this formulation, we derive three main theoretical results: (1) an optimal algorithm for deterministic demand scenarios, along with proof that optimal trajectories are non-increasing (a valuable structural insight for practical control); (2) performance bounds that quantify the value of load shifting as a function of cost convexity, demand variability, and temporal patterns; (3) a computationally tractable load shifting heuristic with provable near-optimal performance under uncertainty. Numerical simulations validate our theoretical findings, and a case study using real industrial refrigeration data demonstrates an opportunity for improved load shifting.
Scalable Ground Station Selection for Large LEO Constellations
Effective ground station selection is critical for low Earth orbiting (LEO) satellite constellations to minimize operational costs, maximize data downlink volume, and reduce communication gaps between access windows. Traditional ground station selection typically begins by choosing from a fixed set of locations offered by Ground Station-as-a-Service (GSaaS) providers, which helps reduce the problem scope to optimizing locations over existing infrastructure. However, finding a globally optimal solution for stations using existing mixed-integer programming methods quickly becomes intractable at scale, especially when considering multiple providers and large satellite constellations. To address this issue, we introduce a scalable, hierarchical framework that decomposes the global selection problem into single-satellite, short time-window subproblems. Optimal station choices from each subproblem are clustered to identify consistently high-value locations across all decomposed cases. Cluster-level sets are then matched back to the closest GSaaS candidate sites to produce a globally feasible solution. This approach enables scalable coordination while maintaining near-optimal performance. We evaluate our method's performance on synthetic Walker-Star test cases (1-10 satellites, 1-10 stations), achieving solutions within 95% of the global IP optimum for all test cases. Real-world evaluations on Capella Space (5 satellites), ICEYE (40), and Planet's Flock (96) show that while exact IP solutions fail to scale, our framework continues to deliver high-quality site selections.
comment: 14 pages, 7 tables, 10 figures, submitted to IEEE Aeroconf 2026
Efficient Input-Constrained Impulsive Optimal Control of Linear Systems with Application to Spacecraft Relative Motion
This work presents a novel algorithm for impulsive optimal control of linear time-varying systems with the inclusion of input magnitude constraints. Impulsive optimal control problems, where the optimal input solution is a sum of delta functions, are typically formulated as an optimization over a normed function space subject to integral equality constraints and can be efficiently solved for linear time-varying systems in their dual formulation. In this dual setting, the problem takes the form of a semi-infinite program which is readily solvable in online scenarios for constructing maneuver plans. This work augments the approach with the inclusion of magnitude constraints on the input over time windows of interest, which is shown to preserve the impulsive nature of the optimal solution and enable efficient solution procedures via semi-infinite programming. The resulting algorithm is demonstrated on the highly relevant problem of relative motion control of spacecraft in Low Earth Orbit (LEO) and compared to several other proposed solutions from the literature.
Viability-Preserving Passive Torque Control
Conventional passivity-based torque controllers for manipulators are typically unconstrained, which can lead to safety violations under external perturbations. In this paper, we employ viability theory to pre-compute safe sets in the state-space of joint positions and velocities. These viable sets, constructed via data-driven and analytical methods for self-collision avoidance, external object collision avoidance and joint-position and joint-velocity limits, provide constraints on joint accelerations and thus joint torques via the robot dynamics. A quadratic programming-based control framework enforces these constraints on a passive controller tracking a dynamical system, ensuring the robot states remain within the safe set in an infinite time horizon. We validate the proposed approach through simulations and hardware experiments on a 7-DoF Franka Emika manipulator. In comparison to a baseline constrained passive controller, our method operates at higher control-loop rates and yields smoother trajectories.
comment: 8 pages, 7 figures, Project Website: https://vpp-tc.github.io/webpage/
Distributed Feedback-Feedforward Algorithms for Time-Varying Resource Allocation
This paper studies distributed Time-Varying Resource Allocation (TVRA) where the local cost functions, global equality constraints, and Local Feasibility Constraints (LFCs) vary with time. Algorithms that mimic the structure of feedback-feedforward control systems are proposed. Feedback and feedforward laws are generated using local estimates from a distributed estimator, while a distributed controller enforces the stationarity condition within a fixed time and updates the candidate solution accordingly. To handle the LFCs, feedback laws based on projection and feedforward laws that switch between different modes are introduced as an initialization-free alternative to the barrier-based methods used in most related works. Our projection-based method guarantees that, for any infeasible initial value, the state trajectory enters the locally feasible set within a fixed time and remains within it thereafter, and that the set is forward invariant if the initial value is locally feasible. Convergence analyses are conducted under mild assumptions. For cases without LFCs, the proposed algorithm converges to the optimal trajectory within a fixed time. For cases with LFCs, the proposed algorithm is globally asymptotically stable at the optimal trajectory while exhibiting fixed-time convergence between consecutive switching instants. Numerical examples and a power system application verify their effectiveness.
Scalable analysis of stop-and-go waves: Representation, measurements and insights
Analyzing stop-and-go waves at the scale of miles and hours of data is an emerging challenge in traffic research. The past 5 years have seen an explosion in the availability of large-scale traffic data containing traffic waves and complex congestion patterns, making existing approaches unsuitable for repeatable and scalable analysis of traffic waves in these data. This paper makes a first step towards addressing this challenge by introducing an automatic and scalable stop-and-go wave identification method capable of capturing wave generation, propagation, dissipation, as well as bifurcation and merging, which have previously been observed only very rarely. Using a concise and simple critical-speed based definition of a stop-and-go wave, the proposed method identifies all wave boundaries that encompass spatio-temporal points where vehicle speed is below a chosen critical speed. The method is built upon a graph representation of the spatio-temporal points associated with stop-and-go waves, specifically wave front (start) points and wave tail (end) points, and approaches the solution as a graph component identification problem. It enables the measurement of wave properties at scale. The method is implemented in Python and demonstrated on a large-scale dataset, I-24 MOTION INCEPTION. Our results show insights on the complexity of traffic waves. Traffic waves can bifurcate and merge at a scale that has never been observed or described before. The clustering analysis of all the identified wave components reveals the different topological structures of traffic waves. We explored that the wave merge or bifurcation points can be explained by spatial features. The gallery of all the identified wave topologies is demonstrated at https://trafficwaves.github.io/.
On the Eigenvalue Tracking of Large-Scale Systems
The paper focuses on the problem of tracking eigenvalue trajectories in large-scale power system models as system parameters vary. A continuation-based formulation is presented for tracing any single eigenvalue of interest, which supports sparse matrix representations and accommodates both explicit and semi-implicit differential-algebraic models. Key implementation aspects, such as numerical integration, matrix updates, derivative approximations, and handling defective eigenvalues, are discussed in detail and practical recommendations are duly provided. The tracking approach is demonstrated through a comprehensive case study on the IEEE 39-bus system, as well as on a realistic dynamic model of the Irish transmission system.
Online Feedback Optimization for Monotone Systems without Timescale Separation
Online Feedback Optimization (OFO) steers a dynamical plant to a cost-efficient steady-state, only relying on input-output sensitivity information, rather than on a full plant model. Unlike traditional feedforward approaches, OFO leverages real-time measurements from the plant, thereby inheriting the robustness and adaptability of feedback control. Unfortunately, existing theoretical guarantees for OFO assume that the controller operates on a slower timescale than the plant, which can affect responsiveness and transient performance. In this paper, we focus on relaxing this ``timescale separation'' assumption. Specifically, we consider the class of monotone systems, and we prove that OFO can achieve an optimal operating point, regardless of the time constants of controller and plant. By leveraging a small gain theorem for monotone systems, we derive several sufficient conditions for global convergence. Notably, these conditions depend only on the steady-state behavior of the plant, and they are entirely independent of the transient dynamics.
Neuromorphic Deployment of Spiking Neural Networks for Cognitive Load Classification in Air Traffic Control
This paper presents a neuromorphic system for cognitive load classification in a real-world setting, an Air Traffic Control (ATC) task, using a hardware implementation of Spiking Neural Networks (SNNs). Electroencephalogram (EEG) and eye-tracking features, extracted from an open-source dataset, were used to train and evaluate both conventional machine learning models and SNNs. Among the SNN architectures explored, a minimalistic, single-layer model trained with a biologically inspired delta-rule learning algorithm achieved competitive performance (80.6%). To enable deployment on neuromorphic hardware, the model was quantized and implemented on the mixed-signal DYNAP-SE chip. Despite hardware constraints and analog variability, the chip-deployed SNN maintained a classification accuracy of up to 73.5% using spike-based input. These results demonstrate the feasibility of event-driven neuromorphic systems for ultra-low-power, embedded cognitive state monitoring in dynamic real-world scenarios.
comment: Preprint version. Accepted at ACM/IEEE ICONS 2025 (to appear in Proceedings)
Improved Monte Carlo Planning via Causal Disentanglement for Structurally-Decomposed Markov Decision Processes
Markov Decision Processes (MDPs), as a general-purpose framework, often overlook the benefits of incorporating the causal structure of the transition and reward dynamics. For a subclass of resource allocation problems, we introduce the Structurally Decomposed MDP (SD-MDP), which leverages causal disentanglement to partition an MDP's temporal causal graph into independent components. By exploiting this disentanglement, SD-MDP enables dimensionality reduction and computational efficiency gains in optimal value function estimation. We reduce the sequential optimization problem to a fractional knapsack problem with log-linear complexity $O(T \log T)$, outperforming traditional stochastic programming methods that exhibit polynomial complexity with respect to the time horizon $T$. Additionally, SD-MDP's computational advantages are independent of state-action space size, making it viable for high-dimensional spaces. Furthermore, our approach integrates seamlessly with Monte Carlo Tree Search (MCTS), achieving higher expected rewards under constrained simulation budgets while providing a vanishing simple regret bound. Empirical results demonstrate superior policy performance over benchmarks across various logistics and finance domains.
comment: Conference Paper. 7th International Conference on Distributed Artificial Intelligence (DAI)
Fine Tuning a Simulation-Driven Estimator
Many industries now deploy high-fidelity simulators (digital twins) to represent physical systems, yet their parameters must be calibrated to match the true system. This motivated the construction of simulation-driven parameter estimators, built by generating synthetic observations for sampled parameter values and learning a supervised mapping from observations to parameters. However, when the true parameters lie outside the sampled range, predictions suffer from an out-of-distribution (OOD) error. This paper introduces a fine-tuning approach for the Two-Stage estimator that mitigates OOD effects and improves accuracy. The effectiveness of the proposed method is verified through numerical simulations.
comment: 6 pages, 4 figures
Linear-quadratic mean-field-type difference games with coupled affine inequality constraints
In this letter, we study a class of linear-quadratic mean-field-type difference games with coupled affine inequality constraints. We show that the mean-field-type equilibrium can be characterized by the existence of a multiplier process which satisfies some implicit complementarity conditions. Further, we show that the equilibrium strategies can be computed by reformulating these conditions as a single large-scale linear complementarity problem. We illustrate our results with an energy storage problem arising in the management of microgrids.
Stochastic Security Constrained AC Optimal Power Flow Using General Polynomial Chaos Expansion
Addressing the uncertainty introduced by increasing renewable integration is crucial for secure power system operation, yet capturing it while preserving the full nonlinear physics of the grid remains a significant challenge. This paper presents a stochastic security constrained optimal power flow model with chance constraints supporting nonlinear AC power flow equations and non Gaussian uncertainties. We use general polynomial chaos expansion to model arbitrary uncertainties of finite variance, enabling accurate moment computations and robust prediction of system states across diverse operating scenarios. The chance constraints probabilistically limit inequality violations, providing a more flexible representation of controllable variables and the consequent power system operation. Case studies validate the proposed models effectiveness in satisfying operational constraints and capturing uncertainty with high fidelity. Compared to the deterministic formulation, it also uncovers a wider set of unsecure contingencies, highlighting improved uncertainty capture and operational insight.
comment: 2025 IEEE International Conference on Energy Technologies for Future Grids (6 pages)
Distributed Koopman Learning with Incomplete Measurements
Koopman operator theory has emerged as a powerful tool for system identification, particularly for approximating nonlinear time-invariant systems (NTIS). This paper considers a network of agents with limited observation capabilities that collaboratively estimate the dynamics of an NTIS. A distributed deep Koopman learning algorithm is developed by integrating Koopman operator theory, deep neural networks, and consensus-based coordination. In the proposed framework, each agent approximates the system dynamics using its partial measurements and lifted states exchanged with its neighbors. This cooperative scheme enables accurate reconstruction of the global dynamics despite the absence of full-state information at individual agents. Simulation results on the Lunar Lander environment from OpenAI Gym demonstrate that the proposed method achieves performance comparable to the centralized deep Koopman learning with full-state access.
Distributed Koopman Learning using Partial Trajectories for Control
This paper proposes a distributed data-driven framework for dynamics learning, termed distributed deep Koopman learning using partial trajectories (DDKL-PT). In this framework, each agent in a multi-agent system is assigned a partial trajectory offline and locally approximates the unknown dynamics using a deep neural network within the Koopman operator framework. By exchanging local estimated dynamics rather than training data, agents achieve consensus on a global dynamics model without sharing their private training trajectories. Simulation studies on a surface vehicle demonstrate that DDKL-PT attains consensus with respect to the learned dynamics, with each agent achieving reasonably small approximation errors over the testing data. Furthermore, a model predictive control scheme is developed by integrating the learned Koopman dynamics with known kinematic relations. Results on goal-tracking and station-keeping tasks support that the distributedly learned dynamics are sufficiently accurate for model-based optimal control.
Local Stability and Region of Attraction Analysis for Neural Network Feedback Systems under Positivity Constraints
We study the local stability of nonlinear systems in the Lur'e form with static nonlinear feedback realized by feedforward neural networks (FFNNs). By leveraging positivity system constraints, we employ a localized variant of the Aizerman conjecture, which provides sufficient conditions for exponential stability of trajectories confined to a compact set. Using this foundation, we develop two distinct methods for estimating the Region of Attraction (ROA): (i) a less conservative Lyapunov-based approach that constructs invariant sublevel sets of a quadratic function satisfying a linear matrix inequality (LMI), and (ii) a novel technique for computing tight local sector bounds for FFNNs via layer-wise propagation of linear relaxations. These bounds are integrated into the localized Aizerman framework to certify local exponential stability. Numerical results demonstrate substantial improvements over existing integral quadratic constraint-based approaches in both ROA size and scalability.
comment: Accepted at 64th IEEE Conference on Decision and Control (CDC) 2025 - Rio de Janeiro, Brazil
Robust Stability Analysis of Positive Lure System with Neural Network Feedback
This paper investigates the robustness of the Lur'e problem under positivity constraints, drawing on results from the positive Aizerman conjecture and robustness properties of Metzler matrices. Specifically, we consider a control system of Lur'e type in which not only the linear part includes parametric uncertainty but also the nonlinear sector bound is unknown. We investigate tools from positive linear systems to effectively solve the problems in complicated and uncertain nonlinear systems. By leveraging the positivity characteristic of the system, we derive an explicit formula for the stability radius of Lur'e systems. Furthermore, we extend our analysis to systems with neural network (NN) feedback loops. Building on this approach, we also propose a refinement method for sector bounds of NNs. This study introduces a scalable and efficient approach for robustness analysis of both Lur'e and NN-controlled systems. Finally, the proposed results are supported by illustrative examples.
comment: Accepted at the 9th IEEE Conference on Control Technology and Applications (CCTA) 2025, San Diego, California
Systems and Control (EESS)
A Dimension-Decomposed Learning Framework for Online Disturbance Identification in Quadrotor SE(3) Control
Quadrotor stability under complex dynamic disturbances and model uncertainties poses significant challenges. One of them remains the underfitting problem in high-dimensional features, which limits the identification capability of current learning-based methods. To address this, we introduce a new perspective: Dimension-Decomposed Learning (DiD-L), from which we develop the Sliced Adaptive-Neuro Mapping (SANM) approach for geometric control. Specifically, the high-dimensional mapping for identification is axially ``sliced" into multiple low-dimensional submappings (``slices"). In this way, the complex high-dimensional problem is decomposed into a set of simple low-dimensional tasks addressed by shallow neural networks and adaptive laws. These neural networks and adaptive laws are updated online via Lyapunov-based adaptation without any pre-training or persistent excitation (PE) condition. To enhance the interpretability of the proposed approach, we prove that the full-state closed-loop system exhibits arbitrarily close to exponential stability despite multi-dimensional time-varying disturbances and model uncertainties. This result is novel as it demonstrates exponential convergence without requiring pre-training for unknown disturbances and specific knowledge of the model.
Eigenvalue Tracking of Large-Scale Systems Impacted by Time Delays
The paper focuses on tracking eigenvalue trajectories in power system models with time delays. We formulate a continuation-based approach that employs numerical integration to follow eigenvalues as system parameters vary, in the presence of one or multiple delayed variables. The formulation preserves the sparsity of the delay differential-algebraic equation (DDAE) system model and allows treating the delay magnitude itself as a varying parameter with implementation aspects discussed in detail. Accuracy is demonstrated on a modified IEEE 39-bus system with distributed energy resources. Scalability is discussed using a realistic dynamic model of the Irish transmission network.
Economic zone data-enabled predictive control for connected open water systems
Real-time regulation of water distribution in connected open water systems is critical for ensuring system safety and meeting operational requirements. In this work, we consider a connected open water system that includes linkage hydraulic structures such as weirs, pumps and sluice gates. We propose a mixed-integer economic zone data-enabled predictive control (DeePC) approach, which is used to maintain the water levels of the branches within desired zones to avoid floods and reduce the energy consumption of the pumps in the considered water system. The proposed DeePC-based approach predicts the future dynamics of the system water levels, and generates optimal control actions based on system input and output data, thereby eliminating the need for both first-principles modeling and explicit data-driven modeling. To achieve multiple control objectives in order of priority, we utilize lexicographic optimization and adapt traditional DeePC cost function for zone tracking and energy consumption minimization. Additionally, Bayesian optimization is utilized to determine the control target zone, which effectively balances zone tracking and energy consumption in the presence of external disturbances. Comprehensive simulations and comparative analyses demonstrate the effectiveness of the proposed method. The proposed method maintains water levels within the desired zone for 97.04% of the operating time, with an average energy consumption of 33.5 kWh per 0.5 h. Compared to baseline methods, the proposed approach reduces the zone-tracking mean square error by 98.82% relative to economic zone DeePC without Bayesian optimization, and lowers energy consumption by 44.08% relative to economic set-point tracking DeePC. As compared to passive pump/gate control, the proposed method lowers the frequency of zone violations by 86.94% and the average energy consumption by 4.69%.
Real-Time Peer-to-Peer Energy Trading for Multi-Microgrids: Improved Double Auction Mechanism and Prediction-Free Online Trading Approach
Peer-to-peer energy trading offers a promising solution for enhancing renewable energy utilization and economic benefits within interconnected microgrids. However, existing real-time P2P markets face two key challenges: high computational complexity in trading mechanisms, and suboptimal participant decision-making under diverse uncertainties. Existing prediction-based decision-making methods rely heavily on accurate forecasts, which are typically unavailable for microgrids, while prediction-free methods suffer from myopic behaviors. To address these challenges, this paper proposes an improved double auction mechanism combined with an adaptive step-size search algorithm to reduce computational burden, and a data-driven dual-reference online optimization (DDOO) framework to enhance participant decision-making. The improved mechanism simplifies bidding procedures, significantly reducing computational burden and ensuring rapid convergence to the market equilibrium. Additionally, the prediction-free DDOO framework mitigates myopic decision-making by introducing two informative reference signals. Case studies on a 20-microgrid system demonstrate the effectiveness and scalability of the proposed mechanism and approach. The improved mechanism significantly decreases the computational time while increasing local energy self-sufficiency periods from 0.01% to 29.86%, reducing reverse power flow periods from 24.51% to 3.96%, and lowering average operating costs by 19.20%. Compared with conventional approaches such as Lyapunov optimization and model predictive control, the DDOO framework achieves a 10%-13% reduction in operating costs with an optimality gap of only 5.76%.
Corrosion Risk Estimation for Heritage Preservation: An Internet of Things and Machine Learning Approach Using Temperature and Humidity
Proactive preservation of steel structures at culturally significant heritage sites like the San Sebastian Basilica in the Philippines requires accurate corrosion forecasting. This study developed an Internet of Things hardware system connected with LoRa wireless communications to monitor heritage buildings with steel structures. From a three year dataset generated by the IoT system, we built a machine learning framework for predicting atmospheric corrosion rates using only temperature and relative humidity data. Deployed via a Streamlit dashboard with ngrok tunneling for public access, the framework provides real-time corrosion monitoring and actionable preservation recommendations. This minimal-data approach is scalable and cost effective for heritage sites with limited monitoring resources, showing that advanced regression can extract accurate corrosion predictions from basic meteorological data enabling proactive preservation of culturally significant structures worldwide without requiring extensive sensor networks
comment: 17 pages
Single-Rod Brachiation Robot: Mechatronic Control Design and Validation of Prejump Phases
A complete mechatronic design of a minimal configuration brachiation robot is presented. The robot consists of a single rigid rod with gripper mechanisms attached to both ends. The grippers are used to hang the robot on a horizontal bar on which it swings or rotates. The motion is imposed by repositioning the robot's center of mass, which is performed using a crank-slide mechanism. Based on a non-linear model, an optimal control strategy is proposed, for repositioning the center of mass in a bang-bang manner. Consequently, utilizing the concept of input-output linearization, a continuous control strategy is proposed that takes into account the limited torque of the crank-slide mechanism and its geometry. An increased attention is paid to energy accumulation towards the subsequent jump stage of the brachiation. These two strategies are validated and compared in simulations. The continuous control strategy is then also implemented within a low-cost STM32-based control system, and both the swing and rotation stages of the brachiation motion are experimentally validated.
comment: 11 pages, 13 figures, 1 table, Accepted 27 July 2025, Available online 16 Sept 2025, Version of Record 28 Sept 2025
Incomplete Air Mixing Reduces the Efficiency of Commercial Buildings Behaving as Virtual Batteries
Commercial building Heating, Ventilation, and Air Conditioning (HVAC) systems can provide flexibility to the electricity grid. Some researchers have found it convenient to model HVAC systems as virtual batteries. These models also better align with models used by grid planners and operators. However, experiments have shown that HVAC load shifting can be inefficient, and virtual battery models do not capture this inefficiency well. While the models typically use the average room temperature as the system's ``state of charge," they do not capture other factors that affect HVAC power/energy such as airflow and mixing. Here, we develop a new analytical building model to explore how incomplete mixing of supply air into a conditioned space leads to inefficiency in a virtual battery capturing the dynamics of HVAC fan power load shifting. The model qualitatively matches experimental results better than previous models, and shows that, as mixing becomes worse, the virtual battery becomes less efficient. Unfortunately, air mixing is unmeasured/unmeasurable. However, we show that, by closing the loop around measurements of fan power, we can improve the virtual battery's performance without the need for air mixing measurements. For example, in one case, we show a roundtrip efficiency improvement from 0.75 to 0.99.
Global Convergence of Policy Gradient for Entropy Regularized Linear-Quadratic Control with multiplicative noise
Reinforcement Learning (RL) has emerged as a powerful framework for sequential decision-making in dynamic environments, particularly when system parameters are unknown. This paper investigates RL-based control for entropy-regularized Linear Quadratic control (LQC) problems with multiplicative noises over an infinite time horizon. First, we adapt the Regularized Policy Gradient (RPG) algorithm to stochastic optimal control settings, proving that despite the non-convexity of the problem, RPG converges globally under conditions of gradient domination and near-smoothness. Second, based on zero-order optimization approach, we introduce a novel model free RL algorithm: Sample-Based Regularized Policy Gradient (SB-RPG). SB-RPG operates without knowledge of system parameters yet still retains strong theoretical guarantees of global convergence. Our model leverages entropy regularization to accelerate convergence and address the exploration versus exploitation trade-off inherent in RL. Numerical simulations validate the theoretical results and demonstrate the efficacy of SB-RPG in unknown-parameters environments.
comment: 33 pages, 4 figures
Delay-Tolerant Augmented-Consensus-based Distributed Directed Optimization
Distributed optimization finds applications in large-scale machine learning, data processing and classification over multi-agent networks. In real-world scenarios, the communication network of agents may encounter latency that may affect the convergence of the optimization protocol. This paper addresses the case where the information exchange among the agents (computing nodes) over data-transmission channels (links) might be subject to communication time-delays, which is not well addressed in the existing literature. Our proposed algorithm improves the state-of-the-art by handling heterogeneous and arbitrary but bounded and fixed (time-invariant) delays over general strongly-connected directed networks. Arguments from matrix theory, algebraic graph theory, and augmented consensus formulation are applied to prove the convergence to the optimal value. Simulations are provided to verify the results and compare the performance with some existing delay-free algorithms.
comment: Systems & Control Letters
Life Estimation of HVDC Cable Insulation under Load Cycles: from Macroscopic to Microscopic Charge Conduction Modelling
This paper goes one step forward in the life estimation of HVDC cable insulation under load cycles by introducing for the first time a microscopic model of charge conduction and transport i.e., Bipolar Charge Transport BCT model for electric field calculation inside the insulation thickness. The paper firstly includes the development and the validation of BCT model with that found in literature. Then, the parameters of the developed BCT model are optimized using Pulsed Electro-Acoustic PEA space charge measurements. Followed by the integration of the developed, validated and optimized model into the electric field calculation for life estimation of a 500 kV DC-XLPE insulated cable subjected to Type Test load cycles according to Cigre Techical Brochure 852. The developed microscopic model is compared to the macroscopic models already found in the literature. The microscopic model shows a comparable electric field inversion similarly to macroscopic models. However, the behavior of the microscopic model is noticed to be different under heating and cooling load cycles. In hot cable, the maximum electric field stabilizes at different amplitude and position inside the insulation thickness in both models. This investigation has been carried out in the framework of the HEU-NEWGEN research project.
Assist-as-needed Control for FES in Foot Drop Management
Foot drop is commonly managed using Functional Electrical Stimulation (FES), typically delivered via open-loop controllers with fixed stimulation intensities. While users may manually adjust the intensity through external controls, this approach risks overstimulation, leading to muscle fatigue and discomfort, or understimulation, which compromises dorsiflexion and increases fall risk. In this study, we propose a novel closed-loop FES controller that dynamically adjusts the stimulation intensity based on real-time toe clearance, providing "assistance as needed". We evaluate this system by inducing foot drop in healthy participants and comparing the effects of the closed-loop controller with a traditional open-loop controller across various walking conditions, including different speeds and surface inclinations. Kinematic data reveal that our closed-loop controller maintains adequate toe clearance without significantly affecting the joint angles of the hips, the knees, and the ankles, and while using significantly lower stimulation intensities compared to the open-loop controller. These findings suggest that the proposed method not only matches the effectiveness of existing systems but also offers the potential for reduced muscle fatigue and improved long-term user comfort and adherence.
Periodic Event-Triggered Prescribed Time Control of Euler-Lagrange Systems under State and Input Constraints
This article proposes a periodic event-triggered adaptive barrier control policy for the trajectory tracking problem of perturbed Euler-Lagrangian systems with state, input, and temporal (SIT) constraints. In particular, an approximation-free adaptive-barrier control architecture is designed to ensure prescribed-time convergence of the tracking error to a prescribed bound while rejecting exogenous disturbances. In contrast to existing approaches that necessitate continuous real-time control action, the proposed controller generates event-based updates through periodic evaluation of the triggering condition. Additionally, we derive an upper bound on the monitoring period by analysing the performance degradation of the filtered tracking error to facilitate periodic evaluation of the event-triggered strategy. To this end, a time-varying threshold function is considered in the triggering mechanism to reduce the number of triggers during the transient phase of system behaviour. Notably, the proposed design avoids Zeno behaviour and precludes the need for continuous monitoring of the triggering condition. A simulation and experimental study is undertaken to demonstrate the efficacy of the proposed control scheme.
A Control-Barrier-Function-Based Algorithm for Policy Adaptation in Reinforcement Learning
This paper considers the problem of adapting a predesigned policy, represented by a parameterized function class, from a solution that minimizes a given original cost function to a trade-off solution between minimizing the original objective and an additional cost function. The problem is formulated as a constrained optimization problem, where deviations from the optimal value of the original cost are explicitly constrained. To solve it, we develop a closed-loop system that governs the evolution of the policy parameters, with a closed-loop controller designed to adjust the additional cost gradient to ensure the satisfaction of the constraint. The resulting closed-loop system, termed control-barrier-function-based policy adaptation, exploits the set-invariance property of control barrier functions to guarantee constraint satisfaction. The effectiveness of the proposed method is demonstrated through numerical experiments on the Cartpole and Lunar Lander benchmarks from OpenAI Gym, as well as a quadruped robot, thereby illustrating both its practicality and potential for real-world policy adaptation.
Guaranteed Time Control using Linear Matrix Inequalities
This paper presents a synthesis approach aiming to guarantee a minimum upper-bound for the time taken to reach a target set of non-zero measure that encompasses the origin, while taking into account uncertainties and input and state constraints. This approach is based on a harmonic transformation of the Lyapunov function and a novel piecewise quadratic representation of this transformed Lyapunov function over a simplicial partition of the state space. The problem is solved in a policy iteration fashion, whereas the evaluation and improvement steps are formulated as linear matrix inequalities employing the structural relaxation approach. Though initially formulated for uncertain polytopic systems, extensions to piecewise and nonlinear systems are discussed. Three examples illustrate the effectiveness of the proposed approach in different scenarios.
comment: Preprint - Initial submission submitted to Automatica
Generalization of Graph Neural Network Models for Distribution Grid Fault Detection
Fault detection in power distribution grids is critical for ensuring system reliability and preventing costly outages. Moreover, fault detection methodologies should remain robust to evolving grid topologies caused by factors such as reconfigurations, equipment failures, and Distributed Energy Resource (DER) integration. Current data-driven state-of-the-art methods use Recurrent Neural Networks (RNNs) for temporal modeling and Graph Neural Networks (GNNs) for spatial learning, in an RNN+GNN pipeline setting (RGNN in short). Specifically, for power system fault diagnosis, Graph Convolutional Networks (GCNs) have been adopted. Yet, various more advanced GNN architectures have been proposed and adopted in domains outside of power systems. In this paper, we set out to systematically and consistently benchmark various GNN architectures in an RNN+GNN pipeline model. Specifically, to the best of our knowledge, we are the first to (i) propose to use GraphSAGE and Graph Attention (GAT, GATv2) in an RGNN for fault diagnosis, and (ii) provide a comprehensive benchmark against earlier proposed RGNN solutions (RGCN) as well as pure RNN models (especially Gated Recurrent Unit (GRU)), particularly (iii) exploring their generalization potential for deployment in different settings than those used for training them. Our experimental results on the IEEE 123-node distribution network show that RGATv2 has superior generalization capabilities, maintaining high performance with an F1-score reduction of $\sim$12% across different topology settings. In contrast, pure RNN models largely fail, experiencing an F1-score reduction of up to $\sim$60%, while other RGNN variants also exhibit significant performance degradation, i.e., up to $\sim$25% lower F1-scores.
comment: This paper has been submitted and accepted for IEEE SmartGridComm 2025
Long-Term Mapping of the Douro River Plume with Multi-Agent Reinforcement Learning
We study the problem of long-term (multiple days) mapping of a river plume using multiple autonomous underwater vehicles (AUVs), focusing on the Douro river representative use-case. We propose an energy - and communication - efficient multi-agent reinforcement learning approach in which a central coordinator intermittently communicates with the AUVs, collecting measurements and issuing commands. Our approach integrates spatiotemporal Gaussian process regression (GPR) with a multi-head Q-network controller that regulates direction and speed for each AUV. Simulations using the Delft3D ocean model demonstrate that our method consistently outperforms both single- and multi-agent benchmarks, with scaling the number of agents both improving mean squared error (MSE) and operational endurance. In some instances, our algorithm demonstrates that doubling the number of AUVs can more than double endurance while maintaining or improving accuracy, underscoring the benefits of multi-agent coordination. Our learned policies generalize across unseen seasonal regimes over different months and years, demonstrating promise for future developments of data-driven long-term monitoring of dynamic plume environments.
Certifiable Safe RLHF: Fixed-Penalty Constraint Optimization for Safer Language Models
Ensuring safety is a foundational requirement for large language models (LLMs). Achieving an appropriate balance between enhancing the utility of model outputs and mitigating their potential for harm is a complex and persistent challenge. Contemporary approaches frequently formalize this problem within the framework of Constrained Markov Decision Processes (CMDPs) and employ established CMDP optimization techniques. However, these methods exhibit two notable limitations. First, their reliance on reward and cost functions renders performance highly sensitive to the underlying scoring mechanism, which must capture semantic meaning rather than being triggered by superficial keywords. Second, CMDP-based training entails tuning dual-variable, a process that is both computationally expensive and does not provide any provable safety guarantee for a fixed dual variable that can be exploitable through adversarial jailbreaks. To overcome these limitations, we introduce Certifiable Safe-RLHF (CS-RLHF) that introduces a cost model trained on a large-scale corpus to assign semantically grounded safety scores. In contrast to the lagrangian-based approach, CS-RLHF adopts a rectified penalty-based formulation. This design draws on the theory of exact penalty functions in constrained optimization, wherein constraint satisfaction is enforced directly through a suitably chosen penalty term. With an appropriately scaled penalty, feasibility of the safety constraints can be guaranteed at the optimizer, eliminating the need for dual-variable updates. Empirical evaluation demonstrates that CS-RLHF outperforms state-of-the-art LLM model responses rendering at-least 5 times efficient against nominal and jail-breaking prompts
Machine Learning-Driven Prediction of Lithium-Ion Battery Power Capability for eVTOL Aircraft
Electric vertical take-off and landing (eVTOL) aircraft have emerged as a promising solution to transform urban transportation. They present a few technical challenges for battery management, a prominent one of which is the prediction of the power capability of their lithium-ion battery systems. The challenge originates from the high C-rate discharging conditions required during eVTOL flights as well as the complexity of lithium-ion batteries' electro-thermal dynamics. This paper, for the first time, formulates a power limit prediction problem for eVTOL which explicitly considers long prediction horizons and the possible occurrence of emergency landings. We then harness machine learning to solve this problem in two intertwined ways. First, we adopt a dynamic model that integrates physics with machine learning to predict a lithium-ion battery's voltage and temperature behaviors with high accuracy. Second, while performing search for the maximum power, we leverage machine learning to predict the remaining discharge time and use the prediction to accelerate the search with fast computation. Our validation results show the effectiveness of the proposed study for eVTOL operations.
comment: 2025 American Control Conference (ACC)
CANOPI: Contingency-Aware Nodal Optimal Power Investments with High Temporal Resolution
We present CANOPI, a novel algorithmic framework, for solving the Contingency-Aware Nodal Power Investments problem, a large-scale nonlinear optimization problem that jointly optimizes generation, storage, and transmission expansion. The underlying problem is nonlinear due to the impact of transmission upgrades on impedances, and the problem's large scale arises from the confluence of spatial and temporal resolutions. We propose algorithmic approaches to address these computational challenges. We pose a linear approximation of the overall nonlinear model, and develop a fixed-point algorithm to adjust for the nonlinear impedance feedback effect. We solve the large-scale linear expansion model with a specialized level-bundle method leveraging a novel interleaved approach to contingency constraint generation. We introduce a minimal cycle basis algorithm that improves the numerical sparsity of cycle-based DC power flow formulations, accelerating solve times for the operational subproblems. CANOPI is demonstrated on a 1493-bus Western Interconnection test system built from realistic-geography network data, with hourly operations spanning 52 week-long scenarios and a total possible set of 20 billion individual transmission contingency constraints. Numerical results quantify the reliability and economic benefits of fully incorporating transmission contingencies in integrated planning models and highlight the computational advantages of the proposed methods.
comment: This work has been submitted to the IEEE for possible publication
Robust Permissive Controller Synthesis for Interval MDPs
We address the problem of robust permissive controller synthesis for robots operating under uncertain dynamics, modeled as Interval Markov Decision Processes (IMDPs). IMDPs generalize standard MDPs by allowing transition probabilities to vary within intervals, capturing epistemic uncertainty from sensing noise, actuation imprecision, and coarse system abstractions-common in robotics. Traditional controller synthesis typically yields a single deterministic strategy, limiting adaptability. In contrast, permissive controllers (multi-strategies) allow multiple actions per state, enabling runtime flexibility and resilience. However, prior work on permissive controller synthesis generally assumes exact transition probabilities, which is unrealistic in many robotic applications. We present the first framework for robust permissive controller synthesis on IMDPs, guaranteeing that all strategies compliant with the synthesized multi-strategy satisfy reachability or reward-based specifications under all admissible transitions. We formulate the problem as mixed-integer linear programs (MILPs) and propose two encodings: a baseline vertex-enumeration method and a scalable duality-based method that avoids explicit enumeration. Experiments on four benchmark domains show that both methods synthesize robust, maximally permissive controllers and scale to large IMDPs with up to hundreds of thousands of states.
A Sequential Quadratic Programming Perspective on Optimal Control
This paper offers a unified perspective on different approaches to the solution of optimal control problems through the lens of constrained sequential quadratic programming. In particular, it allows us to find the relationships between Newton's method, the iterative LQR (iLQR), and Differential Dynamic Programming (DDP) approaches to solve the problem. It is shown that the iLQR is a principled SQP approach, rather than simply an approximation of DDP by neglecting the Hessian terms, to solve optimal control problems that can be guaranteed to always produce a cost-descent direction and converge to an optimum; while Newton's approach or DDP do not have similar guarantees, especially far from an optimum. Our empirical evaluations on the pendulum and cart-pole swing-up tasks serve to corroborate the SQP-based analysis proposed in this paper.
Cooling Under Convexity: An Inventory Control Perspective on Industrial Refrigeration
Industrial refrigeration systems have substantial energy needs, but optimizing their operation remains challenging due to the tension between minimizing energy costs and meeting strict cooling requirements. Load shifting--strategic overcooling in anticipation of future demands--offers substantial efficiency gains. This work seeks to rigorously quantify these potential savings through the derivation of optimal load shifting policies. Our first contribution establishes a novel connection between industrial refrigeration and inventory control problems with convex ordering costs, where the convexity arises from the relationship between energy consumption and cooling capacity. Leveraging this formulation, we derive three main theoretical results: (1) an optimal algorithm for deterministic demand scenarios, along with proof that optimal trajectories are non-increasing (a valuable structural insight for practical control); (2) performance bounds that quantify the value of load shifting as a function of cost convexity, demand variability, and temporal patterns; (3) a computationally tractable load shifting heuristic with provable near-optimal performance under uncertainty. Numerical simulations validate our theoretical findings, and a case study using real industrial refrigeration data demonstrates an opportunity for improved load shifting.
Scalable Ground Station Selection for Large LEO Constellations
Effective ground station selection is critical for low Earth orbiting (LEO) satellite constellations to minimize operational costs, maximize data downlink volume, and reduce communication gaps between access windows. Traditional ground station selection typically begins by choosing from a fixed set of locations offered by Ground Station-as-a-Service (GSaaS) providers, which helps reduce the problem scope to optimizing locations over existing infrastructure. However, finding a globally optimal solution for stations using existing mixed-integer programming methods quickly becomes intractable at scale, especially when considering multiple providers and large satellite constellations. To address this issue, we introduce a scalable, hierarchical framework that decomposes the global selection problem into single-satellite, short time-window subproblems. Optimal station choices from each subproblem are clustered to identify consistently high-value locations across all decomposed cases. Cluster-level sets are then matched back to the closest GSaaS candidate sites to produce a globally feasible solution. This approach enables scalable coordination while maintaining near-optimal performance. We evaluate our method's performance on synthetic Walker-Star test cases (1-10 satellites, 1-10 stations), achieving solutions within 95% of the global IP optimum for all test cases. Real-world evaluations on Capella Space (5 satellites), ICEYE (40), and Planet's Flock (96) show that while exact IP solutions fail to scale, our framework continues to deliver high-quality site selections.
comment: 14 pages, 7 tables, 10 figures, submitted to IEEE Aeroconf 2026
Efficient Input-Constrained Impulsive Optimal Control of Linear Systems with Application to Spacecraft Relative Motion
This work presents a novel algorithm for impulsive optimal control of linear time-varying systems with the inclusion of input magnitude constraints. Impulsive optimal control problems, where the optimal input solution is a sum of delta functions, are typically formulated as an optimization over a normed function space subject to integral equality constraints and can be efficiently solved for linear time-varying systems in their dual formulation. In this dual setting, the problem takes the form of a semi-infinite program which is readily solvable in online scenarios for constructing maneuver plans. This work augments the approach with the inclusion of magnitude constraints on the input over time windows of interest, which is shown to preserve the impulsive nature of the optimal solution and enable efficient solution procedures via semi-infinite programming. The resulting algorithm is demonstrated on the highly relevant problem of relative motion control of spacecraft in Low Earth Orbit (LEO) and compared to several other proposed solutions from the literature.
Viability-Preserving Passive Torque Control
Conventional passivity-based torque controllers for manipulators are typically unconstrained, which can lead to safety violations under external perturbations. In this paper, we employ viability theory to pre-compute safe sets in the state-space of joint positions and velocities. These viable sets, constructed via data-driven and analytical methods for self-collision avoidance, external object collision avoidance and joint-position and joint-velocity limits, provide constraints on joint accelerations and thus joint torques via the robot dynamics. A quadratic programming-based control framework enforces these constraints on a passive controller tracking a dynamical system, ensuring the robot states remain within the safe set in an infinite time horizon. We validate the proposed approach through simulations and hardware experiments on a 7-DoF Franka Emika manipulator. In comparison to a baseline constrained passive controller, our method operates at higher control-loop rates and yields smoother trajectories.
comment: 8 pages, 7 figures, Project Website: https://vpp-tc.github.io/webpage/
Distributed Feedback-Feedforward Algorithms for Time-Varying Resource Allocation
This paper studies distributed Time-Varying Resource Allocation (TVRA) where the local cost functions, global equality constraints, and Local Feasibility Constraints (LFCs) vary with time. Algorithms that mimic the structure of feedback-feedforward control systems are proposed. Feedback and feedforward laws are generated using local estimates from a distributed estimator, while a distributed controller enforces the stationarity condition within a fixed time and updates the candidate solution accordingly. To handle the LFCs, feedback laws based on projection and feedforward laws that switch between different modes are introduced as an initialization-free alternative to the barrier-based methods used in most related works. Our projection-based method guarantees that, for any infeasible initial value, the state trajectory enters the locally feasible set within a fixed time and remains within it thereafter, and that the set is forward invariant if the initial value is locally feasible. Convergence analyses are conducted under mild assumptions. For cases without LFCs, the proposed algorithm converges to the optimal trajectory within a fixed time. For cases with LFCs, the proposed algorithm is globally asymptotically stable at the optimal trajectory while exhibiting fixed-time convergence between consecutive switching instants. Numerical examples and a power system application verify their effectiveness.
Scalable analysis of stop-and-go waves: Representation, measurements and insights
Analyzing stop-and-go waves at the scale of miles and hours of data is an emerging challenge in traffic research. The past 5 years have seen an explosion in the availability of large-scale traffic data containing traffic waves and complex congestion patterns, making existing approaches unsuitable for repeatable and scalable analysis of traffic waves in these data. This paper makes a first step towards addressing this challenge by introducing an automatic and scalable stop-and-go wave identification method capable of capturing wave generation, propagation, dissipation, as well as bifurcation and merging, which have previously been observed only very rarely. Using a concise and simple critical-speed based definition of a stop-and-go wave, the proposed method identifies all wave boundaries that encompass spatio-temporal points where vehicle speed is below a chosen critical speed. The method is built upon a graph representation of the spatio-temporal points associated with stop-and-go waves, specifically wave front (start) points and wave tail (end) points, and approaches the solution as a graph component identification problem. It enables the measurement of wave properties at scale. The method is implemented in Python and demonstrated on a large-scale dataset, I-24 MOTION INCEPTION. Our results show insights on the complexity of traffic waves. Traffic waves can bifurcate and merge at a scale that has never been observed or described before. The clustering analysis of all the identified wave components reveals the different topological structures of traffic waves. We explored that the wave merge or bifurcation points can be explained by spatial features. The gallery of all the identified wave topologies is demonstrated at https://trafficwaves.github.io/.
On the Eigenvalue Tracking of Large-Scale Systems
The paper focuses on the problem of tracking eigenvalue trajectories in large-scale power system models as system parameters vary. A continuation-based formulation is presented for tracing any single eigenvalue of interest, which supports sparse matrix representations and accommodates both explicit and semi-implicit differential-algebraic models. Key implementation aspects, such as numerical integration, matrix updates, derivative approximations, and handling defective eigenvalues, are discussed in detail and practical recommendations are duly provided. The tracking approach is demonstrated through a comprehensive case study on the IEEE 39-bus system, as well as on a realistic dynamic model of the Irish transmission system.
Online Feedback Optimization for Monotone Systems without Timescale Separation
Online Feedback Optimization (OFO) steers a dynamical plant to a cost-efficient steady-state, only relying on input-output sensitivity information, rather than on a full plant model. Unlike traditional feedforward approaches, OFO leverages real-time measurements from the plant, thereby inheriting the robustness and adaptability of feedback control. Unfortunately, existing theoretical guarantees for OFO assume that the controller operates on a slower timescale than the plant, which can affect responsiveness and transient performance. In this paper, we focus on relaxing this ``timescale separation'' assumption. Specifically, we consider the class of monotone systems, and we prove that OFO can achieve an optimal operating point, regardless of the time constants of controller and plant. By leveraging a small gain theorem for monotone systems, we derive several sufficient conditions for global convergence. Notably, these conditions depend only on the steady-state behavior of the plant, and they are entirely independent of the transient dynamics.
Neuromorphic Deployment of Spiking Neural Networks for Cognitive Load Classification in Air Traffic Control
This paper presents a neuromorphic system for cognitive load classification in a real-world setting, an Air Traffic Control (ATC) task, using a hardware implementation of Spiking Neural Networks (SNNs). Electroencephalogram (EEG) and eye-tracking features, extracted from an open-source dataset, were used to train and evaluate both conventional machine learning models and SNNs. Among the SNN architectures explored, a minimalistic, single-layer model trained with a biologically inspired delta-rule learning algorithm achieved competitive performance (80.6%). To enable deployment on neuromorphic hardware, the model was quantized and implemented on the mixed-signal DYNAP-SE chip. Despite hardware constraints and analog variability, the chip-deployed SNN maintained a classification accuracy of up to 73.5% using spike-based input. These results demonstrate the feasibility of event-driven neuromorphic systems for ultra-low-power, embedded cognitive state monitoring in dynamic real-world scenarios.
comment: Preprint version. Accepted at ACM/IEEE ICONS 2025 (to appear in Proceedings)
Improved Monte Carlo Planning via Causal Disentanglement for Structurally-Decomposed Markov Decision Processes
Markov Decision Processes (MDPs), as a general-purpose framework, often overlook the benefits of incorporating the causal structure of the transition and reward dynamics. For a subclass of resource allocation problems, we introduce the Structurally Decomposed MDP (SD-MDP), which leverages causal disentanglement to partition an MDP's temporal causal graph into independent components. By exploiting this disentanglement, SD-MDP enables dimensionality reduction and computational efficiency gains in optimal value function estimation. We reduce the sequential optimization problem to a fractional knapsack problem with log-linear complexity $O(T \log T)$, outperforming traditional stochastic programming methods that exhibit polynomial complexity with respect to the time horizon $T$. Additionally, SD-MDP's computational advantages are independent of state-action space size, making it viable for high-dimensional spaces. Furthermore, our approach integrates seamlessly with Monte Carlo Tree Search (MCTS), achieving higher expected rewards under constrained simulation budgets while providing a vanishing simple regret bound. Empirical results demonstrate superior policy performance over benchmarks across various logistics and finance domains.
comment: Conference Paper. 7th International Conference on Distributed Artificial Intelligence (DAI)
Fine Tuning a Simulation-Driven Estimator
Many industries now deploy high-fidelity simulators (digital twins) to represent physical systems, yet their parameters must be calibrated to match the true system. This motivated the construction of simulation-driven parameter estimators, built by generating synthetic observations for sampled parameter values and learning a supervised mapping from observations to parameters. However, when the true parameters lie outside the sampled range, predictions suffer from an out-of-distribution (OOD) error. This paper introduces a fine-tuning approach for the Two-Stage estimator that mitigates OOD effects and improves accuracy. The effectiveness of the proposed method is verified through numerical simulations.
comment: 6 pages, 4 figures
Linear-quadratic mean-field-type difference games with coupled affine inequality constraints
In this letter, we study a class of linear-quadratic mean-field-type difference games with coupled affine inequality constraints. We show that the mean-field-type equilibrium can be characterized by the existence of a multiplier process which satisfies some implicit complementarity conditions. Further, we show that the equilibrium strategies can be computed by reformulating these conditions as a single large-scale linear complementarity problem. We illustrate our results with an energy storage problem arising in the management of microgrids.
Stochastic Security Constrained AC Optimal Power Flow Using General Polynomial Chaos Expansion
Addressing the uncertainty introduced by increasing renewable integration is crucial for secure power system operation, yet capturing it while preserving the full nonlinear physics of the grid remains a significant challenge. This paper presents a stochastic security constrained optimal power flow model with chance constraints supporting nonlinear AC power flow equations and non Gaussian uncertainties. We use general polynomial chaos expansion to model arbitrary uncertainties of finite variance, enabling accurate moment computations and robust prediction of system states across diverse operating scenarios. The chance constraints probabilistically limit inequality violations, providing a more flexible representation of controllable variables and the consequent power system operation. Case studies validate the proposed models effectiveness in satisfying operational constraints and capturing uncertainty with high fidelity. Compared to the deterministic formulation, it also uncovers a wider set of unsecure contingencies, highlighting improved uncertainty capture and operational insight.
comment: 2025 IEEE International Conference on Energy Technologies for Future Grids (6 pages)
Distributed Koopman Learning with Incomplete Measurements
Koopman operator theory has emerged as a powerful tool for system identification, particularly for approximating nonlinear time-invariant systems (NTIS). This paper considers a network of agents with limited observation capabilities that collaboratively estimate the dynamics of an NTIS. A distributed deep Koopman learning algorithm is developed by integrating Koopman operator theory, deep neural networks, and consensus-based coordination. In the proposed framework, each agent approximates the system dynamics using its partial measurements and lifted states exchanged with its neighbors. This cooperative scheme enables accurate reconstruction of the global dynamics despite the absence of full-state information at individual agents. Simulation results on the Lunar Lander environment from OpenAI Gym demonstrate that the proposed method achieves performance comparable to the centralized deep Koopman learning with full-state access.
Distributed Koopman Learning using Partial Trajectories for Control
This paper proposes a distributed data-driven framework for dynamics learning, termed distributed deep Koopman learning using partial trajectories (DDKL-PT). In this framework, each agent in a multi-agent system is assigned a partial trajectory offline and locally approximates the unknown dynamics using a deep neural network within the Koopman operator framework. By exchanging local estimated dynamics rather than training data, agents achieve consensus on a global dynamics model without sharing their private training trajectories. Simulation studies on a surface vehicle demonstrate that DDKL-PT attains consensus with respect to the learned dynamics, with each agent achieving reasonably small approximation errors over the testing data. Furthermore, a model predictive control scheme is developed by integrating the learned Koopman dynamics with known kinematic relations. Results on goal-tracking and station-keeping tasks support that the distributedly learned dynamics are sufficiently accurate for model-based optimal control.
Local Stability and Region of Attraction Analysis for Neural Network Feedback Systems under Positivity Constraints
We study the local stability of nonlinear systems in the Lur'e form with static nonlinear feedback realized by feedforward neural networks (FFNNs). By leveraging positivity system constraints, we employ a localized variant of the Aizerman conjecture, which provides sufficient conditions for exponential stability of trajectories confined to a compact set. Using this foundation, we develop two distinct methods for estimating the Region of Attraction (ROA): (i) a less conservative Lyapunov-based approach that constructs invariant sublevel sets of a quadratic function satisfying a linear matrix inequality (LMI), and (ii) a novel technique for computing tight local sector bounds for FFNNs via layer-wise propagation of linear relaxations. These bounds are integrated into the localized Aizerman framework to certify local exponential stability. Numerical results demonstrate substantial improvements over existing integral quadratic constraint-based approaches in both ROA size and scalability.
comment: Accepted at 64th IEEE Conference on Decision and Control (CDC) 2025 - Rio de Janeiro, Brazil
Robust Stability Analysis of Positive Lure System with Neural Network Feedback
This paper investigates the robustness of the Lur'e problem under positivity constraints, drawing on results from the positive Aizerman conjecture and robustness properties of Metzler matrices. Specifically, we consider a control system of Lur'e type in which not only the linear part includes parametric uncertainty but also the nonlinear sector bound is unknown. We investigate tools from positive linear systems to effectively solve the problems in complicated and uncertain nonlinear systems. By leveraging the positivity characteristic of the system, we derive an explicit formula for the stability radius of Lur'e systems. Furthermore, we extend our analysis to systems with neural network (NN) feedback loops. Building on this approach, we also propose a refinement method for sector bounds of NNs. This study introduces a scalable and efficient approach for robustness analysis of both Lur'e and NN-controlled systems. Finally, the proposed results are supported by illustrative examples.
comment: Accepted at the 9th IEEE Conference on Control Technology and Applications (CCTA) 2025, San Diego, California
Robotics
ARMADA: Autonomous Online Failure Detection and Human Shared Control Empower Scalable Real-world Deployment and Adaptation
Imitation learning has shown promise in learning from large-scale real-world datasets. However, pretrained policies usually perform poorly without sufficient in-domain data. Besides, human-collected demonstrations entail substantial labour and tend to encompass mixed-quality data and redundant information. As a workaround, human-in-the-loop systems gather domain-specific data for policy post-training, and exploit closed-loop policy feedback to offer informative guidance, but usually require full-time human surveillance during policy rollout. In this work, we devise ARMADA, a multi-robot deployment and adaptation system with human-in-the-loop shared control, featuring an autonomous online failure detection method named FLOAT. Thanks to FLOAT, ARMADA enables paralleled policy rollout and requests human intervention only when necessary, significantly reducing reliance on human supervision. Hence, ARMADA enables efficient acquisition of in-domain data, and leads to more scalable deployment and faster adaptation to new scenarios. We evaluate the performance of ARMADA on four real-world tasks. FLOAT achieves nearly 95% accuracy on average, surpassing prior state-of-the-art failure detection approaches by over 20%. Besides, ARMADA manifests more than 4$\times$ increase in success rate and greater than 2$\times$ reduction in human intervention rate over multiple rounds of policy rollout and post-training, compared to previous human-in-the-loop learning methods.
Do You Know Where Your Camera Is? View-Invariant Policy Learning with Camera Conditioning
We study view-invariant imitation learning by explicitly conditioning policies on camera extrinsics. Using Plucker embeddings of per-pixel rays, we show that conditioning on extrinsics significantly improves generalization across viewpoints for standard behavior cloning policies, including ACT, Diffusion Policy, and SmolVLA. To evaluate policy robustness under realistic viewpoint shifts, we introduce six manipulation tasks in RoboSuite and ManiSkill that pair "fixed" and "randomized" scene variants, decoupling background cues from camera pose. Our analysis reveals that policies without extrinsics often infer camera pose using visual cues from static backgrounds in fixed scenes; this shortcut collapses when workspace geometry or camera placement shifts. Conditioning on extrinsics restores performance and yields robust RGB-only control without depth. We release the tasks, demonstrations, and code at https://ripl.github.io/know_your_camera/ .
comment: Code and project materials are available at ripl.github.io/know_your_camera
Retargeting Matters: General Motion Retargeting for Humanoid Motion Tracking
Humanoid motion tracking policies are central to building teleoperation pipelines and hierarchical controllers, yet they face a fundamental challenge: the embodiment gap between humans and humanoid robots. Current approaches address this gap by retargeting human motion data to humanoid embodiments and then training reinforcement learning (RL) policies to imitate these reference trajectories. However, artifacts introduced during retargeting, such as foot sliding, self-penetration, and physically infeasible motion are often left in the reference trajectories for the RL policy to correct. While prior work has demonstrated motion tracking abilities, they often require extensive reward engineering and domain randomization to succeed. In this paper, we systematically evaluate how retargeting quality affects policy performance when excessive reward tuning is suppressed. To address issues that we identify with existing retargeting methods, we propose a new retargeting method, General Motion Retargeting (GMR). We evaluate GMR alongside two open-source retargeters, PHC and ProtoMotions, as well as with a high-quality closed-source dataset from Unitree. Using BeyondMimic for policy training, we isolate retargeting effects without reward tuning. Our experiments on a diverse subset of the LAFAN1 dataset reveal that while most motions can be tracked, artifacts in retargeted data significantly reduce policy robustness, particularly for dynamic or long sequences. GMR consistently outperforms existing open-source methods in both tracking performance and faithfulness to the source motion, achieving perceptual fidelity and policy success rates close to the closed-source baseline. Website: https://jaraujo98.github.io/retargeting_matters. Code: https://github.com/YanjieZe/GMR.
Performance-Guided Refinement for Visual Aerial Navigation using Editable Gaussian Splatting in FalconGym 2.0
Visual policy design is crucial for aerial navigation. However, state-of-the-art visual policies often overfit to a single track and their performance degrades when track geometry changes. We develop FalconGym 2.0, a photorealistic simulation framework built on Gaussian Splatting (GSplat) with an Edit API that programmatically generates diverse static and dynamic tracks in milliseconds. Leveraging FalconGym 2.0's editability, we propose a Performance-Guided Refinement (PGR) algorithm, which concentrates visual policy's training on challenging tracks while iteratively improving its performance. Across two case studies (fixed-wing UAVs and quadrotors) with distinct dynamics and environments, we show that a single visual policy trained with PGR in FalconGym 2.0 outperforms state-of-the-art baselines in generalization and robustness: it generalizes to three unseen tracks with 100% success without per-track retraining and maintains higher success rates under gate-pose perturbations. Finally, we demonstrate that the visual policy trained with PGR in FalconGym 2.0 can be zero-shot sim-to-real transferred to a quadrotor hardware, achieving a 98.6% success rate (69 / 70 gates) over 30 trials spanning two three-gate tracks and a moving-gate track.
DisCo-Layout: Disentangling and Coordinating Semantic and Physical Refinement in a Multi-Agent Framework for 3D Indoor Layout Synthesis
3D indoor layout synthesis is crucial for creating virtual environments. Traditional methods struggle with generalization due to fixed datasets. While recent LLM and VLM-based approaches offer improved semantic richness, they often lack robust and flexible refinement, resulting in suboptimal layouts. We develop DisCo-Layout, a novel framework that disentangles and coordinates physical and semantic refinement. For independent refinement, our Semantic Refinement Tool (SRT) corrects abstract object relationships, while the Physical Refinement Tool (PRT) resolves concrete spatial issues via a grid-matching algorithm. For collaborative refinement, a multi-agent framework intelligently orchestrates these tools, featuring a planner for placement rules, a designer for initial layouts, and an evaluator for assessment. Experiments demonstrate DisCo-Layout's state-of-the-art performance, generating realistic, coherent, and generalizable 3D indoor layouts. Our code will be publicly available.
Product Digital Twin Supporting End-of-life Phase of Electric Vehicle Batteries Utilizing Product-Process-Resource Asset Network
In the context of the circular economy, products in their end-of-life phase should be either remanufactured or recycled. Both of these processes are crucial for sustainability and environmental conservation. However, manufacturers often do not support these processes enough by not sharing relevant data. This paper proposes use of a digital twin technology, which is capable to help optimizing the disassembly processes to reduce ecological impact and enhance sustainability. The proposed approach is demonstrated through a disassembly use-case of the product digital twin of an electric vehicle battery. By utilizing product digital twins, challenges associated with the disassembly of electric vehicle batteries can be solved flexibly and efficiently for various battery types. As a backbone for the product digital twin representation, the paper uses the paradigm of product-process-resource asset networks (PAN). Such networks enable to model relevant relationships across products, production resources, manufacturing processes, and specific production operations that have to be done in the manufacturing phase of a product. This paper introduces a Bi-Flow Product-Process-Resource Asset Network (Bi-PAN) representation, which extends the PAN paradigm to cover not only the manufacturing, but also the remanufacturing/recycling phase.
comment: This work has been submitted to the IEEE for possible publication. 6 pages, 4 figures
SCANS: A Soft Gripper with Curvature and Spectroscopy Sensors for In-Hand Material Differentiation
We introduce the soft curvature and spectroscopy (SCANS) system: a versatile, electronics-free, fluidically actuated soft manipulator capable of assessing the spectral properties of objects either in hand or through pre-touch caging. This platform offers a wider spectral sensing capability than previous soft robotic counterparts. We perform a material analysis to explore optimal soft substrates for spectral sensing, and evaluate both pre-touch and in-hand performance. Experiments demonstrate explainable, statistical separation across diverse object classes and sizes (metal, wood, plastic, organic, paper, foam), with large spectral angle differences between items. Through linear discriminant analysis, we show that sensitivity in the near-infrared wavelengths is critical to distinguishing visually similar objects. These capabilities advance the potential of optics as a multi-functional sensory modality for soft robots. The complete parts list, assembly guidelines, and processing code for the SCANS gripper are accessible at: https://parses-lab.github.io/scans/.
comment: Accepted to IEEE Robotics & Automation Letters Special Issue on Interdisciplinarity and Widening Horizons in Soft Robotics
Stand Up, NAO! Increasing the Reliability of Stand-Up Motions Through Error Compensation in Position Control
Stand-up motions are an indispensable part of humanoid robot soccer. A robot incapable of standing up by itself is removed from the game for some time. In this paper, we present our stand-up motions for the NAO robot. Our approach dates back to 2019 and has been evaluated and slightly expanded over the past six years. We claim that the main reason for failed stand-up attempts are large errors in the executed joint positions. By addressing such problems by either executing special motions to free up stuck limbs such as the arms, or by compensating large errors with other joints, we significantly increased the overall success rate of our stand-up routine. The motions presented in this paper are also used by several other teams in the Standard Platform League, which thereby achieve similar success rates, as shown in an analysis of videos from multiple tournaments.
LangGrasp: Leveraging Fine-Tuned LLMs for Language Interactive Robot Grasping with Ambiguous Instructions
The existing language-driven grasping methods struggle to fully handle ambiguous instructions containing implicit intents. To tackle this challenge, we propose LangGrasp, a novel language-interactive robotic grasping framework. The framework integrates fine-tuned large language models (LLMs) to leverage their robust commonsense understanding and environmental perception capabilities, thereby deducing implicit intents from linguistic instructions and clarifying task requirements along with target manipulation objects. Furthermore, our designed point cloud localization module, guided by 2D part segmentation, enables partial point cloud localization in scenes, thereby extending grasping operations from coarse-grained object-level to fine-grained part-level manipulation. Experimental results show that the LangGrasp framework accurately resolves implicit intents in ambiguous instructions, identifying critical operations and target information that are unstated yet essential for task completion. Additionally, it dynamically selects optimal grasping poses by integrating environmental information. This enables high-precision grasping from object-level to part-level manipulation, significantly enhancing the adaptability and task execution efficiency of robots in unstructured environments. More information and code are available here: https://github.com/wu467/LangGrasp.
comment: 8 pages, 6 figures
Cooperative Guidance for Aerial Defense in Multiagent Systems
This paper addresses a critical aerial defense challenge in contested airspace, involving three autonomous aerial vehicles -- a hostile drone (the pursuer), a high-value drone (the evader), and a protective drone (the defender). We present a cooperative guidance framework for the evader-defender team that guarantees interception of the pursuer before it can capture the evader, even under highly dynamic and uncertain engagement conditions. Unlike traditional heuristic, optimal control, or differential game-based methods, we approach the problem within a time-constrained guidance framework, leveraging true proportional navigation based approach that ensures robust and guaranteed solutions to the aerial defense problem. The proposed strategy is computationally lightweight, scalable to a large number of agent configurations, and does not require knowledge of the pursuer's strategy or control laws. From arbitrary initial geometries, our method guarantees that key engagement errors are driven to zero within a fixed time, leading to a successful mission. Extensive simulations across diverse and adversarial scenarios confirm the effectiveness of the proposed strategy and its relevance for real-time autonomous defense in contested airspace environments.
EC3R-SLAM: Efficient and Consistent Monocular Dense SLAM with Feed-Forward 3D Reconstruction
The application of monocular dense Simultaneous Localization and Mapping (SLAM) is often hindered by high latency, large GPU memory consumption, and reliance on camera calibration. To relax this constraint, we propose EC3R-SLAM, a novel calibration-free monocular dense SLAM framework that jointly achieves high localization and mapping accuracy, low latency, and low GPU memory consumption. This enables the framework to achieve efficiency through the coupling of a tracking module, which maintains a sparse map of feature points, and a mapping module based on a feed-forward 3D reconstruction model that simultaneously estimates camera intrinsics. In addition, both local and global loop closures are incorporated to ensure mid-term and long-term data association, enforcing multi-view consistency and thereby enhancing the overall accuracy and robustness of the system. Experiments across multiple benchmarks show that EC3R-SLAM achieves competitive performance compared to state-of-the-art methods, while being faster and more memory-efficient. Moreover, it runs effectively even on resource-constrained platforms such as laptops and Jetson Orin NX, highlighting its potential for real-world robotics applications.
Reducing Discomfort in Driving Simulators: Motion Cueing for Motion Sickness Mitigation
Driving simulators are increasingly used in research and development. However, simulators often cause motion sickness due to downscaled motion and unscaled veridical visuals. In this paper, a motion cueing algorithm is proposed that reduces motion sickness as predicted by the subjective vertical conflict (SVC) model using model predictive control (MPC). Both sensory conflict and specific force errors are penalised in the cost function, allowing the algorithm to jointly optimise fidelity and comfort. Human-in-the-loop experiments were conducted to compare four simulator motion settings: two variations of our MPC-based algorithm, one focused on pure specific force tracking and the second compromising specific force tracking and motion sickness minimisation, as well as reference adaptive washout and no motion cases. The experiments were performed on a hexapod driving simulator with participants exposed to passive driving. Experimental motion sickness results closely matched the sickness model predictions. As predicted by the model, the no motion condition yielded the lowest sickness levels. However, it was rated lowest in terms of fidelity. The compromise solution reduced sickness by over 50% (average MISC level 3 to 1.5) compared to adaptive washout and the algorithm focusing on specific force tracking, without any significant reduction in fidelity rating. The proposed approach for developing MCA that takes into account both the simulator dynamics and time evolution of motion sickness offers a significant advancement in achieving an optimal control of motion sickness and specific force recreation in driving simulators, supporting broader simulator use.
SPARC: Spine with Prismatic and Revolute Compliance for Quadruped Robot
We present SPARC, a compact, open-source 3-DoF sagittal-plane spine module that combines revolute (pitch) and prismatic (axial) motion with programmable task-space impedance for quadruped robots. The system integrates three torque-controlled actuators, a custom 1 kHz control board, and a protected power unit in a 1.26 kg package, enabling closed-loop stiffness and damping shaping along x, z, and theta. We develop an RNEA-based computed-acceleration controller with smooth Stribeck friction compensation to render spring-damper behavior without explicit inertia shaping. Bench experiments validate the approach. Quasi-static push-pull tests show linear force-displacement characteristics with commanded horizontal stiffness spanning 300-700 N/m and <= 1.5% relative error (R^2 >= 0.992, narrow 95% CIs). Dynamic displace-and-release trials confirm mass-spring-damper responses over multiple damping settings, with small, interpretable phase deviations due to configuration-dependent inertia and low-speed friction effects. A task-space PD controller produces roughly linear stiffness but with greater variability and coupling sensitivity. SPARC provides a portable platform for systematic studies of spine compliance in legged locomotion and will be released with complete hardware and firmware resources.
TACOS: Task Agnostic COordinator of a multi-drone System
When a single pilot is responsible for managing a multi-drone system, the task demands varying levels of autonomy, from direct control of individual UAVs, to group-level coordination, to fully autonomous swarm behaviors for accomplishing high-level tasks. Enabling such flexible interaction requires a framework that supports multiple modes of shared autonomy. As language models continue to improve in reasoning and planning, they provide a natural foundation for such systems, reducing pilot workload by enabling high-level task delegation through intuitive, language-based interfaces. In this paper we present TACOS (Task-Agnostic COordinator of a multi-drone System), a unified framework that enables high-level natural language control of multi-UAV systems through Large Language Models (LLMs). TACOS integrates three key capabilities into a single architecture: a one-to-many natural language interface for intuitive user interaction, an intelligent coordinator for translating user intent into structured task plans, and an autonomous agent that executes plans interacting with the real-world. TACOS allows a LLM to interact with a library of executable APIs, bridging semantic reasoning with real-time multi-robot coordination. We demonstrate the system in real-world multi-drone system and conduct an ablation study to assess the contribution of each module.
comment: 6 pages, 6 figures, accepted as poster at 2025 IEEE International Symposium on Multi-Robot & Multi-Agent Systems
GreenhouseSplat: A Dataset of Photorealistic Greenhouse Simulations for Mobile Robotics
Simulating greenhouse environments is critical for developing and evaluating robotic systems for agriculture, yet existing approaches rely on simplistic or synthetic assets that limit simulation-to-real transfer. Recent advances in radiance field methods, such as Gaussian splatting, enable photorealistic reconstruction but have so far been restricted to individual plants or controlled laboratory conditions. In this work, we introduce GreenhouseSplat, a framework and dataset for generating photorealistic greenhouse assets directly from inexpensive RGB images. The resulting assets are integrated into a ROS-based simulation with support for camera and LiDAR rendering, enabling tasks such as localization with fiducial markers. We provide a dataset of 82 cucumber plants across multiple row configurations and demonstrate its utility for robotics evaluation. GreenhouseSplat represents the first step toward greenhouse-scale radiance-field simulation and offers a foundation for future research in agricultural robotics.
Like Playing a Video Game: Spatial-Temporal Optimization of Foot Trajectories for Controlled Football Kicking in Bipedal Robots
Humanoid robot soccer presents several challenges, particularly in maintaining system stability during aggressive kicking motions while achieving precise ball trajectory control. Current solutions, whether traditional position-based control methods or reinforcement learning (RL) approaches, exhibit significant limitations. Model predictive control (MPC) is a prevalent approach for ordinary quadruped and biped robots. While MPC has demonstrated advantages in legged robots, existing studies often oversimplify the leg swing progress, relying merely on simple trajectory interpolation methods. This severely constrains the foot's environmental interaction capability, hindering tasks such as ball kicking. This study innovatively adapts the spatial-temporal trajectory planning method, which has been successful in drone applications, to bipedal robotic systems. The proposed approach autonomously generates foot trajectories that satisfy constraints on target kicking position, velocity, and acceleration while simultaneously optimizing swing phase duration. Experimental results demonstrate that the optimized trajectories closely mimic human kicking behavior, featuring a backswing motion. Simulation and hardware experiments confirm the algorithm's efficiency, with trajectory planning times under 1 ms, and its reliability, achieving nearly 100 % task completion accuracy when the soccer goal is within the range of -90{\deg} to 90{\deg}.
comment: 8 pages, 8 figures, conference paper
What Matters in RL-Based Methods for Object-Goal Navigation? An Empirical Study and A Unified Framework
Object-Goal Navigation (ObjectNav) is a critical component toward deploying mobile robots in everyday, uncontrolled environments such as homes, schools, and workplaces. In this context, a robot must locate target objects in previously unseen environments using only its onboard perception. Success requires the integration of semantic understanding, spatial reasoning, and long-horizon planning, which is a combination that remains extremely challenging. While reinforcement learning (RL) has become the dominant paradigm, progress has spanned a wide range of design choices, yet the field still lacks a unifying analysis to determine which components truly drive performance. In this work, we conduct a large-scale empirical study of modular RL-based ObjectNav systems, decomposing them into three key components: perception, policy, and test-time enhancement. Through extensive controlled experiments, we isolate the contribution of each and uncover clear trends: perception quality and test-time strategies are decisive drivers of performance, whereas policy improvements with current methods yield only marginal gains. Building on these insights, we propose practical design guidelines and demonstrate an enhanced modular system that surpasses State-of-the-Art (SotA) methods by 6.6% on SPL and by a 2.7% success rate. We also introduce a human baseline under identical conditions, where experts achieve an average 98% success, underscoring the gap between RL agents and human-level navigation. Our study not only sets the SotA performance but also provides principled guidance for future ObjectNav development and evaluation.
Nav-EE: Navigation-Guided Early Exiting for Efficient Vision-Language Models in Autonomous Driving
Vision-Language Models (VLMs) are increasingly applied in autonomous driving for unified perception and reasoning, but high inference latency hinders real-time deployment. Early-exit reduces latency by terminating inference at intermediate layers, yet its task-dependent nature limits generalization across diverse scenarios. We observe that this limitation aligns with autonomous driving: navigation systems can anticipate upcoming contexts (e.g., intersections, traffic lights), indicating which tasks will be required. We propose Nav-EE, a navigation-guided early-exit framework that precomputes task-specific exit layers offline and dynamically applies them online based on navigation priors. Experiments on CODA, Waymo, and BOSCH show that Nav-EE achieves accuracy comparable to full inference while reducing latency by up to 63.9%. Real-vehicle integration with Autoware Universe further demonstrates reduced inference latency (600ms to 300ms), supporting faster decision-making in complex scenarios. These results suggest that coupling navigation foresight with early-exit offers a viable path toward efficient deployment of large models in autonomous systems. Code and data are available at our anonymous repository: https://anonymous.4open.science/r/Nav-EE-BBC4
An Anytime, Scalable and Complete Algorithm for Embedding a Manufacturing Procedure in a Smart Factory
Modern automated factories increasingly run manufacturing procedures using a matrix of programmable machines, such as 3D printers, interconnected by a programmable transport system, such as a fleet of tabletop robots. To embed a manufacturing procedure into a smart factory, an operator must: (a) assign each of its processes to a machine and (b) specify how agents should transport parts between machines. The problem of embedding a manufacturing process into a smart factory is termed the Smart Factory Embedding (SFE) problem. State-of-the-art SFE solvers can only scale to factories containing a couple dozen machines. Modern smart factories, however, may contain hundreds of machines. We fill this hole by introducing the first highly scalable solution to the SFE, TS-ACES, the Traffic System based Anytime Cyclic Embedding Solver. We show that TS-ACES is complete and can scale to SFE instances based on real industrial scenarios with more than a hundred machines.
Dual-Mode Magnetic Continuum Robot for Targeted Drug Delivery ICRA 2026
Magnetic continuum robots (MCRs) enable minimally invasive navigation through tortuous anatomical channels, yet axially magnetized designs have largely been limited to bending-only motion. To expand deformation capabilities, this paper presents a simple assembly that embeds permanent magnets radially within the catheter wall, allowing a single externally steered permanent magnet to independently induce either bending or torsion. A physics-based formulation together with finite-element analysis establishes the actuation principles, and benchtop experiments validate decoupled mode control under practical fields. Building on this, a dual-layer blockage mechanism consisting of outer grooves and inner plates leverages torsional shear to achieve on-demand drug release. Finally, an in-phantom intervention experiment demonstrates end-to-end operation: lumen following by bending for target approach, followed by twist-activated release at the site. The resulting compact, cable-free platform combines versatile deformation with precise payload delivery, indicating strong potential for next-generation, site-specific therapies.
comment: 7 pages, 3 figures, under review of ICRA 2026
Contrastive Representation Regularization for Vision-Language-Action Models
Vision-Language-Action (VLA) models have shown its capabilities in robot manipulation by leveraging rich representations from pre-trained Vision-Language Models (VLMs). However, their representations arguably remain suboptimal, lacking sensitivity to robotic signals such as control actions and proprioceptive states. To address the issue, we introduce Robot State-aware Contrastive Loss (RS-CL), a simple and effective representation regularization for VLA models, designed to bridge the gap between VLM representations and robotic signals. In particular, RS-CL aligns the representations more closely with the robot's proprioceptive states, by using relative distances between the states as soft supervision. Complementing the original action prediction objective, RS-CL effectively enhances control-relevant representation learning, while being lightweight and fully compatible with standard VLA training pipeline. Our empirical results demonstrate that RS-CL substantially improves the manipulation performance of state-of-the-art VLA models; it pushes the prior art from 30.8% to 41.5% on pick-and-place tasks in RoboCasa-Kitchen, through more accurate positioning during grasping and placing, and boosts success rates from 45.0% to 58.3% on challenging real-robot manipulation tasks.
comment: 20 pages, 12 figures
PolySim: Bridging the Sim-to-Real Gap for Humanoid Control via Multi-Simulator Dynamics Randomization
Humanoid whole-body control (WBC) policies trained in simulation often suffer from the sim-to-real gap, which fundamentally arises from simulator inductive bias, the inherent assumptions and limitations of any single simulator. These biases lead to nontrivial discrepancies both across simulators and between simulation and the real world. To mitigate the effect of simulator inductive bias, the key idea is to train policies jointly across multiple simulators, encouraging the learned controller to capture dynamics that generalize beyond any single simulator's assumptions. We thus introduce PolySim, a WBC training platform that integrates multiple heterogeneous simulators. PolySim can launch parallel environments from different engines simultaneously within a single training run, thereby realizing dynamics-level domain randomization. Theoretically, we show that PolySim yields a tighter upper bound on simulator inductive bias than single-simulator training. In experiments, PolySim substantially reduces motion-tracking error in sim-to-sim evaluations; for example, on MuJoCo, it improves execution success by 52.8 over an IsaacSim baseline. PolySim further enables zero-shot deployment on a real Unitree G1 without additional fine-tuning, showing effective transfer from simulation to the real world. We will release the PolySim code upon acceptance of this work.
comment: 8 pages, 5 figures
Geometric Backstepping Control of Omnidirectional Tiltrotors Incorporating Servo-Rotor Dynamics for Robustness against Sudden Disturbances
This work presents a geometric backstepping controller for a variable-tilt omnidirectional multirotor that explicitly accounts for both servo and rotor dynamics. Considering actuator dynamics is essential for more effective and reliable operation, particularly during aggressive flight maneuvers or recovery from sudden disturbances. While prior studies have investigated actuator-aware control for conventional and fixed-tilt multirotors, these approaches rely on linear relationships between actuator input and wrench, which cannot capture the nonlinearities induced by variable tilt angles. In this work, we exploit the cascade structure between the rigid-body dynamics of the multirotor and its nonlinear actuator dynamics to design the proposed backstepping controller and establish exponential stability of the overall system. Furthermore, we reveal parametric uncertainty in the actuator model through experiments, and we demonstrate that the proposed controller remains robust against such uncertainty. The controller was compared against a baseline that does not account for actuator dynamics across three experimental scenarios: fast translational tracking, rapid rotational tracking, and recovery from sudden disturbance. The proposed method consistently achieved better tracking performance, and notably, while the baseline diverged and crashed during the fastest translational trajectory tracking and the recovery experiment, the proposed controller maintained stability and successfully completed the tasks, thereby demonstrating its effectiveness.
Non-Rigid Structure-from-Motion via Differential Geometry with Recoverable Conformal Scale
Non-rigid structure-from-motion (NRSfM), a promising technique for addressing the mapping challenges in monocular visual deformable simultaneous localization and mapping (SLAM), has attracted growing attention. We introduce a novel method, called Con-NRSfM, for NRSfM under conformal deformations, encompassing isometric deformations as a subset. Our approach performs point-wise reconstruction using 2D selected image warps optimized through a graph-based framework. Unlike existing methods that rely on strict assumptions, such as locally planar surfaces or locally linear deformations, and fail to recover the conformal scale, our method eliminates these constraints and accurately computes the local conformal scale. Additionally, our framework decouples constraints on depth and conformal scale, which are inseparable in other approaches, enabling more precise depth estimation. To address the sensitivity of the formulated problem, we employ a parallel separable iterative optimization strategy. Furthermore, a self-supervised learning framework, utilizing an encoder-decoder network, is incorporated to generate dense 3D point clouds with texture. Simulation and experimental results using both synthetic and real datasets demonstrate that our method surpasses existing approaches in terms of reconstruction accuracy and robustness. The code for the proposed method will be made publicly available on the project website: https://sites.google.com/view/con-nrsfm.
Symskill: Symbol and Skill Co-Invention for Data-Efficient and Real-Time Long-Horizon Manipulation
Multi-step manipulation in dynamic environments remains challenging. Two major families of methods fail in distinct ways: (i) imitation learning (IL) is reactive but lacks compositional generalization, as monolithic policies do not decide which skill to reuse when scenes change; (ii) classical task-and-motion planning (TAMP) offers compositionality but has prohibitive planning latency, preventing real-time failure recovery. We introduce SymSkill, a unified learning framework that combines the benefits of IL and TAMP, allowing compositional generalization and failure recovery in real-time. Offline, SymSkill jointly learns predicates, operators, and skills directly from unlabeled and unsegmented demonstrations. At execution time, upon specifying a conjunction of one or more learned predicates, SymSkill uses a symbolic planner to compose and reorder learned skills to achieve the symbolic goals, while performing recovery at both the motion and symbolic levels in real time. Coupled with a compliant controller, SymSkill enables safe and uninterrupted execution under human and environmental disturbances. In RoboCasa simulation, SymSkill can execute 12 single-step tasks with 85% success rate. Without additional data, it composes these skills into multi-step plans requiring up to 6 skill recompositions, recovering robustly from execution failures. On a real Franka robot, we demonstrate SymSkill, learning from 5 minutes of unsegmented and unlabeled play data, is capable of performing multiple tasks simply by goal specifications. The source code and additional analysis can be found on https://sites.google.com/view/symskill.
comment: CoRL 2025 Learning Effective Abstractions for Planning (LEAP) Workshop Best Paper Award (https://sites.google.com/view/symskill)
Statistical Uncertainty Learning for Robust Visual-Inertial State Estimation
A fundamental challenge in robust visual-inertial odometry (VIO) is to dynamically assess the reliability of sensor measurements. This assessment is crucial for properly weighting the contribution of each measurement to the state estimate. Conventional methods often simplify this by assuming a static, uniform uncertainty for all measurements. This heuristic, however, may be limited in its ability to capture the dynamic error characteristics inherent in real-world data. To improve this limitation, we present a statistical framework that learns measurement reliability assessment online, directly from sensor data and optimization results. Our approach leverages multi-view geometric consistency as a form of self-supervision. This enables the system to infer landmark uncertainty and adaptively weight visual measurements during optimization. We evaluated our method on the public EuRoC dataset, demonstrating improvements in tracking accuracy with average reductions of approximately 24\% in translation error and 42\% in rotation error compared to baseline methods with fixed uncertainty parameters. The resulting framework operates in real time while showing enhanced accuracy and robustness. To facilitate reproducibility and encourage further research, the source code will be made publicly available.
FailSafe: Reasoning and Recovery from Failures in Vision-Language-Action Models
Recent advances in robotic manipulation have integrated low-level robotic control into Vision-Language Models (VLMs), extending them into Vision-Language-Action (VLA) models. Although state-of-the-art VLAs achieve strong performance in downstream robotic applications, supported by large-scale crowd-sourced robot training data, they still inevitably encounter failures during execution. Enabling robots to reason about and recover from unpredictable and abrupt failures remains a critical challenge. Existing robotic manipulation datasets, collected in either simulation or the real world, primarily provide only ground-truth trajectories, leaving robots unable to recover once failures occur. Moreover, the few datasets that address failure detection typically offer only textual explanations, which are difficult to utilize directly in VLA models. To address this gap, we introduce FailSafe, a novel failure generation and recovery system that automatically produces diverse failure cases paired with executable recovery actions. FailSafe can be seamlessly applied to any manipulation task in any simulator, enabling scalable creation of failure-action data. To demonstrate its effectiveness, we fine-tune LLaVa-OneVision-7B (LLaVa-OV-7B) to build FailSafe-VLM. Experimental results show that FailSafe-VLM successfully helps robotic arm detect and recover from potential failures, improving the performance of three state-of-the-art VLA models pi0-FAST, OpenVLA, OpenVLA-OFT) by up to 22.6% on average across several tasks in Maniskill. Furthermore, FailSafe-VLM could generalize across different spatial configurations, camera viewpoints, and robotic embodiments. We plan to release the FailSafe code to the community.
comment: Project Page: https://jimntu.github.io/FailSafe/
VLA-R1: Enhancing Reasoning in Vision-Language-Action Models
Vision-Language-Action (VLA) models aim to unify perception, language understanding, and action generation, offering strong cross-task and cross-scene generalization with broad impact on embodied AI. However, current VLA models often lack explicit step-by-step reasoning, instead emitting final actions without considering affordance constraints or geometric relations. Their post-training pipelines also rarely reinforce reasoning quality, relying primarily on supervised fine-tuning with weak reward design. To address these challenges, we present VLA-R1, a reasoning-enhanced VLA that integrates Reinforcement Learning from Verifiable Rewards (RLVR) with Group Relative Policy Optimization (GRPO) to systematically optimize both reasoning and execution. Specifically, we design an RLVR-based post-training strategy with verifiable rewards for region alignment, trajectory consistency, and output formatting, thereby strengthening reasoning robustness and execution accuracy. Moreover, we develop VLA-CoT-13K, a high-quality dataset that provides chain-of-thought supervision explicitly aligned with affordance and trajectory annotations. Furthermore, extensive evaluations on in-domain, out-of-domain, simulation, and real-robot platforms demonstrate that VLA-R1 achieves superior generalization and real-world performance compared to prior VLA methods. We plan to release the model, code, and dataset following the publication of this work. Code: https://github.com/GigaAI-research/VLA-R1. Website: https://gigaai-research.github.io/VLA-R1.
ActiveUMI: Robotic Manipulation with Active Perception from Robot-Free Human Demonstrations
We present ActiveUMI, a framework for a data collection system that transfers in-the-wild human demonstrations to robots capable of complex bimanual manipulation. ActiveUMI couples a portable VR teleoperation kit with sensorized controllers that mirror the robot's end-effectors, bridging human-robot kinematics via precise pose alignment. To ensure mobility and data quality, we introduce several key techniques, including immersive 3D model rendering, a self-contained wearable computer, and efficient calibration methods. ActiveUMI's defining feature is its capture of active, egocentric perception. By recording an operator's deliberate head movements via a head-mounted display, our system learns the crucial link between visual attention and manipulation. We evaluate ActiveUMI on six challenging bimanual tasks. Policies trained exclusively on ActiveUMI data achieve an average success rate of 70\% on in-distribution tasks and demonstrate strong generalization, retaining a 56\% success rate when tested on novel objects and in new environments. Our results demonstrate that portable data collection systems, when coupled with learned active perception, provide an effective and scalable pathway toward creating generalizable and highly capable real-world robot policies.
comment: technique report. The website is available at https://activeumi.github.io
MiniBEE: A New Form Factor for Compact Bimanual Dexterity
Bimanual robot manipulators can achieve impressive dexterity, but typically rely on two full six- or seven- degree-of-freedom arms so that paired grippers can coordinate effectively. This traditional framework increases system complexity while only exploiting a fraction of the overall workspace for dexterous interaction. We introduce the MiniBEE (Miniature Bimanual End-effector), a compact system in which two reduced-mobility arms (3+ DOF each) are coupled into a kinematic chain that preserves full relative positioning between grippers. To guide our design, we formulate a kinematic dexterity metric that enlarges the dexterous workspace while keeping the mechanism lightweight and wearable. The resulting system supports two complementary modes: (i) wearable kinesthetic data collection with self-tracked gripper poses, and (ii) deployment on a standard robot arm, extending dexterity across its entire workspace. We present kinematic analysis and design optimization methods for maximizing dexterous range, and demonstrate an end-to-end pipeline in which wearable demonstrations train imitation learning policies that perform robust, real-world bimanual manipulation.
Real-time Multi-Plane Segmentation Based on GPU Accelerated High-Resolution 3D Voxel Mapping for Legged Robot Locomotion
This paper proposes a real-time multi-plane segmentation method based on GPU-accelerated high-resolution 3D voxel mapping for legged robot locomotion. Existing online planar mapping approaches struggle to balance accuracy and computational efficiency: direct depth image segmentation from specific sensors suffers from poor temporal integration, height map-based methods cannot represent complex 3D structures like overhangs, and voxel-based plane segmentation remains unexplored for real-time applications. To address these limitations, we develop a novel framework that integrates vertex-based connected component labeling with random sample consensus based plane detection and convex hull, leveraging GPU parallel computing to rapidly extract planar regions from point clouds accumulated in high-resolution 3D voxel maps. Experimental results demonstrate that the proposed method achieves fast and accurate 3D multi-plane segmentation at over 30 Hz update rate even at a resolution of 0.01 m, enabling the detected planes to be utilized in real time for locomotion tasks. Furthermore, we validate the effectiveness of our approach through experiments in both simulated environments and physical legged robot platforms, confirming robust locomotion performance when considering 3D planar structures.
comment: 8 pages, 12 figures, This work has been submitted to the IEEE for possible publication. Copyright may be transfered without notice, after which this version may no longer be accessible
Predictive Preference Learning from Human Interventions NeurIPS 2025
Learning from human involvement aims to incorporate the human subject to monitor and correct agent behavior errors. Although most interactive imitation learning methods focus on correcting the agent's action at the current state, they do not adjust its actions in future states, which may be potentially more hazardous. To address this, we introduce Predictive Preference Learning from Human Interventions (PPL), which leverages the implicit preference signals contained in human interventions to inform predictions of future rollouts. The key idea of PPL is to bootstrap each human intervention into L future time steps, called the preference horizon, with the assumption that the agent follows the same action and the human makes the same intervention in the preference horizon. By applying preference optimization on these future states, expert corrections are propagated into the safety-critical regions where the agent is expected to explore, significantly improving learning efficiency and reducing human demonstrations needed. We evaluate our approach with experiments on both autonomous driving and robotic manipulation benchmarks and demonstrate its efficiency and generality. Our theoretical analysis further shows that selecting an appropriate preference horizon L balances coverage of risky states with label correctness, thereby bounding the algorithmic optimality gap. Demo and code are available at: https://metadriverse.github.io/ppl
comment: NeurIPS 2025 Spotlight. Project page: https://metadriverse.github.io/ppl
Information Seeking for Robust Decision Making under Partial Observability
Explicit information seeking is essential to human problem-solving in practical environments characterized by incomplete information and noisy dynamics. When the true environmental state is not directly observable, humans seek information to update their internal dynamics and inform future decision-making. Although existing Large Language Model (LLM) planning agents have addressed observational uncertainty, they often overlook discrepancies between their internal dynamics and the actual environment. We introduce Information Seeking Decision Planner (InfoSeeker), an LLM decision-making framework that integrates task-oriented planning with information seeking to align internal dynamics and make optimal decisions under uncertainty in both agent observations and environmental dynamics. InfoSeeker prompts an LLM to actively gather information by planning actions to validate its understanding, detect environmental changes, or test hypotheses before generating or revising task-oriented plans. To evaluate InfoSeeker, we introduce a novel benchmark suite featuring partially observable environments with incomplete observations and uncertain dynamics. Experiments demonstrate that InfoSeeker achieves a 74% absolute performance gain over prior methods without sacrificing sample efficiency. Moreover, InfoSeeker generalizes across LLMs and outperforms baselines on established benchmarks such as robotic manipulation and web navigation. These findings underscore the importance of tightly integrating planning and information seeking for robust behavior in partially observable environments. The project page is available at https://infoseekerllm.github.io
comment: The project page is available at https://infoseekerllm.github.io
RSV-SLAM: Toward Real-Time Semantic Visual SLAM in Indoor Dynamic Environments
Simultaneous Localization and Mapping (SLAM) plays an important role in many robotics fields, including social robots. Many of the available visual SLAM methods are based on the assumption of a static world and struggle in dynamic environments. In the current study, we introduce a real-time semantic RGBD SLAM approach designed specifically for dynamic environments. Our proposed system can effectively detect moving objects and maintain a static map to ensure robust camera tracking. The key innovation of our approach is the incorporation of deep learning-based semantic information into SLAM systems to mitigate the impact of dynamic objects. Additionally, we enhance the semantic segmentation process by integrating an Extended Kalman filter to identify dynamic objects that may be temporarily idle. We have also implemented a generative network to fill in the missing regions of input images belonging to dynamic objects. This highly modular framework has been implemented on the ROS platform and can achieve around 22 fps on a GTX1080. Benchmarking the developed pipeline on dynamic sequences from the TUM dataset suggests that the proposed approach delivers competitive localization error in comparison with the state-of-the-art methods, all while operating in near real-time. The source code is publicly available.
comment: Proceedings of SAI Intelligent Systems Conference 2023
UMI-on-Air: Embodiment-Aware Guidance for Embodiment-Agnostic Visuomotor Policies
We introduce UMI-on-Air, a framework for embodiment-aware deployment of embodiment-agnostic manipulation policies. Our approach leverages diverse, unconstrained human demonstrations collected with a handheld gripper (UMI) to train generalizable visuomotor policies. A central challenge in transferring these policies to constrained robotic embodiments-such as aerial manipulators-is the mismatch in control and robot dynamics, which often leads to out-of-distribution behaviors and poor execution. To address this, we propose Embodiment-Aware Diffusion Policy (EADP), which couples a high-level UMI policy with a low-level embodiment-specific controller at inference time. By integrating gradient feedback from the controller's tracking cost into the diffusion sampling process, our method steers trajectory generation towards dynamically feasible modes tailored to the deployment embodiment. This enables plug-and-play, embodiment-aware trajectory adaptation at test time. We validate our approach on multiple long-horizon and high-precision aerial manipulation tasks, showing improved success rates, efficiency, and robustness under disturbances compared to unguided diffusion baselines. Finally, we demonstrate deployment in previously unseen environments, using UMI demonstrations collected in the wild, highlighting a practical pathway for scaling generalizable manipulation skills across diverse-and even highly constrained-embodiments. All code, data, and checkpoints will be publicly released after acceptance. Result videos can be found at umi-on-air.github.io.
comment: Result videos can be found at umi-on-air.github.io
SubSense: VR-Haptic and Motor Feedback for Immersive Control in Subsea Telerobotics
This paper investigates the integration of haptic feedback and virtual reality (VR) control interfaces to enhance teleoperation and telemanipulation of underwater ROVs (remotely operated vehicles). Traditional ROV teleoperation relies on low-resolution 2D camera feeds and lacks immersive and sensory feedback, which diminishes situational awareness in complex subsea environments. We propose SubSense -- a novel VR-Haptic framework incorporating a non-invasive feedback interface to an otherwise 1-DOF (degree of freedom) manipulator, which is paired with the teleoperator's glove to provide haptic feedback and grasp status. Additionally, our framework integrates end-to-end software for managing control inputs and displaying immersive camera views through a VR platform. We validate the system through comprehensive experiments and user studies, demonstrating its effectiveness over conventional teleoperation interfaces, particularly for delicate manipulation tasks. Our results highlight the potential of multisensory feedback in immersive virtual environments to significantly improve remote situational awareness and mission performance, offering more intuitive and accessible ROV operations in the field.
comment: Presented at the OCEANS 2025 Great Lakes Conference
Efficient Optimal Path Planning in Dynamic Environments Using Koopman MPC
This paper presents a data-driven model predictive control framework for mobile robots navigating in dynamic environments, leveraging Koopman operator theory. Unlike the conventional Koopman-based approaches that focus on the linearization of system dynamics only, our work focuses on finding a global linear representation for the optimal path planning problem that includes both the nonlinear robot dynamics and collision-avoidance constraints. We deploy extended dynamic mode decomposition to identify linear and bilinear Koopman realizations from input-state data. Our open-loop analysis demonstrates that only the bilinear Koopman model can accurately capture nonlinear state-input couplings and quadratic terms essential for collision avoidance, whereas linear realizations fail to do so. We formulate a quadratic program for the robot path planning in the presence of moving obstacles in the lifted space and determine the optimal robot action in an MPC framework. Our approach is capable of finding the safe optimal action 320 times faster than a nonlinear MPC counterpart that solves the path planning problem in the original state space. Our work highlights the potential of bilinear Koopman realizations for linearization of highly nonlinear optimal control problems subject to nonlinear state and input constraints to achieve computational efficiency similar to linear problems.
comment: This work has been submitted to the ACC2026 conference
A Recipe for Efficient Sim-to-Real Transfer in Manipulation with Online Imitation-Pretrained World Models
We are interested in solving the problem of imitation learning with a limited amount of real-world expert data. Existing offline imitation methods often struggle with poor data coverage and severe performance degradation. We propose a solution that leverages robot simulators to achieve online imitation learning. Our sim-to-real framework is based on world models and combines online imitation pretraining with offline finetuning. By leveraging online interactions, our approach alleviates the data coverage limitations of offline methods, leading to improved robustness and reduced performance degradation during finetuning. It also enhances generalization during domain transfer. Our empirical results demonstrate its effectiveness, improving success rates by at least 31.7% in sim-to-sim transfer and 23.3% in sim-to-real transfer over existing offline imitation learning baselines.
U-LAG: Uncertainty-Aware, Lag-Adaptive Goal Retargeting for Robotic Manipulation IROS 2025
Robots manipulating in changing environments must act on percepts that are late, noisy, or stale. We present U-LAG, a mid-execution goal-retargeting layer that leaves the low-level controller unchanged while re-aiming task goals (pre-contact, contact, post) as new observations arrive. Unlike motion retargeting or generic visual servoing, U-LAG treats in-flight goal re-aiming as a first-class, pluggable module between perception and control. Our main technical contribution is UAR-PF, an uncertainty-aware retargeter that maintains a distribution over object pose under sensing lag and selects goals that maximize expected progress. We instantiate a reproducible Shift x Lag stress test in PyBullet/PandaGym for pick, push, stacking, and peg insertion, where the object undergoes abrupt in-plane shifts while synthetic perception lag is injected during approach. Across 0-10 cm shifts and 0-400 ms lags, UAR-PF and ICP degrade gracefully relative to a no-retarget baseline, achieving higher success with modest end-effector travel and fewer aborts; simple operational safeguards further improve stability. Contributions: (1) UAR-PF for lag-adaptive, uncertainty-aware goal retargeting; (2) a pluggable retargeting interface; and (3) a reproducible Shift x Lag benchmark with evaluation on pick, push, stacking, and peg insertion.
comment: 8 pages, 5 figures. Accepted to the IROS 2025 Workshop on Perception and Planning for Mobile Manipulation in Changing Environments
SIMSplat: Predictive Driving Scene Editing with Language-aligned 4D Gaussian Splatting
Driving scene manipulation with sensor data is emerging as a promising alternative to traditional virtual driving simulators. However, existing frameworks struggle to generate realistic scenarios efficiently due to limited editing capabilities. To address these challenges, we present SIMSplat, a predictive driving scene editor with language-aligned Gaussian splatting. As a language-controlled editor, SIMSplat enables intuitive manipulation using natural language prompts. By aligning language with Gaussian-reconstructed scenes, it further supports direct querying of road objects, allowing precise and flexible editing. Our method provides detailed object-level editing, including adding new objects and modifying the trajectories of both vehicles and pedestrians, while also incorporating predictive path refinement through multi-agent motion prediction to generate realistic interactions among all agents in the scene. Experiments on the Waymo dataset demonstrate SIMSplat's extensive editing capabilities and adaptability across a wide range of scenarios. Project page: https://sungyeonparkk.github.io/simsplat/
ERUPT: An Open Toolkit for Interfacing with Robot Motion Planners in Extended Reality
We propose the Extended Reality Universal Planning Toolkit (ERUPT), an extended reality (XR) system for interactive motion planning. Our system allows users to create and dynamically reconfigure environments while they plan robot paths. In immersive three-dimensional XR environments, users gain a greater spatial understanding. XR also unlocks a broader range of natural interaction capabilities, allowing users to grab and adjust objects in the environment similarly to the real world, rather than using a mouse and keyboard with the scene projected onto a two-dimensional computer screen. Our system integrates with MoveIt, a manipulation planning framework, allowing users to send motion planning requests and visualize the resulting robot paths in virtual or augmented reality. We provide a broad range of interaction modalities, allowing users to modify objects in the environment and interact with a virtual robot. Our system allows operators to visualize robot motions, ensuring desired behavior as it moves throughout the environment, without risk of collisions within a virtual space, and to then deploy planned paths on physical robots in the real world.
Poutine: Vision-Language-Trajectory Pre-Training and Reinforcement Learning Post-Training Enable Robust End-to-End Autonomous Driving
Maintaining good driving behavior in out-of-distribution scenarios remains a critical challenge in autonomous driving. A promising direction is to leverage the generalist knowledge and reasoning capabilities of large-language models by treating unusual driving scenarios as a logical reasoning task. In this work, we present Poutine, a method that uses an off-the-shelf 3B-parameter vision-language model (VLM) - without any additional components - to achieve robust end-to-end autonomous driving via a simple and scalable training recipe. To learn strong base driving capabilities, we first train Poutine-Base using self-supervised next-token prediction over vision, language, and trajectory (VLT) tokens, leveraging both nominal and long-tail driving data. In the second stage, we fine-tune Poutine-Base using Group Relative Policy Optimization (GRPO) with a small set of human preference-labeled examples. We evaluated our approach on the Waymo end-to-end driving benchmark curated for long-tail scenarios. The final Poutine model achieves an RFS of 7.99 on the test set, placing 1st in the 2025 Waymo Vision-Based End-to-End Driving Challenge by a significant margin. Our results suggest that handcrafted tokenizers or custom architectural components added to base VLMs in prior work are not necessary to achieve strong driving performance. Instead, this work highlights the potential of scalable VLT pretraining combined with lightweight RL fine-tuning to enable robust and generalizable autonomous driving.
World Model for AI Autonomous Navigation in Mechanical Thrombectomy MICCAI 2025
Autonomous navigation for mechanical thrombectomy (MT) remains a critical challenge due to the complexity of vascular anatomy and the need for precise, real-time decision-making. Reinforcement learning (RL)-based approaches have demonstrated potential in automating endovascular navigation, but current methods often struggle with generalization across multiple patient vasculatures and long-horizon tasks. We propose a world model for autonomous endovascular navigation using TD-MPC2, a model-based RL algorithm. We trained a single RL agent across multiple endovascular navigation tasks in ten real patient vasculatures, comparing performance against the state-of-the-art Soft Actor-Critic (SAC) method. Results indicate that TD-MPC2 significantly outperforms SAC in multi-task learning, achieving a 65% mean success rate compared to SAC's 37%, with notable improvements in path ratio. TD-MPC2 exhibited increased procedure times, suggesting a trade-off between success rate and execution speed. These findings highlight the potential of world models for improving autonomous endovascular navigation and lay the foundation for future research in generalizable AI-driven robotic interventions.
comment: Published in Medical Image Computing and Computer Assisted Intervention - MICCAI 2025, Lecture Notes in Computer Science, vol 15968
VITA: Vision-to-Action Flow Matching Policy
Conventional flow matching and diffusion-based policies sample through iterative denoising from standard noise distributions (e.g., Gaussian), and require conditioning mechanisms to incorporate visual information during the generative process, incurring substantial time and memory overhead. To reduce the complexity, we develop VITA(VIsion-To-Action policy), a noise-free and conditioning-free policy learning framework that directly maps visual representations to latent actions using flow matching. VITA treats latent visual representations as the source of the flow, thus eliminating the need of conditioning. As expected, bridging vision and action is challenging, because actions are lower-dimensional, less structured, and sparser than visual representations; moreover, flow matching requires the source and target to have the same dimensionality. To overcome this, we introduce an action autoencoder that maps raw actions into a structured latent space aligned with visual latents, trained jointly with flow matching. To further prevent latent space collapse, we propose flow latent decoding, which anchors the latent generation process by backpropagating the action reconstruction loss through the flow matching ODE (ordinary differential equations) solving steps. We evaluate VITA on 8 simulation and 2 real-world tasks from ALOHA and Robomimic. VITA outperforms or matches state-of-the-art generative policies, while achieving 1.5-2.3x faster inference compared to conventional methods with conditioning. Project page: https://ucd-dare.github.io/VITA/
comment: Project page: https://ucd-dare.github.io/VITA/ Code: https://github.com/ucd-dare/VITA
FalconWing: An Ultra-Light Indoor Fixed-Wing UAV Platform for Vision-Based Autonomy
We introduce FalconWing, an ultra-light (150 g) indoor fixed-wing UAV platform for vision-based autonomy. Controlled indoor environment enables year-round repeatable UAV experiment but imposes strict weight and maneuverability limits on the UAV, motivating our ultra-light FalconWing design. FalconWing couples a lightweight hardware stack (137g airframe with a 9g camera) and offboard computation with a software stack featuring a photorealistic 3D Gaussian Splat (GSplat) simulator for developing and evaluating vision-based controllers. We validate FalconWing on two challenging vision-based aerial case studies. In the leader-follower case study, our best vision-based controller, trained via imitation learning on GSplat-rendered data augmented with domain randomization, achieves 100% tracking success across 3 types of leader maneuvers over 30 trials and shows robustness to leader's appearance shifts in simulation. In the autonomous landing case study, our vision-based controller trained purely in simulation transfers zero-shot to real hardware, achieving an 80% success rate over ten landing trials. We will release hardware designs, GSplat scenes, and dynamics models upon publication to make FalconWing an open-source flight kit for engineering students and research labs.
LPAC: Learnable Perception-Action-Communication Loops with Applications to Coverage Control
Coverage control is the problem of navigating a robot swarm to collaboratively monitor features or a phenomenon of interest not known a priori. The problem is challenging in decentralized settings with robots that have limited communication and sensing capabilities. We propose a learnable Perception-Action-Communication (LPAC) architecture for the problem, wherein a convolutional neural network (CNN) processes localized perception; a graph neural network (GNN) facilitates robot communications; finally, a shallow multi-layer perceptron (MLP) computes robot actions. The GNN enables collaboration in the robot swarm by computing what information to communicate with nearby robots and how to incorporate received information. Evaluations show that the LPAC models -- trained using imitation learning -- outperform standard decentralized and centralized coverage control algorithms. The learned policy generalizes to environments different from the training dataset, transfers to larger environments with more robots, and is robust to noisy position estimates. The results indicate the suitability of LPAC architectures for decentralized navigation in robot swarms to achieve collaborative behavior.
comment: 20 Pages, 20 figures, Accepted for publication in the IEEE Transactions on Robotics
SARM: Stage-Aware Reward Modeling for Long Horizon Robot Manipulation
Large-scale robot learning has recently shown promise for enabling robots to perform complex tasks by integrating perception, control, and language understanding. Yet, it struggles with long-horizon, contact-rich manipulation such as deformable object handling, where demonstration quality is inconsistent. Reward modeling offers a natural solution: by providing grounded progress signals, it transforms noisy demonstrations into stable supervision that generalizes across diverse trajectories. We introduce a stage-aware, video-based reward modeling framework that jointly predicts high-level task stages and fine-grained progress. Reward labels are automatically derived from natural language subtask annotations, ensuring consistent progress estimation across variable-length demonstrations. This design overcomes frame-index labeling, which fails in variable-duration tasks like folding a T-shirt. Our reward model demonstrates robustness to variability, generalization to out-of-distribution settings, and strong utility for policy training. Building on it, we propose Reward-Aligned Behavior Cloning (RA-BC), which filters high-quality data and reweights samples by reward. Experiments show the reward model alone outperforms baselines on validation and real robot rollouts. Integrated into RA-BC, our approach achieves 83\% success on folding T-shirts from the flattened state and 67\% from the crumpled state -- far surpassing vanilla behavior cloning, which attains only 8\% and 0\% success. Overall, our results highlight reward modeling as a key enabler for scalable, annotation-efficient, and robust imitation learning in long-horizon manipulation.
A Tactile Feedback Approach to Path Recovery after High-Speed Impacts for Collision-Resilient Drones
Aerial robots are a well-established solution for exploration, monitoring, and inspection, thanks to their superior maneuverability and agility. However, in many environments, they risk crashing and sustaining damage after collisions. Traditional methods focus on avoiding obstacles entirely, but these approaches can be limiting, particularly in cluttered spaces or on weight-and compute-constrained platforms such as drones. This paper presents a novel approach to enhance drone robustness and autonomy by developing a path recovery and adjustment method for a high-speed collision-resilient aerial robot equipped with lightweight, distributed tactile sensors. The proposed system explicitly models collisions using pre-collision velocities, rates and tactile feedback to predict post-collision dynamics, improving state estimation accuracy. Additionally, we introduce a computationally efficient vector-field-based path representation that guarantees convergence to a user-specified path, while naturally avoiding known obstacles. Post-collision, contact point locations are incorporated into the vector field as a repulsive potential, enabling the drone to avoid obstacles while naturally returning to its path. The effectiveness of this method is validated through Monte Carlo simulations and demonstrated on a physical prototype, showing successful path following, collision recovery, and adjustment at speeds up to 3.7 m/s.
NeoARCADE: Robust Calibration for Distance Estimation to Support Assistive Drones for the Visually Impaired
Autonomous navigation by drones using onboard sensors, combined with deep learning and computer vision algorithms, is impacting a number of domains. We examine the use of drones to autonomously follow and assist Visually Impaired People (VIPs) in navigating urban environments. Estimating the absolute distance between the drone and the VIP, and to nearby objects, is essential to design obstacle avoidance algorithms. Here, we present NeoARCADE (Neo), which uses depth maps over monocular video feeds, common in consumer drones, to estimate absolute distances to the VIP and obstacles. Neo proposes robust calibration technique based on depth score normalization and coefficient estimations to translate relative distances from depth map to absolute ones. It further develops a dynamic recalibration method that can adapt to changing scenarios. We also develop two baseline models, Regression and Geometric, and compare Neo with SOTA depth map approaches and the baselines. We provide detailed evaluations to validate their robustness and generalizability for distance estimation to VIPs and other obstacles in diverse and dynamic conditions, using datasets collected in a campus environment. Neo predicts distances to VIP with an error <30cm, and to different obstacles like cars and bicycles within a maximum error of 60cm, which are better than the baselines. Neo also clearly out-performs SOTA depth map methods, reporting errors up to 5.3-14.6x lower.
comment: 34 pages, 20 figures, 3 tables
Physics-Constrained Robot Grasp Planning for Dynamic Tool Use
Tool use requires not only selecting appropriate tools but also generating grasps and motions that remain stable under dynamic interactions. Existing approaches largely focus on high-level tool grounding or quasi-static manipulation, overlooking stability in dynamic and cluttered regimes. We introduce iTuP (inverse Tool-use Planning), a framework that outputs robot grasps explicitly tailored for tool use. iTuP integrates a physics-constrained grasp generator with a task-conditional scoring function to produce grasps that remain stable during dynamic tool interactions. These grasps account for manipulation trajectories, torque requirements, and slip prevention, enabling reliable execution of real-world tasks. Experiments across hammering, sweeping, knocking, and reaching tasks demonstrate that iTuP outperforms geometry-based and vision-language model (VLM)-based baselines in grasp stability and task success. Our results underscore that physics-constrained grasping is essential for robust robot tool use in quasi-static, dynamic, and cluttered environments.
comment: In submission and under review
VFP: Variational Flow-Matching Policy for Multi-Modal Robot Manipulation
Flow-matching-based policies have recently emerged as a promising approach for learning-based robot manipulation, offering significant acceleration in action sampling compared to diffusion-based policies. However, conventional flow-matching methods struggle with multi-modality, often collapsing to averaged or ambiguous behaviors in complex manipulation tasks. To address this, we propose the Variational Flow-Matching Policy (VFP), which introduces a variational latent prior for mode-aware action generation and effectively captures both task-level and trajectory-level multi-modality. VFP further incorporates Kantorovich Optimal Transport (K-OT) for distribution-level alignment and utilizes a Mixture-of-Experts (MoE) decoder for mode specialization and efficient inference. We comprehensively evaluate VFP on 41 simulated tasks and 3 real-robot tasks, demonstrating its effectiveness and sampling efficiency in both simulated and real-world settings. Results show that VFP achieves a 49% relative improvement in task success rate over standard flow-based baselines in simulation, and further outperforms them on real-robot tasks, while still maintaining fast inference and a compact model size. More details are available on our project page: https://sites.google.com/view/varfp/
Interactive Expressive Motion Generation Using Dynamic Movement Primitives IROS
Our goal is to enable social robots to interact autonomously with humans in a realistic, engaging, and expressive manner. The 12 Principles of Animation are a well-established framework animators use to create movements that make characters appear convincing, dynamic, and emotionally expressive. This paper proposes a novel approach that leverages Dynamic Movement Primitives (DMPs) to implement key animation principles, providing a learnable, explainable, modulable, online adaptable and composable model for automatic expressive motion generation. DMPs, originally developed for general imitation learning in robotics and grounded in a spring-damper system design, offer mathematical properties that make them particularly suitable for this task. Specifically, they enable modulation of the intensities of individual principles and facilitate the decomposition of complex, expressive motion sequences into learnable and parametrizable primitives. We present the mathematical formulation of the parameterized animation principles and demonstrate the effectiveness of our framework through experiments and application on three robotic platforms with different kinematic configurations, in simulation, on actual robots and in a user study. Our results show that the approach allows for creating diverse and nuanced expressions using a single base model.
comment: This paper has been accepted for publication at the 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
Model Evaluation of a Transformable CubeSat for Nonholonomic Attitude Reorientation Using a Drop Tower
This paper presents a design for a drop tower test to evaluate a numerical model for a structurally reconfigurable spacecraft with actuatable joints, referred to as a transformable spacecraft. A mock-up robot for a 3U-sized transformable spacecraft is designed to fit in a limited time and space of the microgravity environment available in the drop tower. The robot performs agile reorientation, referred to as nonholonomic attitude control, by actuating joints in a particular manner. To adapt to the very short duration of microgravity in the drop tower test, a successive joint actuation maneuver is optimized to maximize the amount of attitude reorientation within the time constraint. The robot records the angular velocity history of all four bodies, and the data is analyzed to evaluate the accuracy of the numerical model. We confirm that the constructed numerical model sufficiently replicates the robot's motion and show that the post-experiment model corrections further improve the accuracy of the numerical simulations. Finally, the difference between this drop tower test and the actual orbit demonstration is discussed to show the prospect.
comment: 22 pages, 22 figures
Temporal Overlapping Prediction: A Self-supervised Pre-training Method for LiDAR Moving Object Segmentation
Moving object segmentation (MOS) on LiDAR point clouds is crucial for autonomous systems like self-driving vehicles. Previous supervised approaches rely heavily on costly manual annotations, while LiDAR sequences naturally capture temporal motion cues that can be leveraged for self-supervised learning. In this paper, we propose Temporal Overlapping Prediction (TOP), a self-supervised pre-training method that alleviate the labeling burden for MOS. TOP explores the temporal overlapping points that commonly observed by current and adjacent scans, and learns spatiotemporal representations by predicting the occupancy states of temporal overlapping points. Moreover, we utilize current occupancy reconstruction as an auxiliary pre-training objective, which enhances the current structural awareness of the model. We conduct extensive experiments and observe that the conventional metric Intersection-over-Union (IoU) shows strong bias to objects with more scanned points, which might neglect small or distant objects. To compensate for this bias, we introduce an additional metric called mIoU_obj to evaluate object-level performance. Experiments on nuScenes and SemanticKITTI show that TOPoutperforms both supervised training-from-scratch baseline and other self-supervised pre-training baselines by up to 28.77% relative improvement, demonstrating strong transferability across LiDAR setups and generalization to other tasks. Code and pre-trained models will be publicly available upon publication.
Software Engineering for Self-Adaptive Robotics: A Research Agenda
Self-adaptive robotic systems operate autonomously in dynamic and uncertain environments, requiring robust real-time monitoring and adaptive behaviour. Unlike traditional robotic software with predefined logic, self-adaptive robots exploit artificial intelligence (AI), machine learning, and model-driven engineering to adapt continuously to changing conditions, thereby ensuring reliability, safety, and optimal performance. This paper presents a research agenda for software engineering in self-adaptive robotics, structured along two dimensions. The first concerns the software engineering lifecycle, requirements, design, development, testing, and operations, tailored to the challenges of self-adaptive robotics. The second focuses on enabling technologies such as digital twins, AI-driven adaptation, and quantum computing, which support runtime monitoring, fault detection, and automated decision-making. We identify open challenges, including verifying adaptive behaviours under uncertainty, balancing trade-offs between adaptability, performance, and safety, and integrating self-adaptation frameworks like MAPE-K/MAPLE-K. By consolidating these challenges into a roadmap toward 2030, this work contributes to the foundations of trustworthy and efficient self-adaptive robotic systems capable of meeting the complexities of real-world deployment.
LRFusionPR: A Polar BEV-Based LiDAR-Radar Fusion Network for Place Recognition
In autonomous driving, place recognition is critical for global localization in GPS-denied environments. LiDAR and radar-based place recognition methods have garnered increasing attention, as LiDAR provides precise ranging, whereas radar excels in adverse weather resilience. However, effectively leveraging LiDAR-radar fusion for place recognition remains challenging. The noisy and sparse nature of radar data limits its potential to further improve recognition accuracy. In addition, heterogeneous radar configurations complicate the development of unified cross-modality fusion frameworks. In this paper, we propose LRFusionPR, which improves recognition accuracy and robustness by fusing LiDAR with either single-chip or scanning radar. Technically, a dual-branch network is proposed to fuse different modalities within the unified polar coordinate bird's eye view (BEV) representation. In the fusion branch, cross-attention is utilized to perform cross-modality feature interactions. The knowledge from the fusion branch is simultaneously transferred to the distillation branch, which takes radar as its only input to further improve the robustness. Ultimately, the descriptors from both branches are concatenated, producing the multimodal global descriptor for place retrieval. Extensive evaluations on multiple datasets demonstrate that our LRFusionPR achieves accurate place recognition, while maintaining robustness under varying weather conditions. Our open-source code will be released at https://github.com/QiZS-BIT/LRFusionPR.
comment: Accepted by IEEE Robotics and Automation Letters (RAL) 2025
An effective control of large systems of active particles: An application to evacuation problem
Manipulation of large systems of active particles is a serious challenge across diverse domains, including crowd management, control of robotic swarms, and coordinated material transport. The development of advanced control strategies for complex scenarios is hindered, however, by the lack of scalability and robustness of the existing methods, in particular, due to the need of an individual control for each agent. One possible solution involves controlling a system through a leader or a group of leaders, which other agents tend to follow. Using such an approach we develop an effective control strategy for a leader, combining reinforcement learning (RL) with artificial forces acting on the system. To describe the guidance of active particles by a leader we introduce the generalized Vicsek model. This novel method is then applied to the problem of the effective evacuation by a robot-rescuer (leader) of large groups of people from hazardous places. We demonstrate, that while a straightforward application of RL yields suboptimal results, even for advanced architectures, our approach provides a robust and efficient evacuation strategy. The source code supporting this study is publicly available at: https://github.com/cinemere/evacuation.
PlaceIt3D: Language-Guided Object Placement in Real 3D Scenes ICCV 2025
We introduce the novel task of Language-Guided Object Placement in Real 3D Scenes. Our model is given a 3D scene's point cloud, a 3D asset, and a textual prompt broadly describing where the 3D asset should be placed. The task here is to find a valid placement for the 3D asset that respects the prompt. Compared with other language-guided localization tasks in 3D scenes such as grounding, this task has specific challenges: it is ambiguous because it has multiple valid solutions, and it requires reasoning about 3D geometric relationships and free space. We inaugurate this task by proposing a new benchmark and evaluation protocol. We also introduce a new dataset for training 3D LLMs on this task, as well as the first method to serve as a non-trivial baseline. We believe that this challenging task and our new benchmark could become part of the suite of benchmarks used to evaluate and compare generalist 3D LLM models.
comment: ICCV 2025. Project page: https://nianticlabs.github.io/placeit3d/
DexUMI: Using Human Hand as the Universal Manipulation Interface for Dexterous Manipulation
We present DexUMI - a data collection and policy learning framework that uses the human hand as the natural interface to transfer dexterous manipulation skills to various robot hands. DexUMI includes hardware and software adaptations to minimize the embodiment gap between the human hand and various robot hands. The hardware adaptation bridges the kinematics gap using a wearable hand exoskeleton. It allows direct haptic feedback in manipulation data collection and adapts human motion to feasible robot hand motion. The software adaptation bridges the visual gap by replacing the human hand in video data with high-fidelity robot hand inpainting. We demonstrate DexUMI's capabilities through comprehensive real-world experiments on two different dexterous robot hand hardware platforms, achieving an average task success rate of 86%.
HAMLET: Switch your Vision-Language-Action Model into a History-Aware Policy
Inherently, robotic manipulation tasks are history-dependent: leveraging past context could be beneficial. However, most existing Vision-Language-Action models (VLAs) have been designed without considering this aspect, i.e., they rely solely on the current observation, ignoring preceding context. In this paper, we propose HAMLET, a scalable framework to adapt VLAs to attend to the historical context during action prediction. Specifically, we introduce moment tokens that compactly encode perceptual information at each timestep. Their representations are initialized with time-contrastive learning, allowing them to better capture temporally distinctive aspects. Next, we employ a lightweight memory module that integrates the moment tokens across past timesteps into memory features, which are then leveraged for action prediction. Through empirical evaluation, we show that HAMLET successfully transforms a state-of-the-art VLA into a history-aware policy, especially demonstrating significant improvements on long-horizon tasks that require historical context. In particular, on top of GR00T N1.5, HAMLET achieves an average success rate of 76.4% on history-dependent real-world tasks, surpassing the baseline performance by 47.2%. Furthermore, HAMLET pushes prior art performance from 64.1% to 66.4% on RoboCasa Kitchen (100-demo setup) and from 95.6% to 97.7% on LIBERO, highlighting its effectiveness even under generic robot-manipulation benchmarks.
comment: Project page: https://myungkyukoo.github.io/hamlet/
Safe Navigation of Bipedal Robots via Koopman Operator-Based Model Predictive Control
Nonlinearity in dynamics has long been a major challenge in robotics, often causing significant performance degradation in existing control algorithms. For example, the navigation of bipedal robots can exhibit nonlinear behaviors even under simple velocity commands, as their actual dynamics are governed by complex whole-body movements and discrete contacts. In this work, we propose a novel safe navigation framework inspired by Koopman operator theory. We first train a low-level locomotion policy using deep reinforcement learning, and then capture its low-frequency, base-level dynamics by learning linearized dynamics in a high-dimensional lifted space using Dynamic Mode Decomposition. Then, our model-predictive controller (MPC) efficiently optimizes control signals via standard quadratic objective and the linear dynamics constraint in the lifted space. We demonstrate that the Koopman-based model more accurately predicts bipedal robot trajectories than baseline approaches. Furthermore, we show that the proposed navigation framework achieves improved safety with better success rates in dense environments with narrow passages.
comment: 8 pages
Optimal Modified Feedback Strategies in LQ Games under Control Imperfections
Game-theoretic approaches and Nash equilibrium have been widely applied across various engineering domains. However, practical challenges such as disturbances, delays, and actuator limitations can hinder the precise execution of Nash equilibrium strategies. This work investigates the impact of such implementation imperfections on game trajectories and players' costs in the context of a two-player finite-horizon linear quadratic (LQ) nonzero-sum game. Specifically, we analyze how small deviations by one player, measured or estimated at each stage, affect the state and cost function of the other player. To mitigate these effects, we propose an adjusted control policy that optimally compensates for the deviations under the stated information structure and can, under certain conditions, exploit them to improve performance. Rigorous mathematical analysis and proofs are provided, and the effectiveness of the proposed method is demonstrated through a representative numerical example.
comment: 6 pages, 2 figures, Preprint version of a paper submitted to ACC 2026
A Benchmarking Study of Vision-based Robotic Grasping Algorithms
We present a benchmarking study of vision-based robotic grasping algorithms with distinct approaches, and provide a comparative analysis. In particular, we compare two machine-learning-based and two analytical algorithms using an existing benchmarking protocol from the literature and determine the algorithm's strengths and weaknesses under different experimental conditions. These conditions include variations in lighting, background textures, cameras with different noise levels, and grippers. We also run analogous experiments in simulations and with real robots and present the discrepancies. Some experiments are also run in two different laboratories using same protocols to further analyze the repeatability of our results. We believe that this study, comprising 5040 experiments, provides important insights into the role and challenges of systematic experimentation in robotic manipulation, and guides the development of new algorithms by considering the factors that could impact the performance. The experiment recordings and our benchmarking software are publicly available.
comment: This work was intended as a replacement of arXiv:2307.11622. I will upload it as a replacement to arXiv:2307.11622 simultaneously
VOTE: Vision-Language-Action Optimization with Trajectory Ensemble Voting
Recent large-scale Vision Language Action (VLA) models have shown superior performance in robotic manipulation tasks guided by natural language. However, current VLA models suffer from two drawbacks: (i) generation of massive tokens leading to high inference latency and increased training cost, and (ii) insufficient utilization of generated actions resulting in potential performance loss. To address these issues, we develop a training framework to finetune VLA models for generating significantly fewer action tokens with high parallelism, effectively reducing inference latency and training cost. Furthermore, we introduce an inference optimization technique with a novel voting-based ensemble strategy to combine current and previous action predictions, improving the utilization of generated actions and overall performance. Our results demonstrate that we achieve superior performance compared with state-of-the-art VLA models, achieving significantly higher success rates and 39$\times$ faster inference than OpenVLA with 46 Hz throughput on edge platforms, demonstrating practical deployability. The code is available at https://github.com/LukeLIN-web/VOTE.
Active Alignments of Lens Systems with Reinforcement Learning
Aligning a lens system relative to an imager is a critical challenge in camera manufacturing. While optimal alignment can be mathematically computed under ideal conditions, real-world deviations caused by manufacturing tolerances often render this approach impractical. Measuring these tolerances can be costly or even infeasible, and neglecting them may result in suboptimal alignments. We propose a reinforcement learning (RL) approach that learns exclusively in the pixel space of the sensor output, eliminating the need to develop expert-designed alignment concepts. We conduct an extensive benchmark study and show that our approach surpasses other methods in speed, precision, and robustness. We further introduce relign, a realistic, freely explorable, open-source simulation utilizing physically based rendering that models optical systems with non-deterministic manufacturing tolerances and noise in robotic alignment movement. It provides an interface to popular machine learning frameworks, enabling seamless experimentation and development. Our work highlights the potential of RL in a manufacturing environment to enhance efficiency of optical alignments while minimizing the need for manual intervention.
comment: This work has been submitted to the IEEE for possible publication
A Multi-Fidelity Control Variate Approach for Policy Gradient Estimation
Many reinforcement learning (RL) algorithms are impractical for deployment in operational systems or for training with computationally expensive high-fidelity simulations, as they require large amounts of data. Meanwhile, low-fidelity simulators -- such as reduced-order models, heuristic rewards, or generative world models -- can cheaply provide useful data for RL training, even if they are too coarse for zero-shot transfer. We propose multi-fidelity policy gradients (MFPGs), an RL framework that mixes a small amount of data from the target environment with a control variate formed from a large volume of low-fidelity simulation data to construct an unbiased, variance-reduced estimator for on-policy policy gradients. We instantiate the framework with a multi-fidelity variant of the classical REINFORCE algorithm. We show that under standard assumptions, the MFPG estimator guarantees asymptotic convergence of REINFORCE to locally optimal policies in the target environment, and achieves faster finite-sample convergence rates compared to training with high-fidelity data alone. Empirically, we evaluate the MFPG algorithm across a suite of simulated robotics benchmark tasks with limited high-fidelity data but abundant off-dynamics, low-fidelity data. With mild-moderate dynamics gaps, MFPG reliably improves the median performance over a high-fidelity-only baseline, matching the performance of leading multi-fidelity baselines despite its simplicity and minimal tuning overhead. Under large dynamics gaps, MFPG demonstrates the strongest robustness among the evaluated multi-fidelity approaches. An additional experiment shows that MFPG can remain effective even under low-fidelity reward misspecification. Thus, MFPG not only offers a novel paradigm for efficient sim-to-real transfer but also provides a principled approach to managing the trade-off between policy performance and data collection costs.
Multiagent Systems
FalseCrashReducer: Mitigating False Positive Crashes in OSS-Fuzz-Gen Using Agentic AI
Fuzz testing has become a cornerstone technique for identifying software bugs and security vulnerabilities, with broad adoption in both industry and open-source communities. Directly fuzzing a function requires fuzz drivers, which translate random fuzzer inputs into valid arguments for the target function. Given the cost and expertise required to manually develop fuzz drivers, methods exist that leverage program analysis and Large Language Models to automatically generate these drivers. However, the generated fuzz drivers frequently lead to false positive crashes, especially in functions highly structured input and complex state requirements. This problem is especially crucial in industry-scale fuzz driver generation efforts like OSS-Fuzz-en, as reporting false positive crashes to maintainers impede trust in both the system and the team. This paper presents two AI-driven strategies to reduce false positives in OSS-Fuzz-Gen, a multi-agent system for automated fuzz driver generation. First, constraint-based fuzz driver generation proactively enforces constraints on a function's inputs and state to guide driver creation. Second, context-based crash validation reactively analyzes function callers to determine whether reported crashes are feasible from program entry points. Using 1,500 benchmark functions from OSS-Fuzz, we show that these strategies reduce spurious crashes by up to 8%, cut reported crashes by more than half, and demonstrate that frontier LLMs can serve as reliable program analysis agents. Our results highlight the promise and challenges of integrating AI into large-scale fuzzing pipelines.
comment: 12 pages, 2 figures
BioinfoMCP: A Unified Platform Enabling MCP Interfaces in Agentic Bioinformatics
Bioinformatics tools are essential for complex computational biology tasks, yet their integration with emerging AI-agent frameworks is hindered by incompatible interfaces, heterogeneous input-output formats, and inconsistent parameter conventions. The Model Context Protocol (MCP) provides a standardized framework for tool-AI communication, but manually converting hundreds of existing and rapidly growing specialized bioinformatics tools into MCP-compliant servers is labor-intensive and unsustainable. Here, we present BioinfoMCP, a unified platform comprising two components: BioinfoMCP Converter, which automatically generates robust MCP servers from tool documentation using large language models, and BioinfoMCP Benchmark, which systematically validates the reliability and versatility of converted tools across diverse computational tasks. We present a platform of 38 MCP-converted bioinformatics tools, extensively validated to show that 94.7% successfully executed complex workflows across three widely used AI-agent platforms. By removing technical barriers to AI automation, BioinfoMCP enables natural-language interaction with sophisticated bioinformatics analyses without requiring extensive programming expertise, offering a scalable path to intelligent, interoperable computational biology.
comment: 20 pages, 8 figures, 3 tables
Cooperative Guidance for Aerial Defense in Multiagent Systems
This paper addresses a critical aerial defense challenge in contested airspace, involving three autonomous aerial vehicles -- a hostile drone (the pursuer), a high-value drone (the evader), and a protective drone (the defender). We present a cooperative guidance framework for the evader-defender team that guarantees interception of the pursuer before it can capture the evader, even under highly dynamic and uncertain engagement conditions. Unlike traditional heuristic, optimal control, or differential game-based methods, we approach the problem within a time-constrained guidance framework, leveraging true proportional navigation based approach that ensures robust and guaranteed solutions to the aerial defense problem. The proposed strategy is computationally lightweight, scalable to a large number of agent configurations, and does not require knowledge of the pursuer's strategy or control laws. From arbitrary initial geometries, our method guarantees that key engagement errors are driven to zero within a fixed time, leading to a successful mission. Extensive simulations across diverse and adversarial scenarios confirm the effectiveness of the proposed strategy and its relevance for real-time autonomous defense in contested airspace environments.
To Mask or to Mirror: Human-AI Alignment in Collective Reasoning
As large language models (LLMs) are increasingly used to model and augment collective decision-making, it is critical to examine their alignment with human social reasoning. We present an empirical framework for assessing collective alignment, in contrast to prior work on the individual level. Using the Lost at Sea social psychology task, we conduct a large-scale online experiment (N=748), randomly assigning groups to leader elections with either visible demographic attributes (e.g. name, gender) or pseudonymous aliases. We then simulate matched LLM groups conditioned on the human data, benchmarking Gemini 2.5, GPT 4.1, Claude Haiku 3.5, and Gemma 3. LLM behaviors diverge: some mirror human biases; others mask these biases and attempt to compensate for them. We empirically demonstrate that human-AI alignment in collective reasoning depends on context, cues, and model-specific inductive biases. Understanding how LLMs align with collective human behavior is critical to advancing socially-aligned AI, and demands dynamic benchmarks that capture the complexities of collective reasoning.
TACOS: Task Agnostic COordinator of a multi-drone System
When a single pilot is responsible for managing a multi-drone system, the task demands varying levels of autonomy, from direct control of individual UAVs, to group-level coordination, to fully autonomous swarm behaviors for accomplishing high-level tasks. Enabling such flexible interaction requires a framework that supports multiple modes of shared autonomy. As language models continue to improve in reasoning and planning, they provide a natural foundation for such systems, reducing pilot workload by enabling high-level task delegation through intuitive, language-based interfaces. In this paper we present TACOS (Task-Agnostic COordinator of a multi-drone System), a unified framework that enables high-level natural language control of multi-UAV systems through Large Language Models (LLMs). TACOS integrates three key capabilities into a single architecture: a one-to-many natural language interface for intuitive user interaction, an intelligent coordinator for translating user intent into structured task plans, and an autonomous agent that executes plans interacting with the real-world. TACOS allows a LLM to interact with a library of executable APIs, bridging semantic reasoning with real-time multi-robot coordination. We demonstrate the system in real-world multi-drone system and conduct an ablation study to assess the contribution of each module.
comment: 6 pages, 6 figures, accepted as poster at 2025 IEEE International Symposium on Multi-Robot & Multi-Agent Systems
FOR-Prompting: From Objection to Revision via an Asymmetric Prompting Protocol
Reasoning protocols such as Chain of Thought (CoT) and Tree of Thought (ToT) organize internal deliberation but lack an explicit mechanism for external questioning that elicits self-revision. We present FOR-Prompting (From Objection to Revision Prompting), an asymmetric protocol where a Defender proposes an answer, an Objectioner raises question-style objections with no direct fixes, and a Host enforces consistency and closure. On GSM8K we observe about a 22% point gain over single-prompt and accuracy on par with CoT, with more than 10% higher ratings in reasoning and coherence from a uniform GPT 4.1 judge. FOR-Prompting also corrects mistakes without tools or human supervision on tricky queries, and improves performance for small-scale model (approx. 19% accuracy improved on Llama3.2:1b for GSM8K task), highlighting promise for small models and on personal device use. Beyond factual QA, qualitative analyses on open-ended tasks show enhanced exploration and refinement, with dialogue traces that make assumptions and trade-offs explicit. The protocol is model agnostic and operates purely at the prompt level through role-structured turns, so it works with hosted and local models of different sizes without retraining, and it supports large-scale study of objection-guided reasoning.
CLARITY: Clinical Assistant for Routing, Inference, and Triage EMNLP 2025
We present CLARITY (Clinical Assistant for Routing, Inference, and Triage), an AI-driven platform designed to facilitate patient-to-specialist routing, clinical consultations, and severity assessment of patients' conditions. Its hybrid architecture combines a Finite State Machine (FSM) for structured dialogue flows with collaborative agents that employ Large Language Model (LLM) to analyze symptoms and prioritize referrals to appropriate specialists. Built on a modular microservices framework, CLARITY ensures safe, efficient, and robust performance, flexible and readily scalable to meet the demands of existing workflows and IT solutions in healthcare. We report integration of our clinical assistant into a large-scale nation-wide inter-hospital IT platform, with over 55,000 content-rich user dialogues completed within the two months of deployment, 2,500 of which were expert-annotated for a consequent validation. The validation results show that CLARITY surpasses human-level performance in terms of the first-attempt routing precision, naturally requiring up to 3 times shorter duration of the consultation than with a human.
comment: Accepted to EMNLP 2025 (Industrial Track)
KVComm: Enabling Efficient LLM Communication through Selective KV Sharing
Large Language Models (LLMs) are increasingly deployed in multi-agent systems, where effective inter-model communication is crucial. Existing communication protocols either rely on natural language, incurring high inference costs and information loss, or on hidden states, which suffer from information concentration bias and inefficiency. To address these limitations, we propose KVComm, a novel communication framework that enables efficient communication between LLMs through selective sharing of KV pairs. KVComm leverages the rich information encoded in the KV pairs while avoiding the pitfalls of hidden states. We introduce a KV layer-wise selection strategy based on attention importance scores with a Gaussian prior to identify the most informative KV pairs for communication. Extensive experiments across diverse tasks and model pairs demonstrate that KVComm achieves comparable performance to the upper-bound method, which directly merges inputs to one model without any communication, while transmitting as few as 30\% of layers' KV pairs. Our study highlights the potential of KV pairs as an effective medium for inter-LLM communication, paving the way for scalable and efficient multi-agent systems.
FlashResearch: Real-time Agent Orchestration for Efficient Deep Research
Deep research agents, which synthesize information across diverse sources, are significantly constrained by their sequential reasoning processes. This architectural bottleneck results in high latency, poor runtime adaptability, and inefficient resource allocation, making them impractical for interactive applications. To overcome this, we introduce FlashResearch, a novel framework for efficient deep research that transforms sequential processing into parallel, runtime orchestration by dynamically decomposing complex queries into tree-structured sub-tasks. Our core contributions are threefold: (1) an adaptive planner that dynamically allocates computational resources by determining research breadth and depth based on query complexity; (2) a real-time orchestration layer that monitors research progress and prunes redundant paths to reallocate resources and optimize efficiency; and (3) a multi-dimensional parallelization framework that enables concurrency across both research breadth and depth. Experiments show that FlashResearch consistently improves final report quality within fixed time budgets, and can deliver up to a 5x speedup while maintaining comparable quality.
Implementing Agents in JavaScript
This chapter gives an introduction to agent-oriented programming in JavaScript. It provides an example-based walk-through of how to implement abstractions for reasoning loop agents in vanilla JavaScript. The initial example is used as a stepping stone for explaining how to implement slightly more advanced agents and multi-agent systems using JS-son, a JavaScript library for agent-oriented programming. In this context, the chapter also explains how to integrate reasoning loop agents with generative AI technologies--specifically, large language models. Finally, application scenarios in several technology ecosystems and future research directions are sketched.
comment: This chapter will eventually by published in the book "Agents and Multi-Agent Systems Development -- Platforms, Toolkits, Technologies", edited by Collier, Mascardi, and Ricci
Programming Distributed Collective Processes in the eXchange Calculus
Recent trends like the Internet of Things (IoT) suggest a vision of dense and multi-scale deployments of computing devices in nearly all kinds of environments. A prominent engineering challenge revolves around programming the collective adaptive behaviour of such computational ecosystems. This requires abstractions able to capture concepts like ensembles (dynamic groups of cooperating devices) and collective tasks (joint activities carried out by ensembles). In this work, we consider collections of devices interacting with neighbours and that execute in nearly-synchronised sense-compute-interact rounds, where the computation is given by a single program mapping sensing values and incoming messages to output and outcoming messages. To support programming whole computational collectives, we propose the abstraction of a distributed collective process, which can be used to define at once the ensemble formation logic and its collective task. We formalise the abstraction in the eXchange Calculus (XC), a core functional language based on neighbouring values (maps from neighbours to values) where state and interaction is handled through a single primitive, exchange, and provide a corresponding implementation in the FCPP language. Then, we exercise distributed collective processes using two case studies: multi-hop message propagation and distributed monitoring of spatial properties. Finally, we discuss the features of the abstraction and its suitability for different kinds of distributed computing applications.
An Architecture for Spatial Networking
Physical spaces are increasingly dense with networked devices, promising seamless coordination and ambient intelligence. Yet today, cloud-first architectures force all communication through wide-area networks regardless of physical proximity. We lack an abstraction for spatial networking: using physical spaces to create boundaries for private, robust, and low-latency communication. We introduce $\textit{Bifr\"ost}$, a programming model that realizes spatial networking using bigraphs to express both containment and connectivity, enabling policies to be scoped by physical boundaries, devices to be named by location, the instantiation of spatial services, and the composition of spaces while maintaining local autonomy. Bifr\"ost enables a new class of spatially-aware applications, where co-located devices communicate directly, physical barriers require explicit gateways, and local control bridges to global coordination.
AniMaker: Multi-Agent Animated Storytelling with MCTS-Driven Clip Generation
Despite rapid advancements in video generation models, generating coherent storytelling videos that span multiple scenes and characters remains challenging. Current methods often rigidly convert pre-generated keyframes into fixed-length clips, resulting in disjointed narratives and pacing issues. Furthermore, the inherent instability of video generation models means that even a single low-quality clip can significantly degrade the entire output animation's logical coherence and visual continuity. To overcome these obstacles, we introduce AniMaker, a multi-agent framework enabling efficient multi-candidate clip generation and storytelling-aware clip selection, thus creating globally consistent and story-coherent animation solely from text input. The framework is structured around specialized agents, including the Director Agent for storyboard generation, the Photography Agent for video clip generation, the Reviewer Agent for evaluation, and the Post-Production Agent for editing and voiceover. Central to AniMaker's approach are two key technical components: MCTS-Gen in Photography Agent, an efficient Monte Carlo Tree Search (MCTS)-inspired strategy that intelligently navigates the candidate space to generate high-potential clips while optimizing resource usage; and AniEval in Reviewer Agent, the first framework specifically designed for multi-shot animation evaluation, which assesses critical aspects such as story-level consistency, action completion, and animation-specific features by considering each clip in the context of its preceding and succeeding clips. Experiments demonstrate that AniMaker achieves superior quality as measured by popular metrics including VBench and our proposed AniEval framework, while significantly improving the efficiency of multi-candidate generation, pushing AI-generated storytelling animation closer to production standards.
Optimal Modified Feedback Strategies in LQ Games under Control Imperfections
Game-theoretic approaches and Nash equilibrium have been widely applied across various engineering domains. However, practical challenges such as disturbances, delays, and actuator limitations can hinder the precise execution of Nash equilibrium strategies. This work investigates the impact of such implementation imperfections on game trajectories and players' costs in the context of a two-player finite-horizon linear quadratic (LQ) nonzero-sum game. Specifically, we analyze how small deviations by one player, measured or estimated at each stage, affect the state and cost function of the other player. To mitigate these effects, we propose an adjusted control policy that optimally compensates for the deviations under the stated information structure and can, under certain conditions, exploit them to improve performance. Rigorous mathematical analysis and proofs are provided, and the effectiveness of the proposed method is demonstrated through a representative numerical example.
comment: 6 pages, 2 figures, Preprint version of a paper submitted to ACC 2026
Systems and Control (CS)
Game-theoretic Social Distancing in Competitive Bi-Virus SIS Epidemics
Numerous elements drive the spread of infectious diseases in complex real-world networks. Of particular interest is social behaviors that evolve in tandem with the spread of disease. Moreover, recent studies highlight the importance of understanding how multiple strains spread simultaneously through a population (e.g. Delta and Omicron variants of SARS-CoV-2). In this paper, we propose a bi-virus SIS epidemic model coupled with a game-theoretic social distancing behavior model. The behaviors are governed by replicator equations from evolutionary game theory. The prevalence of each strain impacts the choice of an individual to social distance, and, in turn, their behavior affects the spread of each virus in the SIS model. Our analysis identifies equilibria of the system and their local stability properties, which reveal several isolated fixed points with varying levels of social distancing. We find that endemic co-existence is possible only when the reproduction numbers of both strains are equal. Assuming the reproduction number for each virus is the same, we identify suitable parameter regimes that give rise to lines of coexistence equilibria. Moreover, we also identify conditions for local exponential stability of said lines of equilibria. We illustrate our findings with several numerical simulations.
Computing Control Lyapunov-Barrier Functions: Softmax Relaxation and Smooth Patching with Formal Guarantees
We present a computational framework for synthesizing a single smooth Lyapunov function that certifies both asymptotic stability and safety. We show that the existence of a strictly compatible pair of control barrier and control Lyapunov functions (CBF-CLF) guarantees the existence of such a function on the exact safe set certified by the barrier. To maximize the certifiable safe domain while retaining differentiability, we employ a log-sum-exp (softmax) relaxation of the nonsmooth maximum barrier, together with a counterexample-guided refinement that inserts half-space cuts until a strict barrier condition is verifiable. We then patch the softmax barrier with a CLF via an explicit smooth bump construction, which is always feasible under the strict compatibility condition. All conditions are formally verified using a satisfiability modulo theories (SMT) solver, enabled by a reformulation of Farkas' lemma for encoding strict compatibility. On benchmark systems, including a power converter, we show that the certified safe stabilization regions obtained with the proposed approach are often less conservative than those achieved by state-of-the-art sum-of-squares (SOS) compatible CBF-CLF designs.
Detection and Identification of Sensor Attacks Using Data
In this paper, we investigate data-driven attack detection and identification in a model-free setting. Unlike existing studies, we consider the case where the available output data include malicious false-data injections. We aim to detect and identify such attacks solely from the compromised data. We address this problem in two scenarios: (1) when the system operator is aware of the system's sparse observability condition, and (2) when the data are partially clean (i.e., attack-free). In both scenarios, we derive conditions and algorithms for detecting and identifying attacks using only the compromised data. Finally, we demonstrate the effectiveness of the proposed framework via numerical simulations on a three-inertia system.
Product Digital Twin Supporting End-of-life Phase of Electric Vehicle Batteries Utilizing Product-Process-Resource Asset Network
In the context of the circular economy, products in their end-of-life phase should be either remanufactured or recycled. Both of these processes are crucial for sustainability and environmental conservation. However, manufacturers often do not support these processes enough by not sharing relevant data. This paper proposes use of a digital twin technology, which is capable to help optimizing the disassembly processes to reduce ecological impact and enhance sustainability. The proposed approach is demonstrated through a disassembly use-case of the product digital twin of an electric vehicle battery. By utilizing product digital twins, challenges associated with the disassembly of electric vehicle batteries can be solved flexibly and efficiently for various battery types. As a backbone for the product digital twin representation, the paper uses the paradigm of product-process-resource asset networks (PAN). Such networks enable to model relevant relationships across products, production resources, manufacturing processes, and specific production operations that have to be done in the manufacturing phase of a product. This paper introduces a Bi-Flow Product-Process-Resource Asset Network (Bi-PAN) representation, which extends the PAN paradigm to cover not only the manufacturing, but also the remanufacturing/recycling phase.
comment: This work has been submitted to the IEEE for possible publication. 6 pages, 4 figures
On the (almost) Global Exponential Convergence of the Overparameterized Policy Optimization for the LQR Problem
In this work we study the convergence of gradient methods for nonconvex optimization problems -- specifically the effect of the problem formulation to the convergence behavior of the solution of a gradient flow. We show through a simple example that, surprisingly, the gradient flow solution can be exponentially or asymptotically convergent, depending on how the problem is formulated. We then deepen the analysis and show that a policy optimization strategy for the continuous-time linear quadratic regulator (LQR) (which is known to present only asymptotic convergence globally) presents almost global exponential convergence if the problem is overparameterized through a linear feed-forward neural network (LFFNN). We prove this qualitative improvement always happens for a simplified version of the LQR problem and derive explicit convergence rates for the gradient flow. Finally, we show that both the qualitative improvement and the quantitative rate gains persist in the general LQR through numerical simulations.
comment: This version is currently under review for the 2026 IEEE American Control Conference (ACC)
Recurrent Control Barrier Functions: A Path Towards Nonparametric Safety Verification
Ensuring the safety of complex dynamical systems often relies on Hamilton-Jacobi (HJ) Reachability Analysis or Control Barrier Functions (CBFs). Both methods require computing a function that characterizes a safe set that can be made (control) invariant. However, the computational burden of solving high-dimensional partial differential equations (for HJ Reachability) or large-scale semidefinite programs (for CBFs) makes finding such functions challenging. In this paper, we introduce the notion of Recurrent Control Barrier Functions (RCBFs), a novel class of CBFs that leverages a recurrent property of the trajectories, i.e., coming back to a safe set, for safety verification. Under mild assumptions, we show that the RCBF condition holds for the signed-distance function, turning function design into set identification. Notably, the resulting set need not be invariant to certify safety. We further propose a data-driven nonparametric method to compute safe sets that is massively parallelizable and trades off conservativeness against computational cost.
comment: 8 Pages, 3 Figures
Cooperative Guidance for Aerial Defense in Multiagent Systems
This paper addresses a critical aerial defense challenge in contested airspace, involving three autonomous aerial vehicles -- a hostile drone (the pursuer), a high-value drone (the evader), and a protective drone (the defender). We present a cooperative guidance framework for the evader-defender team that guarantees interception of the pursuer before it can capture the evader, even under highly dynamic and uncertain engagement conditions. Unlike traditional heuristic, optimal control, or differential game-based methods, we approach the problem within a time-constrained guidance framework, leveraging true proportional navigation based approach that ensures robust and guaranteed solutions to the aerial defense problem. The proposed strategy is computationally lightweight, scalable to a large number of agent configurations, and does not require knowledge of the pursuer's strategy or control laws. From arbitrary initial geometries, our method guarantees that key engagement errors are driven to zero within a fixed time, leading to a successful mission. Extensive simulations across diverse and adversarial scenarios confirm the effectiveness of the proposed strategy and its relevance for real-time autonomous defense in contested airspace environments.
Event-triggered control and communication for single-master multi-slave teleoperation systems with Try-Once-Discard protocol
Single-master multi-slave (SMMS) teleoperation systems can perform multiple tasks remotely in a shorter time, cover large-scale areas, and adapt more easily to single-point failures, thereby effectively encompassing a broader range of applications. As the number of slave manipulators sharing a communication network increases, the limitation of communication bandwidth becomes critical. To alleviate bandwidth usage, the Try-Once-Discard (TOD) scheduling protocol and event-triggered mechanisms are often employed separately. In this paper, we combine both strategies to optimize network bandwidth and energy consumption for SMMS teleoperation systems. Specifically, we propose event-triggered control and communication schemes for a class of SMMS teleoperation systems using the TOD scheduling protocol. Considering dynamic uncertainties, the unavailability of relative velocities, and time-varying delays, we develop adaptive controllers with virtual observers based on event-triggered schemes to achieve master-slave synchronization. Stability criteria for the SMMS teleoperation systems under these event-triggered control and communication schemes are established, demonstrating that Zeno behavior is excluded. Finally, experiments are conducted to validate the effectiveness of the proposed algorithms.
LLM-Enhanced, Data-Driven Personalized and Equitable Clinician Scheduling: A Predict-then-Optimize Approach ICDM 2025
Clinician scheduling remains a persistent challenge due to limited clinical resources and fluctuating demands. This complexity is especially acute in large academic anesthesiology departments as physicians balance responsibilities across multiple clinical sites with conflicting priorities. Further, scheduling must account for individual clinical and lifestyle preferences to ensure job satisfaction and well-being. Traditional approaches, often based on statistical or rule-based optimization models, rely on structured data and explicit domain knowledge. However, these methods often overlook unstructured information, e.g., free-text notes from routinely administered clinician well-being surveys and scheduling platforms. These notes may reveal implicit and underutilized clinical resources. Neglecting such information can lead to misaligned schedules, increased burnout, overlooked staffing flexibility, and suboptimal utilization of available resources. To address this gap, we propose a predict-then-optimize framework that integrates classification-based clinician availability predictions with a mixed-integer programming schedule optimization model. Large language models (LLMs) are employed to extract actionable preferences and implicit constraints from unstructured schedule notes, enhancing the reliability of availability predictions. These predictions then inform the schedule optimization considering four objectives: first, ensuring clinical full-time equivalent compliance, second, reducing workload imbalances by enforcing equitable proportions of shift types, third, maximizing clinician availability for assigned shifts, and fourth, schedule consistency. By combining the interpretive power of LLMs with the rigor of mathematical optimization, our framework provides a robust, data-driven solution that enhances operational efficiency while supporting equity and clinician well-being.
comment: 10 pages, 5 figures, Accepted to IEEE ICDM 2025 Workshops Proceedings; IEEE Computer Society Press
Coordinated Car-following Using Distributed MPC
Within the modeling framework of Markov games, we propose a series of algorithms for coordinated car-following using distributed model predictive control (DMPC). Instead of tracking prescribed feasible trajectories, driving policies are solved directly as outcomes of the DMPC optimization given the driver's perceivable states. The coordinated solutions are derived using the best response dynamics via iterated self-play, and are facilitated by direct negotiation using inter-agent or agent-infrastructure communication. These solutions closely approximate either Nash equilibrium or centralized optimization. By re-parameterizing the action sequence in DMPC as a curve along the planning horizon, we are able to systematically reduce the original DMPC to very efficient grid searches such that the optimal solution to the original DMPC can be well executed in real-time. Within our modeling framework, it is natural to cast traffic control problems as mechanism design problems, in which all agents are endogenized on an equal footing with full incentive compatibility. We show how traffic efficiency can be dramatically improved while keeping stop-and-go phantom waves tamed at high vehicle densities. Our approach can be viewed as an alternative way to formulate coordinated adaptive cruise control (CACC) without an explicit platooning (or with all vehicles in the traffic system treated as a single extended platoon). We also address the issue of linear stability of the associated discrete-time traffic dynamics and demonstrate why it does not always tell the full story about the traffic stability.
Wearable and Ultra-Low-Power Fusion of EMG and A-Mode US for Hand-Wrist Kinematic Tracking
Hand gesture recognition based on biosignals has shown strong potential for developing intuitive human-machine interaction strategies that closely mimic natural human behavior. In particular, sensor fusion approaches have gained attention for combining complementary information and overcoming the limitations of individual sensing modalities, thereby enabling more robust and reliable systems. Among them, the fusion of surface electromyography (EMG) and A-mode ultrasound (US) is very promising. However, prior solutions rely on power-hungry platforms unsuitable for multi-day use and are limited to discrete gesture classification. In this work, we present an ultra-low-power (sub-50 mW) system for concurrent acquisition of 8-channel EMG and 4-channel A-mode US signals, integrating two state-of-the-art platforms into fully wearable, dry-contact armbands. We propose a framework for continuous tracking of 23 degrees of freedom (DoFs), 20 for the hand and 3 for the wrist, using a kinematic glove for ground-truth labeling. Our method employs lightweight encoder-decoder architectures with multi-task learning to simultaneously estimate hand and wrist joint angles. Experimental results under realistic sensor repositioning conditions demonstrate that EMG-US fusion achieves a root mean squared error of $10.6^\circ\pm2.0^\circ$, compared to $12.0^\circ\pm1^\circ$ for EMG and $13.1^\circ\pm2.6^\circ$ for US, and a R$^2$ score of $0.61\pm0.1$, with $0.54\pm0.03$ for EMG and $0.38\pm0.20$ for US.
Reducing Discomfort in Driving Simulators: Motion Cueing for Motion Sickness Mitigation
Driving simulators are increasingly used in research and development. However, simulators often cause motion sickness due to downscaled motion and unscaled veridical visuals. In this paper, a motion cueing algorithm is proposed that reduces motion sickness as predicted by the subjective vertical conflict (SVC) model using model predictive control (MPC). Both sensory conflict and specific force errors are penalised in the cost function, allowing the algorithm to jointly optimise fidelity and comfort. Human-in-the-loop experiments were conducted to compare four simulator motion settings: two variations of our MPC-based algorithm, one focused on pure specific force tracking and the second compromising specific force tracking and motion sickness minimisation, as well as reference adaptive washout and no motion cases. The experiments were performed on a hexapod driving simulator with participants exposed to passive driving. Experimental motion sickness results closely matched the sickness model predictions. As predicted by the model, the no motion condition yielded the lowest sickness levels. However, it was rated lowest in terms of fidelity. The compromise solution reduced sickness by over 50% (average MISC level 3 to 1.5) compared to adaptive washout and the algorithm focusing on specific force tracking, without any significant reduction in fidelity rating. The proposed approach for developing MCA that takes into account both the simulator dynamics and time evolution of motion sickness offers a significant advancement in achieving an optimal control of motion sickness and specific force recreation in driving simulators, supporting broader simulator use.
SPARC: Spine with Prismatic and Revolute Compliance for Quadruped Robot
We present SPARC, a compact, open-source 3-DoF sagittal-plane spine module that combines revolute (pitch) and prismatic (axial) motion with programmable task-space impedance for quadruped robots. The system integrates three torque-controlled actuators, a custom 1 kHz control board, and a protected power unit in a 1.26 kg package, enabling closed-loop stiffness and damping shaping along x, z, and theta. We develop an RNEA-based computed-acceleration controller with smooth Stribeck friction compensation to render spring-damper behavior without explicit inertia shaping. Bench experiments validate the approach. Quasi-static push-pull tests show linear force-displacement characteristics with commanded horizontal stiffness spanning 300-700 N/m and <= 1.5% relative error (R^2 >= 0.992, narrow 95% CIs). Dynamic displace-and-release trials confirm mass-spring-damper responses over multiple damping settings, with small, interpretable phase deviations due to configuration-dependent inertia and low-speed friction effects. A task-space PD controller produces roughly linear stiffness but with greater variability and coupling sensitivity. SPARC provides a portable platform for systematic studies of spine compliance in legged locomotion and will be released with complete hardware and firmware resources.
A TSO-DSO Coordination Framework via Analytical Representation and Monetization of PQV-Based Distribution System Flexibility
As the role of distribution system (DS) flexibility in transmission system operator (TSO) network management becomes increasingly vital, data privacy concerns hinder seamless interoperability. The notion of the feasible operating region (FOR), defined in the PQ domain, has emerged as a promising privacy-preserving approach. However, effectively leveraging FOR in TSO operations remains challenging due to three key factors: its accurate determination in large-scale, meshed DS networks; its tractable analytical representation; and its economic valuation. In the present paper, we propose a novel AC optimal power flow (OPF)-based method to construct a three-dimensional PQV-FOR, explicitly accounting for voltage variability and diverse flexibility-providing unit (FPU) characteristics. The construction process employs a two-stage sampling strategy that combines bounding box projection and Fibonacci direction techniques to efficiently capture the FOR. We then introduce an implicit polynomial fitting approach to analytically represent the FOR. Furthermore, we derive a quadratic cost function over the PQV domain to monetize the FOR. Thus, the proposed framework enables single-round TSO-DSO coordination: the DSO provides an analytical FOR and cost model; the TSO determines operating point at the point of common coupling (PCC) within the FOR-based AC-OPF; and the DSO computes FPU dispatch by solving its local OPF, without computationally intensive disaggregation or iterative coordination. Case studies on meshed DS with up to 533 buses, integrated into TS, demonstrates the method's efficiency compared to standard AC-OPF. On average, the proposed approach yields negligible cost deviations of at most 0.058% across test cases, while reducing computation times by up to 58.11%.
Robust MPC for Large-scale Linear Systems
State-of-the-art approaches of Robust Model Predictive Control (MPC) are restricted to linear systems of relatively small scale, i.e., with no more than about 5 states. The main reason is the computational burden of determining a robust positively invariant (RPI) set, whose complexity suffers from the curse of dimensionality. The recently proposed approach of Deadbeat Robust Model Predictive Control (DRMPC) is the first that does not rely on an RPI set. Yet it comes with the full set of essential system theoretic guarantees. DRMPC is hence a viable option, in particular, for large-scale systems. This paper introduces a detailed design procedure for DRMPC. It is shown that the optimal control problem generated for DRMPC has exactly the same computational complexity as Nominal MPC. A numerical study validates its applicability to randomly generated large-scale linear systems of various dimensions.
Dual-Mode Magnetic Continuum Robot for Targeted Drug Delivery ICRA 2026
Magnetic continuum robots (MCRs) enable minimally invasive navigation through tortuous anatomical channels, yet axially magnetized designs have largely been limited to bending-only motion. To expand deformation capabilities, this paper presents a simple assembly that embeds permanent magnets radially within the catheter wall, allowing a single externally steered permanent magnet to independently induce either bending or torsion. A physics-based formulation together with finite-element analysis establishes the actuation principles, and benchtop experiments validate decoupled mode control under practical fields. Building on this, a dual-layer blockage mechanism consisting of outer grooves and inner plates leverages torsional shear to achieve on-demand drug release. Finally, an in-phantom intervention experiment demonstrates end-to-end operation: lumen following by bending for target approach, followed by twist-activated release at the site. The resulting compact, cable-free platform combines versatile deformation with precise payload delivery, indicating strong potential for next-generation, site-specific therapies.
comment: 7 pages, 3 figures, under review of ICRA 2026
Geometric Backstepping Control of Omnidirectional Tiltrotors Incorporating Servo-Rotor Dynamics for Robustness against Sudden Disturbances
This work presents a geometric backstepping controller for a variable-tilt omnidirectional multirotor that explicitly accounts for both servo and rotor dynamics. Considering actuator dynamics is essential for more effective and reliable operation, particularly during aggressive flight maneuvers or recovery from sudden disturbances. While prior studies have investigated actuator-aware control for conventional and fixed-tilt multirotors, these approaches rely on linear relationships between actuator input and wrench, which cannot capture the nonlinearities induced by variable tilt angles. In this work, we exploit the cascade structure between the rigid-body dynamics of the multirotor and its nonlinear actuator dynamics to design the proposed backstepping controller and establish exponential stability of the overall system. Furthermore, we reveal parametric uncertainty in the actuator model through experiments, and we demonstrate that the proposed controller remains robust against such uncertainty. The controller was compared against a baseline that does not account for actuator dynamics across three experimental scenarios: fast translational tracking, rapid rotational tracking, and recovery from sudden disturbance. The proposed method consistently achieved better tracking performance, and notably, while the baseline diverged and crashed during the fastest translational trajectory tracking and the recovery experiment, the proposed controller maintained stability and successfully completed the tasks, thereby demonstrating its effectiveness.
Stability and Robustness of Time-Varying Opinion Dynamics: A Graph-Theoretic Approach
We study the stability of opinion dynamics in the time-varying Friedkin-Johnsen (TVFJ) model, which captures both persistent individual biases and adaptive social influence. We introduce two temporal structures, defected temporal graphs (DTGs) and weakly defected temporal graphs (WDTGs), that serve as graph-theoretic certificates linking stubborn influence and temporal connectivity to contraction of the state-transition matrix. Using these tools, we prove asymptotic stability of TVFJ dynamics under infinitely recurring DTGs, exponential stability in semi-periodic defected networks, and asymptotic stability of a trust-based extension under the weaker condition of recurring WDTGs. We also establish boundedness of the omega-limit set, showing that long-run opinions remain within the convex hull of innate beliefs, and characterize the limit set for periodically switching systems via a p-LTI decomposition with the tight bound that the size of the omega-limit set is at most p. Finally, we show that exponential stability persists under bounded perturbations, ensuring robustness in noisy or imperfect networks. These results unify algebraic contraction tests with interpretable graph-based reasoning, providing scalable and resilient tools for analyzing opinion formation in evolving social and human-AI networks.
Bi-Virus SIS Epidemic Propagation under Mutation and Game-theoretic Protection Adoption
We study a bi-virus susceptible-infected-susceptible (SIS) epidemic model in which individuals are either susceptible or infected with one of two virus strains, and consider mutation-driven transitions between strains. The general case of bi-directional mutation is first analyzed, where we characterize the disease-free equilibrium and establish its global asymptotic stability, as well as the existence, uniqueness, and stability of an endemic equilibrium. We then present a game-theoretic framework where susceptible individuals strategically choose whether to adopt protection or remain unprotected, to maximize their instantaneous payoffs. We derive Nash strategies under bi-directional mutation, and subsequently consider the special case of unidirectional mutation. In the latter case, we show that coexistence of both strains is impossible when mutation occurs from the strain with lower reproduction number and transmission rate to the other strain. Furthermore, we fully characterize the stationary Nash equilibrium (SNE) in the setting permitting coexistence, and examine how mutation rates influence protection adoption and infection prevalence at the SNE. Numerical simulations corroborate the analytical results, demonstrating that infection levels decrease monotonically with higher protection adoption, and highlight the impact of mutation rates and protection cost on infection state trajectories.
A Scalable Design Approach to Resilient Architectures for Interconnected Cyber-Physical Systems: Safety Guarantees under Multiple Attacks
Complex, interconnected cyber-physical systems (CPS) are increasingly prevalent in domains such as power systems. Cyber-resilient architectures have been proposed to recover compromised cyber components of CPS. Recent works have studied tuning the recovery times of such architectures to guarantee safety in single-system settings. Extending these designs to interconnected CPS is more challenging, since solutions must account for attacks on multiple subsystems that can occur in any order and potentially infinite possible temporal overlap. This paper aims to address the aforementioned challenge by developing a scalable framework to assign resilient architectures and to inform the tuning of their recovery times. Our approach introduces a scalar index that quantifies the impact of each subsystem on safety under compromised input. These indices aggregate linearly across subsystems, enabling scalable analysis under arbitrary attack orderings and temporal overlaps. We establish a linear inequality relating each subsystem's index and recovery time that guarantees safety and guides resilient architecture assignment. We also propose a segmentation-based approach to strengthen the previously derived conditions. We then present algorithms to compute the proposed indices and to find a cost-optimal architecture assignment with a safety guarantee. We validate the framework through a case study on temperature regulation in interconnected rooms under different attack scenarios.
Efficient Optimal Path Planning in Dynamic Environments Using Koopman MPC
This paper presents a data-driven model predictive control framework for mobile robots navigating in dynamic environments, leveraging Koopman operator theory. Unlike the conventional Koopman-based approaches that focus on the linearization of system dynamics only, our work focuses on finding a global linear representation for the optimal path planning problem that includes both the nonlinear robot dynamics and collision-avoidance constraints. We deploy extended dynamic mode decomposition to identify linear and bilinear Koopman realizations from input-state data. Our open-loop analysis demonstrates that only the bilinear Koopman model can accurately capture nonlinear state-input couplings and quadratic terms essential for collision avoidance, whereas linear realizations fail to do so. We formulate a quadratic program for the robot path planning in the presence of moving obstacles in the lifted space and determine the optimal robot action in an MPC framework. Our approach is capable of finding the safe optimal action 320 times faster than a nonlinear MPC counterpart that solves the path planning problem in the original state space. Our work highlights the potential of bilinear Koopman realizations for linearization of highly nonlinear optimal control problems subject to nonlinear state and input constraints to achieve computational efficiency similar to linear problems.
comment: This work has been submitted to the ACC2026 conference
Bridging the Prediction Error Method and Subspace Identification: A Weighted Null Space Fitting Method
Subspace identification methods (SIMs) have proven to be very useful and numerically robust for building state-space models. While most SIMs are consistent, few if any can achieve the efficiency of the maximum likelihood estimate (MLE). Conversely, the prediction error method (PEM) with a quadratic criteria is equivalent to MLE, but it comes with non-convex optimization problems and requires good initialization points. This contribution proposes a weighted null space fitting (WNSF) approach for estimating state-space models, combining some key advantages of the two aforementioned mainstream approaches. It starts with a least-squares estimate of a high-order ARX model, and then a multi-step least-squares procedure reduces the model to a state-space model on canoncial form. It is demonstrated through statistical analysis that when a canonical parameterization is admissible, the proposed method is consistent and asymptotically efficient, thereby making progress on the long-standing open problem about the existence of an asymptotically efficient SIM. Numerical and practical examples are provided to illustrate that the proposed method performs favorable in comparison with SIMs.
Self-supervised diffusion model fine-tuning for costate initialization using Markov chain Monte Carlo
Global search and optimization of long-duration, low-thrust spacecraft trajectories with the indirect method is challenging due to a complex solution space and the difficulty of generating good initial guesses for the costate variables. This is particularly true in multibody environments. Given data that reveals a partial Pareto optimal front, it is desirable to find a flexible manner in which the Pareto front can be completed and fronts for related trajectory problems can be found. In this work we use conditional diffusion models to represent the distribution of candidate optimal trajectory solutions. We then introduce into this framework the novel approach of using Markov Chain Monte Carlo algorithms with self-supervised fine-tuning to achieve the aforementioned goals. Specifically, a random walk Metropolis algorithm is employed to propose new data that can be used to fine-tune the diffusion model using a reward-weighted training based on efficient evaluations of constraint violations and missions objective functions. The framework removes the need for separate focused and often tedious data generation phases. Numerical experiments are presented for two problems demonstrating the ability to improve sample quality and explicitly target Pareto optimality based on the theory of Markov chains. The first problem does so for a transfer in the Jupiter-Europa circular restricted three-body problem, where the MCMC approach completes a partial Pareto front. The second problem demonstrates how a dense and superior Pareto front can be generated by the MCMC self-supervised fine-tuning method for a Saturn-Titan transfer starting from the Jupiter-Europa case versus a separate dedicated global search.
A Bilevel Optimization Framework for Adversarial Control of Gas Pipeline Operations
Cyberattacks on pipeline operational technology systems pose growing risks to energy infrastructure. This study develops a physics-informed simulation and optimization framework for analyzing cyber-physical threats in petroleum pipeline networks. The model integrates networked hydraulic dynamics, SCADA-based state estimation, model predictive control (MPC), and a bi-level formulation for stealthy false-data injection (FDI) attacks. Pipeline flow and pressure dynamics are modeled on a directed graph using nodal pressure evolution and edge-based Weymouth-type relations, including control-aware equipment such as valves and compressors. An extended Kalman filter estimates the full network state from partial SCADA telemetry. The controller computes pressure-safe control inputs via MPC under actuator constraints and forecasted demands. Adversarial manipulation is formalized as a bi-level optimization problem where an attacker perturbs sensor data to degrade throughput while remaining undetected by bad-data detectors. This attack-control interaction is solved via Karush-Kuhn-Tucker (KKT) reformulation, which results in a tractable mixed-integer quadratic program. Test gas pipeline case studies demonstrate the covert reduction of service delivery under attack. Results show that undetectable attacks can cause sustained throughput loss with minimal instantaneous deviation. This reveals the need for integrated detection and control strategies in cyber-physical infrastructure.
Situationally Aware Rolling Horizon Multi-Tier Load Restoration Considering Behind-The-Meter DER
Restoration in power distribution systems (PDSs) is well studied, however, most existing research focuses on network partition and microgrid formation, where load transfer is limited to adjacent feeders. This focus is not practical, as when adjacent feeders lack sufficient capacity, utilities may request support from more distant feeders in practice. Such a hirarchical restoration is complex, especially when involving changing system conditions due to cold load pickup and delayed reconnection of behind-the-meter DERs. To fill this research gap, a situationally aware multi-tier load restoration framework is proposed. Specifically, models are proposed to describe the multi-tier load restoration, including the multi-tier load transfer and substation transformer and feeder protection models. By introducing binary actional switching variables and load block transfer variables, the models effectively captures the dynamics of switches and multi-tier transfer process. To integrate situational awareness of evolving system conditions, the problem is formulated as a mixed-integer linear program (MILP) and then embedded within a rolling horizon optimization. Particularly, a set of safeguarded constraints are developed based on segment-level restoration reward bounds to mitigate the myopia of traditional rolling horizon optimization. The proposed safeguarded rolling strategy guarantees that each time step is lower bounded by a $(1-\varepsilon)$-fraction of its optimal restoration potential, thereby balancing short-term switching decisions with long-term restoration goals. Finally, cases studies on the modified IEEE 123-node test feeder validate the proposed multi-tier restoration framework.
Power Distribution System Blackstart Restoration Using Renewable Energy
Integrating renewable energy sources into the grid not only reduces global carbon emissions, but also facilitates distribution system (DS) blackstart restoration. This process leverages renewable energy, inverters, situational awareness and distribution automation to initiate blackstart at the DS level, obtaining a fast response and bottom-up restoration. In this Review, we survey the latest technological advances for DS blackstart restoration using renewable energy. We first present mathematical models for distributed energy resources (DERs), network topology, and load dynamics. We then discuss how the situational awareness can help improve restoration performance through real-time monitoring and forecasting. Next, the DS blackstart restoration problem, including objectives, constraints, and existing methodologies for decision-making are provided. Lastly, we outline remaining challenges, and highlight the opportunities and future research directions.
Improved Robustness of Deep Reinforcement Learning for Control of Time-Varying Systems by Bounded Extremum Seeking
In this paper, we study the use of robust model independent bounded extremum seeking (ES) feedback control to improve the robustness of deep reinforcement learning (DRL) controllers for a class of nonlinear time-varying systems. DRL has the potential to learn from large datasets to quickly control or optimize the outputs of many-parameter systems, but its performance degrades catastrophically when the system model changes rapidly over time. Bounded ES can handle time-varying systems with unknown control directions, but its convergence speed slows down as the number of tuned parameters increases and, like all local adaptive methods, it can get stuck in local minima. We demonstrate that together, DRL and bounded ES result in a hybrid controller whose performance exceeds the sum of its parts with DRL taking advantage of historical data to learn how to quickly control a many-parameter system to a desired setpoint while bounded ES ensures its robustness to time variations. We present a numerical study of a general time-varying system and a combined ES-DRL controller for automatic tuning of the Low Energy Beam Transport section at the Los Alamos Neutron Science Center linear particle accelerator.
Data-Driven Stochastic Distribution System Hardening Based on Bayesian Online Learning
Extreme weather frequently cause widespread outages in distribution systems (DSs), demonstrating the importance of hardening strategies for resilience enhancement. However, the well-utilization of real-world outage data with associated weather conditions to make informed hardening decisions in DSs is still an open issue. To bridge this research gap, this paper proposes a data-driven stochastic distribution line (DL) hardening strategy. First, a deep neural network (DNN) regression model is developed to predict the probabilistic evolution of outage scenarios under various hardening decisions. Based on the DNN predictions, the problem is formulated as a decision-dependent distributionally robust optimization (DRO) model, accounting for uncertainties in outage scenario distributions using a data-driven ambiguity set. To address decision-dependent uncertainty, a Bayesian online learning algorithm is proposed. This algorithm decomposes the original problem into inner and outer problems. Then, it iteratively refines hardening decisions by sequentially incorporating outage data and dynamically updating decision-specific ambiguity sets by using Bayes' theorem and Bayesian Inference. Also, the convergence of the algorithm is proven through dynamic regret analysis. Finally, case studies are implemented on a real-world DS in Redfield, Iowa, USA. A dataset spanning 24 years (2001-2024) is constructed based on the utility outage records. The simulation results validates the effectiveness of the proposed strategy.
On Architectures for Combining Reinforcement Learning and Model Predictive Control with Runtime Improvements
Model Predictive Control (MPC) faces computational demands and performance degradation from model inaccuracies. We propose two architectures combining Neural Network-approximated MPC (NNMPC) with Reinforcement Learning (RL). The first, Warm Start RL, initializes the RL actor with pre-trained NNMPC weights. The second, RLMPC, uses RL to generate corrective residuals for NNMPC outputs. We introduce a downsampling method reducing NNMPC input dimensions while maintaining performance. Evaluated on a rotary inverted pendulum, both architectures demonstrate runtime reductions exceeding 99% compared to traditional MPC while improving tracking performance under model uncertainties, with RL+MPC achieving 11-40% cost reduction depending on reference amplitude.
comment: Accepted at the 2025 IFAC Conference on Modeling, Estimation, and Control of Systems (MECC 2025), Pittsburgh, USA
Cyber-physical WebAssembly: Secure Hardware Interfaces and Pluggable Drivers
The rapid expansion of Internet of Things (IoT), edge, and embedded devices in the past decade has introduced numerous challenges in terms of security and configuration management. Simultaneously, advances in cloud-native development practices have greatly enhanced the development experience and facilitated quicker updates, thereby enhancing application security. However, applying these advances to IoT, edge, and embedded devices remains a complex task, primarily due to the heterogeneous environments and the need to support devices with extended lifespans. WebAssembly and the WebAssembly System Interface (WASI) has emerged as a promising technology to bridge this gap. As WebAssembly becomes more popular on IoT, edge, and embedded devices, there is a growing demand for hardware interface support in WebAssembly programs. This work presents WASI proposals and proof-of-concept implementations to enable hardware interaction with I2C and USB, which are two commonly used protocols in IoT, directly from WebAssembly applications. This is achieved by running the device drivers within WebAssembly as well. A thorough evaluation of the proof of concepts shows that WASI-USB introduces a minimal overhead of at most 8% compared to native operating system USB APIs. However, the results show that runtime initialization overhead can be significant in low-latency applications.
comment: Accepted article of the IEEE/IFIP Network Operations and Management Symposium 2025 (NOMS 2025)
Time-o1: Time-Series Forecasting Needs Transformed Label Alignment NeurIPS 2025
Training time-series forecast models presents unique challenges in designing effective learning objectives. Existing methods predominantly utilize the temporal mean squared error, which faces two critical challenges: (1) label autocorrelation, which leads to bias from the label sequence likelihood; (2) excessive amount of tasks, which increases with the forecast horizon and complicates optimization. To address these challenges, we propose Time-o1, a transformation-augmented learning objective tailored for time-series forecasting. The central idea is to transform the label sequence into decorrelated components with discriminated significance. Models are then trained to align the most significant components, thereby effectively mitigating label autocorrelation and reducing task amount. Extensive experiments demonstrate that Time-o1 achieves state-of-the-art performance and is compatible with various forecast models. Code is available at https://github.com/Master-PLC/Time-o1.
comment: Accepted as poster in NeurIPS 2025
AdGT: Decentralized Gradient Tracking with Tuning-free Per-Agent Stepsize
In decentralized optimization, the choice of stepsize plays a critical role in algorithm performance. A common approach is to use a shared stepsize across all agents to ensure convergence. However, selecting an optimal stepsize often requires careful tuning, which can be time-consuming and may lead to slow convergence, especially when there is significant variation in the smoothness (L-smoothness) of local objective functions across agents. Individually tuning stepsizes per agent is also impractical, particularly in large-scale networks. To address these limitations, we propose AdGT, an adaptive gradient tracking method that enables each agent to adjust its stepsize based on the smoothness of its local objective. We prove that AdGT achieves linear convergence to the global optimal solution. Through numerical experiments, we compare AdGT with fixed-stepsize gradient tracking methods and demonstrate its superior performance. Additionally, we compare AdGT with adaptive gradient descent (AdGD) in a centralized setting and observe that fully adaptive stepsizes offer greater benefits in decentralized networks than in centralized ones.
Robust Capacity Expansion Modelling for Renewable Energy Systems
Future greenhouse gas neutral energy systems will be dominated by renewable energy technologies whose energy output and utilisation is subject to uncertain weather conditions. This work proposes an algorithm for capacity expansion planning if only uncertain data is available for a year's operative parameters. When faced with multiple possible operating years, the quality of a solution derived on a single operating year's data is evaluated for all years, and the optimisation problem is iteratively modified whenever supply gaps are detected. These modifications lead to solutions with sufficient back-up capacity to overcome periods of cold dark lulls, and sufficient total annual energy supply across all years. A computational study on an energy system model of Germany for 40 different operating years shows that the iterative algorithm finds solutions that guarantee security of supply for all considered years increasing the total annual cost by 1.6-2.9% compared to a lower bound. Results also underline the importance of assessing the feasibility of energy system models using atypical time-series, combining dark lull and cold period effects.
Learning Low-Dimensional Embeddings for Black-Box Optimization
When gradient-based methods are impractical, black-box optimization (BBO) provides a valuable alternative. However, BBO often struggles with high-dimensional problems and limited trial budgets. In this work, we propose a novel approach based on meta-learning to pre-compute a reduced-dimensional manifold where optimal points lie for a specific class of optimization problems. When optimizing a new problem instance sampled from the class, black-box optimization is carried out in the reduced-dimensional space, effectively reducing the effort required for finding near-optimal solutions.
A Fractional-Order Nonlinear Backstepping Controller Design for Current-Controlled Maglev System
The magnetic levitation system (Maglev) is a nonlinear system by which an object is suspended with no support other than magnetic fields. The main control perspective of the Maglev system is to levitate a steel ball in air by the electromagnetic force. However, the Maglev system has highly nonlinear dynamics which is inconvenient in the sense of sensitive control/regulation of its nonlinear dynamics. In this paper, the nonlinear backstepping controller based on the fractional-order derivative is proposed for the control of the nonlinear current-controlled Maglev system. After, the system dynamics and fractional-order backstepping controller design are given, the asymptotic stability of the closed-loop system is proved by employing the Lyapunov theory. Some computer-based numerical experiments are carried out to show the effectiveness of the proposed controller for the control of Maglev system.
comment: Accepted at Proceedings of the 28th International Scientific Conference Mechanika 2024
A Novel State-Centric Necessary Condition for Time-Optimal Control of Controllable Linear Systems Based on Augmented Switching Laws (Extended Version)
Most existing necessary conditions for optimal control based on adjoining methods require both state and costate information, yet the unobservability of costates for a given feasible trajectory impedes the determination of optimality in practice. This paper establishes a novel theoretical framework for time-optimal control of controllable linear systems with a single input, proposing the augmented switching law (ASL) that represents the input control and the feasibility in a compact form. Given a feasible trajectory, the perturbed trajectory under the constraints of ASL is guaranteed to be feasible, resulting in a novel state-centric necessary condition without dependence on costate information. A first-order necessary condition is proposed that the Jacobian matrix of the ASL is not of full row rank, which also results in a potential approach to optimizing a given feasible trajectory with the preservation of arc structures. The proposed necessary condition is applied to high-order chain-of-integrator systems with full box constraints, contributing to some theoretical results challenging to reason by costate-based conditions.
comment: This paper has been accepted by IEEE TAC
Neural Network-based Co-design of Output-Feedback Control Barrier Function and Observer
Control Barrier Functions (CBFs) provide a powerful framework for ensuring safety in dynamical systems. However, their application typically relies on full state information, which is often violated in real-world scenarios due to the availability of partial state information. In this work, we propose a neural network-based framework for the co-design of a safety controller, observer, and CBF for partially observed continuous-time systems. By formulating barrier conditions over an augmented state space, our approach ensures safety without requiring bounded estimation errors or handcrafted barrier functions. All components are jointly trained by formulating appropriate loss functions, and we introduce a validity condition to provide formal safety guarantees beyond the training data. Finally, we demonstrate the effectiveness of the proposed approach through several case studies.
comment: There were errors in paper (introduction section and notations)
Verifiable Mission Planning For Space Operations
Spacecraft must operate under environmental and actuator uncertainties while meeting strict safety requirements. Traditional approaches rely on scenario-based heuristics that fail to account for stochastic influences, leading to suboptimal or unsafe plans. We propose a finite-horizon, chance-constrained Markov decision process for mission planning, where states represent mission and vehicle parameters, actions correspond to operational adjustments, and temporal logic specifications encode operational constraints. We synthesize policies that optimize mission objectives while ensuring constraints are met with high probability. Applied to the GRACE-FO mission, the approach accounts for stochastic solar activity and uncertain thrust performance, yielding maneuver schedules that maximize scientific return and provably satisfy safety requirements. This work demonstrates how Markov decision processes can be applied to space missions, enabling autonomous operation with formal guarantees.
comment: Submitted to the 2025 AAS/AIAA Astrodynamics Specialist Conference
Future pathways for eVTOLs: A design optimization perspective
The rapid development of advanced urban air mobility, particularly electric vertical take-off and landing (eVTOL) aircraft, requires interdisciplinary approaches involving the future urban air mobility ecosystem. Operational cost efficiency, regulatory aspects, sustainability, and environmental compatibility should be incorporated directly into the conceptual design of aircraft and across operational and regulatory strategies. In this work, we apply a novel multidisciplinary design optimization framework for the conceptual design of eVTOL aircraft. The framework optimizes conventional design elements of eVTOL aircraft over a generic mission and integrates a comprehensive operational cost model to directly capture economic incentives of the designed system through profit modeling for operators. We introduce a novel metric, the cross-transportation Figure of Merit (FoM), to compare the optimized eVTOL system with various competing road, rail, and air transportation modes in terms of sustainability, cost, and travel time. We investigate four objective-specific eVTOL optimization designs in a broad scenario space, mapping regulatory, technical, and operational constraints to generate a representation of potential urban air mobility stakeholder-centric design objectives. The analysis of an optimized profit-maximizing eVTOL, cost-minimizing eVTOL, sustainability-maximizing eVTOL, and a combined FoM-maximizing eVTOL design highlights significant trade-offs in the area of profitability, operational flexibility, and sustainability strategies. This underlines the importance of incorporating multiple operationally tangential disciplines into the design process, while also reflecting the diverse priorities of stakeholders such as operators, regulators, and society.
comment: 26 pages, 8 figures
Optimal Modified Feedback Strategies in LQ Games under Control Imperfections
Game-theoretic approaches and Nash equilibrium have been widely applied across various engineering domains. However, practical challenges such as disturbances, delays, and actuator limitations can hinder the precise execution of Nash equilibrium strategies. This work investigates the impact of such implementation imperfections on game trajectories and players' costs in the context of a two-player finite-horizon linear quadratic (LQ) nonzero-sum game. Specifically, we analyze how small deviations by one player, measured or estimated at each stage, affect the state and cost function of the other player. To mitigate these effects, we propose an adjusted control policy that optimally compensates for the deviations under the stated information structure and can, under certain conditions, exploit them to improve performance. Rigorous mathematical analysis and proofs are provided, and the effectiveness of the proposed method is demonstrated through a representative numerical example.
comment: 6 pages, 2 figures, Preprint version of a paper submitted to ACC 2026
Koopman-Nemytskii Operator: A Linear Representation of Nonlinear Controlled Systems
While Koopman operator lifts a nonlinear system into an infinite-dimensional function space and represents it as a linear dynamics, its definition is restricted to autonomous systems, i.e., does not incorporate inputs or disturbances. To the end of designing state-feedback controllers, the existing extensions of Koopman operator, which only account for the effect of open-loop values of inputs, does not involve feedback laws on closed-loop systems. Hence, in order to generically represent any nonlinear controlled dynamics linearly, this paper proposes a Koopman-Nemytskii operator, defined as a linear mapping from a product reproducing kernel Hilbert space (RKHS) of states and feedback laws to an RKHS of states. Using the equivalence between RKHS and Sobolev-Hilbert spaces under certain regularity conditions on the dynamics and kernel selection, this operator is well-defined. Its data-based approximation, which follows a kernel extended dynamic mode decomposition (kernel EDMD) approach, have established errors in single-step and multi-step state predictions as well as accumulated cost under control.
comment: 19 pages, 9 figures, submitted to IEEE Transactions on Automatic Control after revision on 10/2/2025
Planning Stealthy Backdoor Attacks in MDPs with Observation-Based Triggers
This paper investigates backdoor attack planning in stochastic control systems modeled as Markov Decision Processes (MDPs). In a backdoor attack, the adversary provides a control policy that behaves well in the original MDP to pass the testing phase. However, when such a policy is deployed with a trigger policy, which perturbs the system dynamics at runtime, it optimizes the attacker's objective instead. To solve jointly the control policy and its trigger, we formulate the attack planning problem as a constrained optimal planning problem in an MDP with augmented state space, with the objective to maximize the attacker's total rewards in the system with an activated trigger, subject to the constraint that the control policy is near optimal in the original MDP. We then introduce a gradient-based optimization method to solve the optimal backdoor attack policy as a pair of coordinated control and trigger policies. Experimental results from a case study validate the effectiveness of our approach in achieving stealthy backdoor attacks.
Active Alignments of Lens Systems with Reinforcement Learning
Aligning a lens system relative to an imager is a critical challenge in camera manufacturing. While optimal alignment can be mathematically computed under ideal conditions, real-world deviations caused by manufacturing tolerances often render this approach impractical. Measuring these tolerances can be costly or even infeasible, and neglecting them may result in suboptimal alignments. We propose a reinforcement learning (RL) approach that learns exclusively in the pixel space of the sensor output, eliminating the need to develop expert-designed alignment concepts. We conduct an extensive benchmark study and show that our approach surpasses other methods in speed, precision, and robustness. We further introduce relign, a realistic, freely explorable, open-source simulation utilizing physically based rendering that models optical systems with non-deterministic manufacturing tolerances and noise in robotic alignment movement. It provides an interface to popular machine learning frameworks, enabling seamless experimentation and development. Our work highlights the potential of RL in a manufacturing environment to enhance efficiency of optical alignments while minimizing the need for manual intervention.
comment: This work has been submitted to the IEEE for possible publication
Neural Parameter-varying Data-enabled Predictive Control of Cold Atmospheric Pressure Plasma Jets
Cold Atmospheric Pressure Plasma Jets (APPJs) show significant potential for biomedical applications, but their inherent complexity, characterized by nonlinear dynamics and strong sensitivity to operating conditions like tip-to-surface distance, presents challenges for real-time control. This paper introduces the Neural Parameter-varying Data-enabled Predictive Control (NPV-DeePC) framework to address these issues. By integrating hypernetworks into the neural DeePC paradigm, NPV-DeePC adaptively captures system nonlinearities and parameter variations, dynamically adjusts the neural network's learned representation of the system, enabling accurate multi-step trajectory prediction and control. Simulation studies on surface temperature tracking and thermal dose delivery demonstrate that NPV-DeePC achieves higher accuracy and adaptability than existing controllers. Moreover, its computational efficiency supports real-time implementation, making it a practical approach for precise APPJ control and a generalizable solution for other nonlinear, parameter-varying systems.
Introduction to Online Control
This text presents an introduction to an emerging paradigm in control of dynamical systems and differentiable reinforcement learning called online nonstochastic control. The new approach applies techniques from online convex optimization and convex relaxations to obtain new methods with provable guarantees for classical settings in optimal and robust control. The primary distinction between online nonstochastic control and other frameworks is the objective. In optimal control, robust control, and other control methodologies that assume stochastic noise, the goal is to perform comparably to an offline optimal strategy. In online nonstochastic control, both the cost functions as well as the perturbations from the assumed dynamical model are chosen by an adversary. Thus the optimal policy is not defined a priori. Rather, the target is to attain low regret against the best policy in hindsight from a benchmark class of policies. This objective suggests the use of the decision making framework of online convex optimization as an algorithmic methodology. The resulting methods are based on iterative mathematical optimization algorithms, and are accompanied by finite-time regret and computational complexity guarantees.
comment: Draft; comments/suggestions welcome at nonstochastic.control@gmail.com
Systems and Control (EESS)
Game-theoretic Social Distancing in Competitive Bi-Virus SIS Epidemics
Numerous elements drive the spread of infectious diseases in complex real-world networks. Of particular interest is social behaviors that evolve in tandem with the spread of disease. Moreover, recent studies highlight the importance of understanding how multiple strains spread simultaneously through a population (e.g. Delta and Omicron variants of SARS-CoV-2). In this paper, we propose a bi-virus SIS epidemic model coupled with a game-theoretic social distancing behavior model. The behaviors are governed by replicator equations from evolutionary game theory. The prevalence of each strain impacts the choice of an individual to social distance, and, in turn, their behavior affects the spread of each virus in the SIS model. Our analysis identifies equilibria of the system and their local stability properties, which reveal several isolated fixed points with varying levels of social distancing. We find that endemic co-existence is possible only when the reproduction numbers of both strains are equal. Assuming the reproduction number for each virus is the same, we identify suitable parameter regimes that give rise to lines of coexistence equilibria. Moreover, we also identify conditions for local exponential stability of said lines of equilibria. We illustrate our findings with several numerical simulations.
Computing Control Lyapunov-Barrier Functions: Softmax Relaxation and Smooth Patching with Formal Guarantees
We present a computational framework for synthesizing a single smooth Lyapunov function that certifies both asymptotic stability and safety. We show that the existence of a strictly compatible pair of control barrier and control Lyapunov functions (CBF-CLF) guarantees the existence of such a function on the exact safe set certified by the barrier. To maximize the certifiable safe domain while retaining differentiability, we employ a log-sum-exp (softmax) relaxation of the nonsmooth maximum barrier, together with a counterexample-guided refinement that inserts half-space cuts until a strict barrier condition is verifiable. We then patch the softmax barrier with a CLF via an explicit smooth bump construction, which is always feasible under the strict compatibility condition. All conditions are formally verified using a satisfiability modulo theories (SMT) solver, enabled by a reformulation of Farkas' lemma for encoding strict compatibility. On benchmark systems, including a power converter, we show that the certified safe stabilization regions obtained with the proposed approach are often less conservative than those achieved by state-of-the-art sum-of-squares (SOS) compatible CBF-CLF designs.
Detection and Identification of Sensor Attacks Using Data
In this paper, we investigate data-driven attack detection and identification in a model-free setting. Unlike existing studies, we consider the case where the available output data include malicious false-data injections. We aim to detect and identify such attacks solely from the compromised data. We address this problem in two scenarios: (1) when the system operator is aware of the system's sparse observability condition, and (2) when the data are partially clean (i.e., attack-free). In both scenarios, we derive conditions and algorithms for detecting and identifying attacks using only the compromised data. Finally, we demonstrate the effectiveness of the proposed framework via numerical simulations on a three-inertia system.
Product Digital Twin Supporting End-of-life Phase of Electric Vehicle Batteries Utilizing Product-Process-Resource Asset Network
In the context of the circular economy, products in their end-of-life phase should be either remanufactured or recycled. Both of these processes are crucial for sustainability and environmental conservation. However, manufacturers often do not support these processes enough by not sharing relevant data. This paper proposes use of a digital twin technology, which is capable to help optimizing the disassembly processes to reduce ecological impact and enhance sustainability. The proposed approach is demonstrated through a disassembly use-case of the product digital twin of an electric vehicle battery. By utilizing product digital twins, challenges associated with the disassembly of electric vehicle batteries can be solved flexibly and efficiently for various battery types. As a backbone for the product digital twin representation, the paper uses the paradigm of product-process-resource asset networks (PAN). Such networks enable to model relevant relationships across products, production resources, manufacturing processes, and specific production operations that have to be done in the manufacturing phase of a product. This paper introduces a Bi-Flow Product-Process-Resource Asset Network (Bi-PAN) representation, which extends the PAN paradigm to cover not only the manufacturing, but also the remanufacturing/recycling phase.
comment: This work has been submitted to the IEEE for possible publication. 6 pages, 4 figures
On the (almost) Global Exponential Convergence of the Overparameterized Policy Optimization for the LQR Problem
In this work we study the convergence of gradient methods for nonconvex optimization problems -- specifically the effect of the problem formulation to the convergence behavior of the solution of a gradient flow. We show through a simple example that, surprisingly, the gradient flow solution can be exponentially or asymptotically convergent, depending on how the problem is formulated. We then deepen the analysis and show that a policy optimization strategy for the continuous-time linear quadratic regulator (LQR) (which is known to present only asymptotic convergence globally) presents almost global exponential convergence if the problem is overparameterized through a linear feed-forward neural network (LFFNN). We prove this qualitative improvement always happens for a simplified version of the LQR problem and derive explicit convergence rates for the gradient flow. Finally, we show that both the qualitative improvement and the quantitative rate gains persist in the general LQR through numerical simulations.
comment: This version is currently under review for the 2026 IEEE American Control Conference (ACC)
Recurrent Control Barrier Functions: A Path Towards Nonparametric Safety Verification
Ensuring the safety of complex dynamical systems often relies on Hamilton-Jacobi (HJ) Reachability Analysis or Control Barrier Functions (CBFs). Both methods require computing a function that characterizes a safe set that can be made (control) invariant. However, the computational burden of solving high-dimensional partial differential equations (for HJ Reachability) or large-scale semidefinite programs (for CBFs) makes finding such functions challenging. In this paper, we introduce the notion of Recurrent Control Barrier Functions (RCBFs), a novel class of CBFs that leverages a recurrent property of the trajectories, i.e., coming back to a safe set, for safety verification. Under mild assumptions, we show that the RCBF condition holds for the signed-distance function, turning function design into set identification. Notably, the resulting set need not be invariant to certify safety. We further propose a data-driven nonparametric method to compute safe sets that is massively parallelizable and trades off conservativeness against computational cost.
comment: 8 Pages, 3 Figures
Cooperative Guidance for Aerial Defense in Multiagent Systems
This paper addresses a critical aerial defense challenge in contested airspace, involving three autonomous aerial vehicles -- a hostile drone (the pursuer), a high-value drone (the evader), and a protective drone (the defender). We present a cooperative guidance framework for the evader-defender team that guarantees interception of the pursuer before it can capture the evader, even under highly dynamic and uncertain engagement conditions. Unlike traditional heuristic, optimal control, or differential game-based methods, we approach the problem within a time-constrained guidance framework, leveraging true proportional navigation based approach that ensures robust and guaranteed solutions to the aerial defense problem. The proposed strategy is computationally lightweight, scalable to a large number of agent configurations, and does not require knowledge of the pursuer's strategy or control laws. From arbitrary initial geometries, our method guarantees that key engagement errors are driven to zero within a fixed time, leading to a successful mission. Extensive simulations across diverse and adversarial scenarios confirm the effectiveness of the proposed strategy and its relevance for real-time autonomous defense in contested airspace environments.
Event-triggered control and communication for single-master multi-slave teleoperation systems with Try-Once-Discard protocol
Single-master multi-slave (SMMS) teleoperation systems can perform multiple tasks remotely in a shorter time, cover large-scale areas, and adapt more easily to single-point failures, thereby effectively encompassing a broader range of applications. As the number of slave manipulators sharing a communication network increases, the limitation of communication bandwidth becomes critical. To alleviate bandwidth usage, the Try-Once-Discard (TOD) scheduling protocol and event-triggered mechanisms are often employed separately. In this paper, we combine both strategies to optimize network bandwidth and energy consumption for SMMS teleoperation systems. Specifically, we propose event-triggered control and communication schemes for a class of SMMS teleoperation systems using the TOD scheduling protocol. Considering dynamic uncertainties, the unavailability of relative velocities, and time-varying delays, we develop adaptive controllers with virtual observers based on event-triggered schemes to achieve master-slave synchronization. Stability criteria for the SMMS teleoperation systems under these event-triggered control and communication schemes are established, demonstrating that Zeno behavior is excluded. Finally, experiments are conducted to validate the effectiveness of the proposed algorithms.
LLM-Enhanced, Data-Driven Personalized and Equitable Clinician Scheduling: A Predict-then-Optimize Approach ICDM 2025
Clinician scheduling remains a persistent challenge due to limited clinical resources and fluctuating demands. This complexity is especially acute in large academic anesthesiology departments as physicians balance responsibilities across multiple clinical sites with conflicting priorities. Further, scheduling must account for individual clinical and lifestyle preferences to ensure job satisfaction and well-being. Traditional approaches, often based on statistical or rule-based optimization models, rely on structured data and explicit domain knowledge. However, these methods often overlook unstructured information, e.g., free-text notes from routinely administered clinician well-being surveys and scheduling platforms. These notes may reveal implicit and underutilized clinical resources. Neglecting such information can lead to misaligned schedules, increased burnout, overlooked staffing flexibility, and suboptimal utilization of available resources. To address this gap, we propose a predict-then-optimize framework that integrates classification-based clinician availability predictions with a mixed-integer programming schedule optimization model. Large language models (LLMs) are employed to extract actionable preferences and implicit constraints from unstructured schedule notes, enhancing the reliability of availability predictions. These predictions then inform the schedule optimization considering four objectives: first, ensuring clinical full-time equivalent compliance, second, reducing workload imbalances by enforcing equitable proportions of shift types, third, maximizing clinician availability for assigned shifts, and fourth, schedule consistency. By combining the interpretive power of LLMs with the rigor of mathematical optimization, our framework provides a robust, data-driven solution that enhances operational efficiency while supporting equity and clinician well-being.
comment: 10 pages, 5 figures, Accepted to IEEE ICDM 2025 Workshops Proceedings; IEEE Computer Society Press
Coordinated Car-following Using Distributed MPC
Within the modeling framework of Markov games, we propose a series of algorithms for coordinated car-following using distributed model predictive control (DMPC). Instead of tracking prescribed feasible trajectories, driving policies are solved directly as outcomes of the DMPC optimization given the driver's perceivable states. The coordinated solutions are derived using the best response dynamics via iterated self-play, and are facilitated by direct negotiation using inter-agent or agent-infrastructure communication. These solutions closely approximate either Nash equilibrium or centralized optimization. By re-parameterizing the action sequence in DMPC as a curve along the planning horizon, we are able to systematically reduce the original DMPC to very efficient grid searches such that the optimal solution to the original DMPC can be well executed in real-time. Within our modeling framework, it is natural to cast traffic control problems as mechanism design problems, in which all agents are endogenized on an equal footing with full incentive compatibility. We show how traffic efficiency can be dramatically improved while keeping stop-and-go phantom waves tamed at high vehicle densities. Our approach can be viewed as an alternative way to formulate coordinated adaptive cruise control (CACC) without an explicit platooning (or with all vehicles in the traffic system treated as a single extended platoon). We also address the issue of linear stability of the associated discrete-time traffic dynamics and demonstrate why it does not always tell the full story about the traffic stability.
Wearable and Ultra-Low-Power Fusion of EMG and A-Mode US for Hand-Wrist Kinematic Tracking
Hand gesture recognition based on biosignals has shown strong potential for developing intuitive human-machine interaction strategies that closely mimic natural human behavior. In particular, sensor fusion approaches have gained attention for combining complementary information and overcoming the limitations of individual sensing modalities, thereby enabling more robust and reliable systems. Among them, the fusion of surface electromyography (EMG) and A-mode ultrasound (US) is very promising. However, prior solutions rely on power-hungry platforms unsuitable for multi-day use and are limited to discrete gesture classification. In this work, we present an ultra-low-power (sub-50 mW) system for concurrent acquisition of 8-channel EMG and 4-channel A-mode US signals, integrating two state-of-the-art platforms into fully wearable, dry-contact armbands. We propose a framework for continuous tracking of 23 degrees of freedom (DoFs), 20 for the hand and 3 for the wrist, using a kinematic glove for ground-truth labeling. Our method employs lightweight encoder-decoder architectures with multi-task learning to simultaneously estimate hand and wrist joint angles. Experimental results under realistic sensor repositioning conditions demonstrate that EMG-US fusion achieves a root mean squared error of $10.6^\circ\pm2.0^\circ$, compared to $12.0^\circ\pm1^\circ$ for EMG and $13.1^\circ\pm2.6^\circ$ for US, and a R$^2$ score of $0.61\pm0.1$, with $0.54\pm0.03$ for EMG and $0.38\pm0.20$ for US.
Reducing Discomfort in Driving Simulators: Motion Cueing for Motion Sickness Mitigation
Driving simulators are increasingly used in research and development. However, simulators often cause motion sickness due to downscaled motion and unscaled veridical visuals. In this paper, a motion cueing algorithm is proposed that reduces motion sickness as predicted by the subjective vertical conflict (SVC) model using model predictive control (MPC). Both sensory conflict and specific force errors are penalised in the cost function, allowing the algorithm to jointly optimise fidelity and comfort. Human-in-the-loop experiments were conducted to compare four simulator motion settings: two variations of our MPC-based algorithm, one focused on pure specific force tracking and the second compromising specific force tracking and motion sickness minimisation, as well as reference adaptive washout and no motion cases. The experiments were performed on a hexapod driving simulator with participants exposed to passive driving. Experimental motion sickness results closely matched the sickness model predictions. As predicted by the model, the no motion condition yielded the lowest sickness levels. However, it was rated lowest in terms of fidelity. The compromise solution reduced sickness by over 50% (average MISC level 3 to 1.5) compared to adaptive washout and the algorithm focusing on specific force tracking, without any significant reduction in fidelity rating. The proposed approach for developing MCA that takes into account both the simulator dynamics and time evolution of motion sickness offers a significant advancement in achieving an optimal control of motion sickness and specific force recreation in driving simulators, supporting broader simulator use.
SPARC: Spine with Prismatic and Revolute Compliance for Quadruped Robot
We present SPARC, a compact, open-source 3-DoF sagittal-plane spine module that combines revolute (pitch) and prismatic (axial) motion with programmable task-space impedance for quadruped robots. The system integrates three torque-controlled actuators, a custom 1 kHz control board, and a protected power unit in a 1.26 kg package, enabling closed-loop stiffness and damping shaping along x, z, and theta. We develop an RNEA-based computed-acceleration controller with smooth Stribeck friction compensation to render spring-damper behavior without explicit inertia shaping. Bench experiments validate the approach. Quasi-static push-pull tests show linear force-displacement characteristics with commanded horizontal stiffness spanning 300-700 N/m and <= 1.5% relative error (R^2 >= 0.992, narrow 95% CIs). Dynamic displace-and-release trials confirm mass-spring-damper responses over multiple damping settings, with small, interpretable phase deviations due to configuration-dependent inertia and low-speed friction effects. A task-space PD controller produces roughly linear stiffness but with greater variability and coupling sensitivity. SPARC provides a portable platform for systematic studies of spine compliance in legged locomotion and will be released with complete hardware and firmware resources.
A TSO-DSO Coordination Framework via Analytical Representation and Monetization of PQV-Based Distribution System Flexibility
As the role of distribution system (DS) flexibility in transmission system operator (TSO) network management becomes increasingly vital, data privacy concerns hinder seamless interoperability. The notion of the feasible operating region (FOR), defined in the PQ domain, has emerged as a promising privacy-preserving approach. However, effectively leveraging FOR in TSO operations remains challenging due to three key factors: its accurate determination in large-scale, meshed DS networks; its tractable analytical representation; and its economic valuation. In the present paper, we propose a novel AC optimal power flow (OPF)-based method to construct a three-dimensional PQV-FOR, explicitly accounting for voltage variability and diverse flexibility-providing unit (FPU) characteristics. The construction process employs a two-stage sampling strategy that combines bounding box projection and Fibonacci direction techniques to efficiently capture the FOR. We then introduce an implicit polynomial fitting approach to analytically represent the FOR. Furthermore, we derive a quadratic cost function over the PQV domain to monetize the FOR. Thus, the proposed framework enables single-round TSO-DSO coordination: the DSO provides an analytical FOR and cost model; the TSO determines operating point at the point of common coupling (PCC) within the FOR-based AC-OPF; and the DSO computes FPU dispatch by solving its local OPF, without computationally intensive disaggregation or iterative coordination. Case studies on meshed DS with up to 533 buses, integrated into TS, demonstrates the method's efficiency compared to standard AC-OPF. On average, the proposed approach yields negligible cost deviations of at most 0.058% across test cases, while reducing computation times by up to 58.11%.
Robust MPC for Large-scale Linear Systems
State-of-the-art approaches of Robust Model Predictive Control (MPC) are restricted to linear systems of relatively small scale, i.e., with no more than about 5 states. The main reason is the computational burden of determining a robust positively invariant (RPI) set, whose complexity suffers from the curse of dimensionality. The recently proposed approach of Deadbeat Robust Model Predictive Control (DRMPC) is the first that does not rely on an RPI set. Yet it comes with the full set of essential system theoretic guarantees. DRMPC is hence a viable option, in particular, for large-scale systems. This paper introduces a detailed design procedure for DRMPC. It is shown that the optimal control problem generated for DRMPC has exactly the same computational complexity as Nominal MPC. A numerical study validates its applicability to randomly generated large-scale linear systems of various dimensions.
Dual-Mode Magnetic Continuum Robot for Targeted Drug Delivery ICRA 2026
Magnetic continuum robots (MCRs) enable minimally invasive navigation through tortuous anatomical channels, yet axially magnetized designs have largely been limited to bending-only motion. To expand deformation capabilities, this paper presents a simple assembly that embeds permanent magnets radially within the catheter wall, allowing a single externally steered permanent magnet to independently induce either bending or torsion. A physics-based formulation together with finite-element analysis establishes the actuation principles, and benchtop experiments validate decoupled mode control under practical fields. Building on this, a dual-layer blockage mechanism consisting of outer grooves and inner plates leverages torsional shear to achieve on-demand drug release. Finally, an in-phantom intervention experiment demonstrates end-to-end operation: lumen following by bending for target approach, followed by twist-activated release at the site. The resulting compact, cable-free platform combines versatile deformation with precise payload delivery, indicating strong potential for next-generation, site-specific therapies.
comment: 7 pages, 3 figures, under review of ICRA 2026
Geometric Backstepping Control of Omnidirectional Tiltrotors Incorporating Servo-Rotor Dynamics for Robustness against Sudden Disturbances
This work presents a geometric backstepping controller for a variable-tilt omnidirectional multirotor that explicitly accounts for both servo and rotor dynamics. Considering actuator dynamics is essential for more effective and reliable operation, particularly during aggressive flight maneuvers or recovery from sudden disturbances. While prior studies have investigated actuator-aware control for conventional and fixed-tilt multirotors, these approaches rely on linear relationships between actuator input and wrench, which cannot capture the nonlinearities induced by variable tilt angles. In this work, we exploit the cascade structure between the rigid-body dynamics of the multirotor and its nonlinear actuator dynamics to design the proposed backstepping controller and establish exponential stability of the overall system. Furthermore, we reveal parametric uncertainty in the actuator model through experiments, and we demonstrate that the proposed controller remains robust against such uncertainty. The controller was compared against a baseline that does not account for actuator dynamics across three experimental scenarios: fast translational tracking, rapid rotational tracking, and recovery from sudden disturbance. The proposed method consistently achieved better tracking performance, and notably, while the baseline diverged and crashed during the fastest translational trajectory tracking and the recovery experiment, the proposed controller maintained stability and successfully completed the tasks, thereby demonstrating its effectiveness.
Stability and Robustness of Time-Varying Opinion Dynamics: A Graph-Theoretic Approach
We study the stability of opinion dynamics in the time-varying Friedkin-Johnsen (TVFJ) model, which captures both persistent individual biases and adaptive social influence. We introduce two temporal structures, defected temporal graphs (DTGs) and weakly defected temporal graphs (WDTGs), that serve as graph-theoretic certificates linking stubborn influence and temporal connectivity to contraction of the state-transition matrix. Using these tools, we prove asymptotic stability of TVFJ dynamics under infinitely recurring DTGs, exponential stability in semi-periodic defected networks, and asymptotic stability of a trust-based extension under the weaker condition of recurring WDTGs. We also establish boundedness of the omega-limit set, showing that long-run opinions remain within the convex hull of innate beliefs, and characterize the limit set for periodically switching systems via a p-LTI decomposition with the tight bound that the size of the omega-limit set is at most p. Finally, we show that exponential stability persists under bounded perturbations, ensuring robustness in noisy or imperfect networks. These results unify algebraic contraction tests with interpretable graph-based reasoning, providing scalable and resilient tools for analyzing opinion formation in evolving social and human-AI networks.
Bi-Virus SIS Epidemic Propagation under Mutation and Game-theoretic Protection Adoption
We study a bi-virus susceptible-infected-susceptible (SIS) epidemic model in which individuals are either susceptible or infected with one of two virus strains, and consider mutation-driven transitions between strains. The general case of bi-directional mutation is first analyzed, where we characterize the disease-free equilibrium and establish its global asymptotic stability, as well as the existence, uniqueness, and stability of an endemic equilibrium. We then present a game-theoretic framework where susceptible individuals strategically choose whether to adopt protection or remain unprotected, to maximize their instantaneous payoffs. We derive Nash strategies under bi-directional mutation, and subsequently consider the special case of unidirectional mutation. In the latter case, we show that coexistence of both strains is impossible when mutation occurs from the strain with lower reproduction number and transmission rate to the other strain. Furthermore, we fully characterize the stationary Nash equilibrium (SNE) in the setting permitting coexistence, and examine how mutation rates influence protection adoption and infection prevalence at the SNE. Numerical simulations corroborate the analytical results, demonstrating that infection levels decrease monotonically with higher protection adoption, and highlight the impact of mutation rates and protection cost on infection state trajectories.
A Scalable Design Approach to Resilient Architectures for Interconnected Cyber-Physical Systems: Safety Guarantees under Multiple Attacks
Complex, interconnected cyber-physical systems (CPS) are increasingly prevalent in domains such as power systems. Cyber-resilient architectures have been proposed to recover compromised cyber components of CPS. Recent works have studied tuning the recovery times of such architectures to guarantee safety in single-system settings. Extending these designs to interconnected CPS is more challenging, since solutions must account for attacks on multiple subsystems that can occur in any order and potentially infinite possible temporal overlap. This paper aims to address the aforementioned challenge by developing a scalable framework to assign resilient architectures and to inform the tuning of their recovery times. Our approach introduces a scalar index that quantifies the impact of each subsystem on safety under compromised input. These indices aggregate linearly across subsystems, enabling scalable analysis under arbitrary attack orderings and temporal overlaps. We establish a linear inequality relating each subsystem's index and recovery time that guarantees safety and guides resilient architecture assignment. We also propose a segmentation-based approach to strengthen the previously derived conditions. We then present algorithms to compute the proposed indices and to find a cost-optimal architecture assignment with a safety guarantee. We validate the framework through a case study on temperature regulation in interconnected rooms under different attack scenarios.
Efficient Optimal Path Planning in Dynamic Environments Using Koopman MPC
This paper presents a data-driven model predictive control framework for mobile robots navigating in dynamic environments, leveraging Koopman operator theory. Unlike the conventional Koopman-based approaches that focus on the linearization of system dynamics only, our work focuses on finding a global linear representation for the optimal path planning problem that includes both the nonlinear robot dynamics and collision-avoidance constraints. We deploy extended dynamic mode decomposition to identify linear and bilinear Koopman realizations from input-state data. Our open-loop analysis demonstrates that only the bilinear Koopman model can accurately capture nonlinear state-input couplings and quadratic terms essential for collision avoidance, whereas linear realizations fail to do so. We formulate a quadratic program for the robot path planning in the presence of moving obstacles in the lifted space and determine the optimal robot action in an MPC framework. Our approach is capable of finding the safe optimal action 320 times faster than a nonlinear MPC counterpart that solves the path planning problem in the original state space. Our work highlights the potential of bilinear Koopman realizations for linearization of highly nonlinear optimal control problems subject to nonlinear state and input constraints to achieve computational efficiency similar to linear problems.
comment: This work has been submitted to the ACC2026 conference
Bridging the Prediction Error Method and Subspace Identification: A Weighted Null Space Fitting Method
Subspace identification methods (SIMs) have proven to be very useful and numerically robust for building state-space models. While most SIMs are consistent, few if any can achieve the efficiency of the maximum likelihood estimate (MLE). Conversely, the prediction error method (PEM) with a quadratic criteria is equivalent to MLE, but it comes with non-convex optimization problems and requires good initialization points. This contribution proposes a weighted null space fitting (WNSF) approach for estimating state-space models, combining some key advantages of the two aforementioned mainstream approaches. It starts with a least-squares estimate of a high-order ARX model, and then a multi-step least-squares procedure reduces the model to a state-space model on canoncial form. It is demonstrated through statistical analysis that when a canonical parameterization is admissible, the proposed method is consistent and asymptotically efficient, thereby making progress on the long-standing open problem about the existence of an asymptotically efficient SIM. Numerical and practical examples are provided to illustrate that the proposed method performs favorable in comparison with SIMs.
Self-supervised diffusion model fine-tuning for costate initialization using Markov chain Monte Carlo
Global search and optimization of long-duration, low-thrust spacecraft trajectories with the indirect method is challenging due to a complex solution space and the difficulty of generating good initial guesses for the costate variables. This is particularly true in multibody environments. Given data that reveals a partial Pareto optimal front, it is desirable to find a flexible manner in which the Pareto front can be completed and fronts for related trajectory problems can be found. In this work we use conditional diffusion models to represent the distribution of candidate optimal trajectory solutions. We then introduce into this framework the novel approach of using Markov Chain Monte Carlo algorithms with self-supervised fine-tuning to achieve the aforementioned goals. Specifically, a random walk Metropolis algorithm is employed to propose new data that can be used to fine-tune the diffusion model using a reward-weighted training based on efficient evaluations of constraint violations and missions objective functions. The framework removes the need for separate focused and often tedious data generation phases. Numerical experiments are presented for two problems demonstrating the ability to improve sample quality and explicitly target Pareto optimality based on the theory of Markov chains. The first problem does so for a transfer in the Jupiter-Europa circular restricted three-body problem, where the MCMC approach completes a partial Pareto front. The second problem demonstrates how a dense and superior Pareto front can be generated by the MCMC self-supervised fine-tuning method for a Saturn-Titan transfer starting from the Jupiter-Europa case versus a separate dedicated global search.
A Bilevel Optimization Framework for Adversarial Control of Gas Pipeline Operations
Cyberattacks on pipeline operational technology systems pose growing risks to energy infrastructure. This study develops a physics-informed simulation and optimization framework for analyzing cyber-physical threats in petroleum pipeline networks. The model integrates networked hydraulic dynamics, SCADA-based state estimation, model predictive control (MPC), and a bi-level formulation for stealthy false-data injection (FDI) attacks. Pipeline flow and pressure dynamics are modeled on a directed graph using nodal pressure evolution and edge-based Weymouth-type relations, including control-aware equipment such as valves and compressors. An extended Kalman filter estimates the full network state from partial SCADA telemetry. The controller computes pressure-safe control inputs via MPC under actuator constraints and forecasted demands. Adversarial manipulation is formalized as a bi-level optimization problem where an attacker perturbs sensor data to degrade throughput while remaining undetected by bad-data detectors. This attack-control interaction is solved via Karush-Kuhn-Tucker (KKT) reformulation, which results in a tractable mixed-integer quadratic program. Test gas pipeline case studies demonstrate the covert reduction of service delivery under attack. Results show that undetectable attacks can cause sustained throughput loss with minimal instantaneous deviation. This reveals the need for integrated detection and control strategies in cyber-physical infrastructure.
Situationally Aware Rolling Horizon Multi-Tier Load Restoration Considering Behind-The-Meter DER
Restoration in power distribution systems (PDSs) is well studied, however, most existing research focuses on network partition and microgrid formation, where load transfer is limited to adjacent feeders. This focus is not practical, as when adjacent feeders lack sufficient capacity, utilities may request support from more distant feeders in practice. Such a hirarchical restoration is complex, especially when involving changing system conditions due to cold load pickup and delayed reconnection of behind-the-meter DERs. To fill this research gap, a situationally aware multi-tier load restoration framework is proposed. Specifically, models are proposed to describe the multi-tier load restoration, including the multi-tier load transfer and substation transformer and feeder protection models. By introducing binary actional switching variables and load block transfer variables, the models effectively captures the dynamics of switches and multi-tier transfer process. To integrate situational awareness of evolving system conditions, the problem is formulated as a mixed-integer linear program (MILP) and then embedded within a rolling horizon optimization. Particularly, a set of safeguarded constraints are developed based on segment-level restoration reward bounds to mitigate the myopia of traditional rolling horizon optimization. The proposed safeguarded rolling strategy guarantees that each time step is lower bounded by a $(1-\varepsilon)$-fraction of its optimal restoration potential, thereby balancing short-term switching decisions with long-term restoration goals. Finally, cases studies on the modified IEEE 123-node test feeder validate the proposed multi-tier restoration framework.
Power Distribution System Blackstart Restoration Using Renewable Energy
Integrating renewable energy sources into the grid not only reduces global carbon emissions, but also facilitates distribution system (DS) blackstart restoration. This process leverages renewable energy, inverters, situational awareness and distribution automation to initiate blackstart at the DS level, obtaining a fast response and bottom-up restoration. In this Review, we survey the latest technological advances for DS blackstart restoration using renewable energy. We first present mathematical models for distributed energy resources (DERs), network topology, and load dynamics. We then discuss how the situational awareness can help improve restoration performance through real-time monitoring and forecasting. Next, the DS blackstart restoration problem, including objectives, constraints, and existing methodologies for decision-making are provided. Lastly, we outline remaining challenges, and highlight the opportunities and future research directions.
Improved Robustness of Deep Reinforcement Learning for Control of Time-Varying Systems by Bounded Extremum Seeking
In this paper, we study the use of robust model independent bounded extremum seeking (ES) feedback control to improve the robustness of deep reinforcement learning (DRL) controllers for a class of nonlinear time-varying systems. DRL has the potential to learn from large datasets to quickly control or optimize the outputs of many-parameter systems, but its performance degrades catastrophically when the system model changes rapidly over time. Bounded ES can handle time-varying systems with unknown control directions, but its convergence speed slows down as the number of tuned parameters increases and, like all local adaptive methods, it can get stuck in local minima. We demonstrate that together, DRL and bounded ES result in a hybrid controller whose performance exceeds the sum of its parts with DRL taking advantage of historical data to learn how to quickly control a many-parameter system to a desired setpoint while bounded ES ensures its robustness to time variations. We present a numerical study of a general time-varying system and a combined ES-DRL controller for automatic tuning of the Low Energy Beam Transport section at the Los Alamos Neutron Science Center linear particle accelerator.
Data-Driven Stochastic Distribution System Hardening Based on Bayesian Online Learning
Extreme weather frequently cause widespread outages in distribution systems (DSs), demonstrating the importance of hardening strategies for resilience enhancement. However, the well-utilization of real-world outage data with associated weather conditions to make informed hardening decisions in DSs is still an open issue. To bridge this research gap, this paper proposes a data-driven stochastic distribution line (DL) hardening strategy. First, a deep neural network (DNN) regression model is developed to predict the probabilistic evolution of outage scenarios under various hardening decisions. Based on the DNN predictions, the problem is formulated as a decision-dependent distributionally robust optimization (DRO) model, accounting for uncertainties in outage scenario distributions using a data-driven ambiguity set. To address decision-dependent uncertainty, a Bayesian online learning algorithm is proposed. This algorithm decomposes the original problem into inner and outer problems. Then, it iteratively refines hardening decisions by sequentially incorporating outage data and dynamically updating decision-specific ambiguity sets by using Bayes' theorem and Bayesian Inference. Also, the convergence of the algorithm is proven through dynamic regret analysis. Finally, case studies are implemented on a real-world DS in Redfield, Iowa, USA. A dataset spanning 24 years (2001-2024) is constructed based on the utility outage records. The simulation results validates the effectiveness of the proposed strategy.
On Architectures for Combining Reinforcement Learning and Model Predictive Control with Runtime Improvements
Model Predictive Control (MPC) faces computational demands and performance degradation from model inaccuracies. We propose two architectures combining Neural Network-approximated MPC (NNMPC) with Reinforcement Learning (RL). The first, Warm Start RL, initializes the RL actor with pre-trained NNMPC weights. The second, RLMPC, uses RL to generate corrective residuals for NNMPC outputs. We introduce a downsampling method reducing NNMPC input dimensions while maintaining performance. Evaluated on a rotary inverted pendulum, both architectures demonstrate runtime reductions exceeding 99% compared to traditional MPC while improving tracking performance under model uncertainties, with RL+MPC achieving 11-40% cost reduction depending on reference amplitude.
comment: Accepted at the 2025 IFAC Conference on Modeling, Estimation, and Control of Systems (MECC 2025), Pittsburgh, USA
Cyber-physical WebAssembly: Secure Hardware Interfaces and Pluggable Drivers
The rapid expansion of Internet of Things (IoT), edge, and embedded devices in the past decade has introduced numerous challenges in terms of security and configuration management. Simultaneously, advances in cloud-native development practices have greatly enhanced the development experience and facilitated quicker updates, thereby enhancing application security. However, applying these advances to IoT, edge, and embedded devices remains a complex task, primarily due to the heterogeneous environments and the need to support devices with extended lifespans. WebAssembly and the WebAssembly System Interface (WASI) has emerged as a promising technology to bridge this gap. As WebAssembly becomes more popular on IoT, edge, and embedded devices, there is a growing demand for hardware interface support in WebAssembly programs. This work presents WASI proposals and proof-of-concept implementations to enable hardware interaction with I2C and USB, which are two commonly used protocols in IoT, directly from WebAssembly applications. This is achieved by running the device drivers within WebAssembly as well. A thorough evaluation of the proof of concepts shows that WASI-USB introduces a minimal overhead of at most 8% compared to native operating system USB APIs. However, the results show that runtime initialization overhead can be significant in low-latency applications.
comment: Accepted article of the IEEE/IFIP Network Operations and Management Symposium 2025 (NOMS 2025)
Time-o1: Time-Series Forecasting Needs Transformed Label Alignment NeurIPS 2025
Training time-series forecast models presents unique challenges in designing effective learning objectives. Existing methods predominantly utilize the temporal mean squared error, which faces two critical challenges: (1) label autocorrelation, which leads to bias from the label sequence likelihood; (2) excessive amount of tasks, which increases with the forecast horizon and complicates optimization. To address these challenges, we propose Time-o1, a transformation-augmented learning objective tailored for time-series forecasting. The central idea is to transform the label sequence into decorrelated components with discriminated significance. Models are then trained to align the most significant components, thereby effectively mitigating label autocorrelation and reducing task amount. Extensive experiments demonstrate that Time-o1 achieves state-of-the-art performance and is compatible with various forecast models. Code is available at https://github.com/Master-PLC/Time-o1.
comment: Accepted as poster in NeurIPS 2025
AdGT: Decentralized Gradient Tracking with Tuning-free Per-Agent Stepsize
In decentralized optimization, the choice of stepsize plays a critical role in algorithm performance. A common approach is to use a shared stepsize across all agents to ensure convergence. However, selecting an optimal stepsize often requires careful tuning, which can be time-consuming and may lead to slow convergence, especially when there is significant variation in the smoothness (L-smoothness) of local objective functions across agents. Individually tuning stepsizes per agent is also impractical, particularly in large-scale networks. To address these limitations, we propose AdGT, an adaptive gradient tracking method that enables each agent to adjust its stepsize based on the smoothness of its local objective. We prove that AdGT achieves linear convergence to the global optimal solution. Through numerical experiments, we compare AdGT with fixed-stepsize gradient tracking methods and demonstrate its superior performance. Additionally, we compare AdGT with adaptive gradient descent (AdGD) in a centralized setting and observe that fully adaptive stepsizes offer greater benefits in decentralized networks than in centralized ones.
Robust Capacity Expansion Modelling for Renewable Energy Systems
Future greenhouse gas neutral energy systems will be dominated by renewable energy technologies whose energy output and utilisation is subject to uncertain weather conditions. This work proposes an algorithm for capacity expansion planning if only uncertain data is available for a year's operative parameters. When faced with multiple possible operating years, the quality of a solution derived on a single operating year's data is evaluated for all years, and the optimisation problem is iteratively modified whenever supply gaps are detected. These modifications lead to solutions with sufficient back-up capacity to overcome periods of cold dark lulls, and sufficient total annual energy supply across all years. A computational study on an energy system model of Germany for 40 different operating years shows that the iterative algorithm finds solutions that guarantee security of supply for all considered years increasing the total annual cost by 1.6-2.9% compared to a lower bound. Results also underline the importance of assessing the feasibility of energy system models using atypical time-series, combining dark lull and cold period effects.
Learning Low-Dimensional Embeddings for Black-Box Optimization
When gradient-based methods are impractical, black-box optimization (BBO) provides a valuable alternative. However, BBO often struggles with high-dimensional problems and limited trial budgets. In this work, we propose a novel approach based on meta-learning to pre-compute a reduced-dimensional manifold where optimal points lie for a specific class of optimization problems. When optimizing a new problem instance sampled from the class, black-box optimization is carried out in the reduced-dimensional space, effectively reducing the effort required for finding near-optimal solutions.
A Fractional-Order Nonlinear Backstepping Controller Design for Current-Controlled Maglev System
The magnetic levitation system (Maglev) is a nonlinear system by which an object is suspended with no support other than magnetic fields. The main control perspective of the Maglev system is to levitate a steel ball in air by the electromagnetic force. However, the Maglev system has highly nonlinear dynamics which is inconvenient in the sense of sensitive control/regulation of its nonlinear dynamics. In this paper, the nonlinear backstepping controller based on the fractional-order derivative is proposed for the control of the nonlinear current-controlled Maglev system. After, the system dynamics and fractional-order backstepping controller design are given, the asymptotic stability of the closed-loop system is proved by employing the Lyapunov theory. Some computer-based numerical experiments are carried out to show the effectiveness of the proposed controller for the control of Maglev system.
comment: Accepted at Proceedings of the 28th International Scientific Conference Mechanika 2024
A Novel State-Centric Necessary Condition for Time-Optimal Control of Controllable Linear Systems Based on Augmented Switching Laws (Extended Version)
Most existing necessary conditions for optimal control based on adjoining methods require both state and costate information, yet the unobservability of costates for a given feasible trajectory impedes the determination of optimality in practice. This paper establishes a novel theoretical framework for time-optimal control of controllable linear systems with a single input, proposing the augmented switching law (ASL) that represents the input control and the feasibility in a compact form. Given a feasible trajectory, the perturbed trajectory under the constraints of ASL is guaranteed to be feasible, resulting in a novel state-centric necessary condition without dependence on costate information. A first-order necessary condition is proposed that the Jacobian matrix of the ASL is not of full row rank, which also results in a potential approach to optimizing a given feasible trajectory with the preservation of arc structures. The proposed necessary condition is applied to high-order chain-of-integrator systems with full box constraints, contributing to some theoretical results challenging to reason by costate-based conditions.
comment: This paper has been accepted by IEEE TAC
Neural Network-based Co-design of Output-Feedback Control Barrier Function and Observer
Control Barrier Functions (CBFs) provide a powerful framework for ensuring safety in dynamical systems. However, their application typically relies on full state information, which is often violated in real-world scenarios due to the availability of partial state information. In this work, we propose a neural network-based framework for the co-design of a safety controller, observer, and CBF for partially observed continuous-time systems. By formulating barrier conditions over an augmented state space, our approach ensures safety without requiring bounded estimation errors or handcrafted barrier functions. All components are jointly trained by formulating appropriate loss functions, and we introduce a validity condition to provide formal safety guarantees beyond the training data. Finally, we demonstrate the effectiveness of the proposed approach through several case studies.
comment: There were errors in paper (introduction section and notations)
Verifiable Mission Planning For Space Operations
Spacecraft must operate under environmental and actuator uncertainties while meeting strict safety requirements. Traditional approaches rely on scenario-based heuristics that fail to account for stochastic influences, leading to suboptimal or unsafe plans. We propose a finite-horizon, chance-constrained Markov decision process for mission planning, where states represent mission and vehicle parameters, actions correspond to operational adjustments, and temporal logic specifications encode operational constraints. We synthesize policies that optimize mission objectives while ensuring constraints are met with high probability. Applied to the GRACE-FO mission, the approach accounts for stochastic solar activity and uncertain thrust performance, yielding maneuver schedules that maximize scientific return and provably satisfy safety requirements. This work demonstrates how Markov decision processes can be applied to space missions, enabling autonomous operation with formal guarantees.
comment: Submitted to the 2025 AAS/AIAA Astrodynamics Specialist Conference
Future pathways for eVTOLs: A design optimization perspective
The rapid development of advanced urban air mobility, particularly electric vertical take-off and landing (eVTOL) aircraft, requires interdisciplinary approaches involving the future urban air mobility ecosystem. Operational cost efficiency, regulatory aspects, sustainability, and environmental compatibility should be incorporated directly into the conceptual design of aircraft and across operational and regulatory strategies. In this work, we apply a novel multidisciplinary design optimization framework for the conceptual design of eVTOL aircraft. The framework optimizes conventional design elements of eVTOL aircraft over a generic mission and integrates a comprehensive operational cost model to directly capture economic incentives of the designed system through profit modeling for operators. We introduce a novel metric, the cross-transportation Figure of Merit (FoM), to compare the optimized eVTOL system with various competing road, rail, and air transportation modes in terms of sustainability, cost, and travel time. We investigate four objective-specific eVTOL optimization designs in a broad scenario space, mapping regulatory, technical, and operational constraints to generate a representation of potential urban air mobility stakeholder-centric design objectives. The analysis of an optimized profit-maximizing eVTOL, cost-minimizing eVTOL, sustainability-maximizing eVTOL, and a combined FoM-maximizing eVTOL design highlights significant trade-offs in the area of profitability, operational flexibility, and sustainability strategies. This underlines the importance of incorporating multiple operationally tangential disciplines into the design process, while also reflecting the diverse priorities of stakeholders such as operators, regulators, and society.
comment: 26 pages, 8 figures
Optimal Modified Feedback Strategies in LQ Games under Control Imperfections
Game-theoretic approaches and Nash equilibrium have been widely applied across various engineering domains. However, practical challenges such as disturbances, delays, and actuator limitations can hinder the precise execution of Nash equilibrium strategies. This work investigates the impact of such implementation imperfections on game trajectories and players' costs in the context of a two-player finite-horizon linear quadratic (LQ) nonzero-sum game. Specifically, we analyze how small deviations by one player, measured or estimated at each stage, affect the state and cost function of the other player. To mitigate these effects, we propose an adjusted control policy that optimally compensates for the deviations under the stated information structure and can, under certain conditions, exploit them to improve performance. Rigorous mathematical analysis and proofs are provided, and the effectiveness of the proposed method is demonstrated through a representative numerical example.
comment: 6 pages, 2 figures, Preprint version of a paper submitted to ACC 2026
Koopman-Nemytskii Operator: A Linear Representation of Nonlinear Controlled Systems
While Koopman operator lifts a nonlinear system into an infinite-dimensional function space and represents it as a linear dynamics, its definition is restricted to autonomous systems, i.e., does not incorporate inputs or disturbances. To the end of designing state-feedback controllers, the existing extensions of Koopman operator, which only account for the effect of open-loop values of inputs, does not involve feedback laws on closed-loop systems. Hence, in order to generically represent any nonlinear controlled dynamics linearly, this paper proposes a Koopman-Nemytskii operator, defined as a linear mapping from a product reproducing kernel Hilbert space (RKHS) of states and feedback laws to an RKHS of states. Using the equivalence between RKHS and Sobolev-Hilbert spaces under certain regularity conditions on the dynamics and kernel selection, this operator is well-defined. Its data-based approximation, which follows a kernel extended dynamic mode decomposition (kernel EDMD) approach, have established errors in single-step and multi-step state predictions as well as accumulated cost under control.
comment: 19 pages, 9 figures, submitted to IEEE Transactions on Automatic Control after revision on 10/2/2025
Planning Stealthy Backdoor Attacks in MDPs with Observation-Based Triggers
This paper investigates backdoor attack planning in stochastic control systems modeled as Markov Decision Processes (MDPs). In a backdoor attack, the adversary provides a control policy that behaves well in the original MDP to pass the testing phase. However, when such a policy is deployed with a trigger policy, which perturbs the system dynamics at runtime, it optimizes the attacker's objective instead. To solve jointly the control policy and its trigger, we formulate the attack planning problem as a constrained optimal planning problem in an MDP with augmented state space, with the objective to maximize the attacker's total rewards in the system with an activated trigger, subject to the constraint that the control policy is near optimal in the original MDP. We then introduce a gradient-based optimization method to solve the optimal backdoor attack policy as a pair of coordinated control and trigger policies. Experimental results from a case study validate the effectiveness of our approach in achieving stealthy backdoor attacks.
Active Alignments of Lens Systems with Reinforcement Learning
Aligning a lens system relative to an imager is a critical challenge in camera manufacturing. While optimal alignment can be mathematically computed under ideal conditions, real-world deviations caused by manufacturing tolerances often render this approach impractical. Measuring these tolerances can be costly or even infeasible, and neglecting them may result in suboptimal alignments. We propose a reinforcement learning (RL) approach that learns exclusively in the pixel space of the sensor output, eliminating the need to develop expert-designed alignment concepts. We conduct an extensive benchmark study and show that our approach surpasses other methods in speed, precision, and robustness. We further introduce relign, a realistic, freely explorable, open-source simulation utilizing physically based rendering that models optical systems with non-deterministic manufacturing tolerances and noise in robotic alignment movement. It provides an interface to popular machine learning frameworks, enabling seamless experimentation and development. Our work highlights the potential of RL in a manufacturing environment to enhance efficiency of optical alignments while minimizing the need for manual intervention.
comment: This work has been submitted to the IEEE for possible publication
Neural Parameter-varying Data-enabled Predictive Control of Cold Atmospheric Pressure Plasma Jets
Cold Atmospheric Pressure Plasma Jets (APPJs) show significant potential for biomedical applications, but their inherent complexity, characterized by nonlinear dynamics and strong sensitivity to operating conditions like tip-to-surface distance, presents challenges for real-time control. This paper introduces the Neural Parameter-varying Data-enabled Predictive Control (NPV-DeePC) framework to address these issues. By integrating hypernetworks into the neural DeePC paradigm, NPV-DeePC adaptively captures system nonlinearities and parameter variations, dynamically adjusts the neural network's learned representation of the system, enabling accurate multi-step trajectory prediction and control. Simulation studies on surface temperature tracking and thermal dose delivery demonstrate that NPV-DeePC achieves higher accuracy and adaptability than existing controllers. Moreover, its computational efficiency supports real-time implementation, making it a practical approach for precise APPJ control and a generalizable solution for other nonlinear, parameter-varying systems.
Introduction to Online Control
This text presents an introduction to an emerging paradigm in control of dynamical systems and differentiable reinforcement learning called online nonstochastic control. The new approach applies techniques from online convex optimization and convex relaxations to obtain new methods with provable guarantees for classical settings in optimal and robust control. The primary distinction between online nonstochastic control and other frameworks is the objective. In optimal control, robust control, and other control methodologies that assume stochastic noise, the goal is to perform comparably to an offline optimal strategy. In online nonstochastic control, both the cost functions as well as the perturbations from the assumed dynamical model are chosen by an adversary. Thus the optimal policy is not defined a priori. Rather, the target is to attain low regret against the best policy in hindsight from a benchmark class of policies. This objective suggests the use of the decision making framework of online convex optimization as an algorithmic methodology. The resulting methods are based on iterative mathematical optimization algorithms, and are accompanied by finite-time regret and computational complexity guarantees.
comment: Draft; comments/suggestions welcome at nonstochastic.control@gmail.com
Robotics
Online Hierarchical Policy Learning using Physics Priors for Robot Navigation in Unknown Environments
Robot navigation in large, complex, and unknown indoor environments is a challenging problem. The existing approaches, such as traditional sampling-based methods, struggle with resolution control and scalability, while imitation learning-based methods require a large amount of demonstration data. Active Neural Time Fields (ANTFields) have recently emerged as a promising solution by using local observations to learn cost-to-go functions without relying on demonstrations. Despite their potential, these methods are hampered by challenges such as spectral bias and catastrophic forgetting, which diminish their effectiveness in complex scenarios. To address these issues, our approach decomposes the planning problem into a hierarchical structure. At the high level, a sparse graph captures the environment's global connectivity, while at the low level, a planner based on neural fields navigates local obstacles by solving the Eikonal PDE. This physics-informed strategy overcomes common pitfalls like spectral bias and neural field fitting difficulties, resulting in a smooth and precise representation of the cost landscape. We validate our framework in large-scale environments, demonstrating its enhanced adaptability and precision compared to previous methods, and highlighting its potential for online exploration, mapping, and real-world navigation.
A Robust Neural Control Design for Multi-drone Slung Payload Manipulation with Control Contraction Metrics
This paper presents a robust neural control design for a three-drone slung payload transportation system to track a reference path under external disturbances. The control contraction metric (CCM) is used to generate a neural exponentially converging baseline controller while complying with control input saturation constraints. We also incorporate the uncertainty and disturbance estimator (UDE) technique to dynamically compensate for persistent disturbances. The proposed framework yields a modularized design, allowing the controller and estimator to perform their individual tasks and achieve a zero trajectory tracking error if the disturbances meet certain assumptions. The stability and robustness of the complete system, incorporating both the CCM controller and the UDE compensator, are presented. Simulations are conducted to demonstrate the capability of the proposed control design to follow complicated trajectories under external disturbances.
comment: Submit to the 2026 American Control Conference (ACC)
Pose Estimation of a Thruster-Driven Bioinspired Multi-Link Robot
This work demonstrates pose (position and shape) estimation for a free-floating, bioinspired multi-link robot with unactuated joints, link-mounted thrusters for control, and a single gyroscope per link, resulting in an underactuated, minimally sensed platform. Through a proof-of-concept hardware experiment and offline Kalman filter analysis, we show that the robot's pose can be reliably estimated. State estimation is performed using an unscented Kalman filter augmented with Gaussian process residual learning to compensate for non-zero-mean, non-Gaussian noise. We further show that a filter trained on a multi-gait dataset (forward, backward, left, right, and turning) performs comparably to one trained on a larger forward-gait-only dataset when both are evaluated on the same forward-gait test trajectory. These results reveal overlap in the gait input space, which can be exploited to reduce training data requirements while enhancing the filter's generalizability across multiple gaits.
comment: 8 pages, 8 figures, 3 tables
VL-KnG: Visual Scene Understanding for Navigation Goal Identification using Spatiotemporal Knowledge Graphs
Vision-language models (VLMs) have shown potential for robot navigation but encounter fundamental limitations: they lack persistent scene memory, offer limited spatial reasoning, and do not scale effectively with video duration for real-time application. We present VL-KnG, a Visual Scene Understanding system that tackles these challenges using spatiotemporal knowledge graph construction and computationally efficient query processing for navigation goal identification. Our approach processes video sequences in chunks utilizing modern VLMs, creates persistent knowledge graphs that maintain object identity over time, and enables explainable spatial reasoning through queryable graph structures. We also introduce WalkieKnowledge, a new benchmark with about 200 manually annotated questions across 8 diverse trajectories spanning approximately 100 minutes of video data, enabling fair comparison between structured approaches and general-purpose VLMs. Real-world deployment on a differential drive robot demonstrates practical applicability, with our method achieving 77.27% success rate and 76.92% answer accuracy, matching Gemini 2.5 Pro performance while providing explainable reasoning supported by the knowledge graph, computational efficiency for real-time deployment across different tasks, such as localization, navigation and planning. Code and dataset will be released after acceptance.
comment: This work has been submitted to the IEEE for possible publication
Touching the tumor boundary: A pilot study on ultrasound based virtual fixtures for breast-conserving surgery
Purpose: Delineating tumor boundaries during breast-conserving surgery is challenging as tumors are often highly mobile, non-palpable, and have irregularly shaped borders. To address these challenges, we introduce a cooperative robotic guidance system that applies haptic feedback for tumor localization. In this pilot study, we aim to assess if and how this system can be successfully integrated into breast cancer care. Methods: A small haptic robot is retrofitted with an electrocautery blade to operate as a cooperatively controlled surgical tool. Ultrasound and electromagnetic navigation are used to identify the tumor boundaries and position. A forbidden region virtual fixture is imposed when the surgical tool collides with the tumor boundary. We conducted a study where users were asked to resect tumors from breast simulants both with and without the haptic guidance. We then assess the results of these simulated resections both qualitatively and quantitatively. Results: Virtual fixture guidance is shown to improve resection margins. On average, users find the task to be less mentally demanding, frustrating, and effort intensive when haptic feedback is available. We also discovered some unanticipated impacts on surgical workflow that will guide design adjustments and training protocol moving forward. Conclusion: Our results suggest that virtual fixtures can help localize tumor boundaries in simulated breast-conserving surgery. Future work will include an extensive user study to further validate these results and fine-tune our guidance system.
Differentiable Skill Optimisation for Powder Manipulation in Laboratory Automation
Robotic automation is accelerating scientific discovery by reducing manual effort in laboratory workflows. However, precise manipulation of powders remains challenging, particularly in tasks such as transport that demand accuracy and stability. We propose a trajectory optimisation framework for powder transport in laboratory settings, which integrates differentiable physics simulation for accurate modelling of granular dynamics, low-dimensional skill-space parameterisation to reduce optimisation complexity, and a curriculum-based strategy that progressively refines task competence over long horizons. This formulation enables end-to-end optimisation of contact-rich robot trajectories while maintaining stability and convergence efficiency. Experimental results demonstrate that the proposed method achieves superior task success rates and stability compared to the reinforcement learning baseline.
comment: 4 pages
AFFORD2ACT: Affordance-Guided Automatic Keypoint Selection for Generalizable and Lightweight Robotic Manipulation
Vision-based robot learning often relies on dense image or point-cloud inputs, which are computationally heavy and entangle irrelevant background features. Existing keypoint-based approaches can focus on manipulation-centric features and be lightweight, but either depend on manual heuristics or task-coupled selection, limiting scalability and semantic understanding. To address this, we propose AFFORD2ACT, an affordance-guided framework that distills a minimal set of semantic 2D keypoints from a text prompt and a single image. AFFORD2ACT follows a three-stage pipeline: affordance filtering, category-level keypoint construction, and transformer-based policy learning with embedded gating to reason about the most relevant keypoints, yielding a compact 38-dimensional state policy that can be trained in 15 minutes, which performs well in real-time without proprioception or dense representations. Across diverse real-world manipulation tasks, AFFORD2ACT consistently improves data efficiency, achieving an 82% success rate on unseen objects, novel categories, backgrounds, and distractors.
How Well do Diffusion Policies Learn Kinematic Constraint Manifolds?
Diffusion policies have shown impressive results in robot imitation learning, even for tasks that require satisfaction of kinematic equality constraints. However, task performance alone is not a reliable indicator of the policy's ability to precisely learn constraints in the training data. To investigate, we analyze how well diffusion policies discover these manifolds with a case study on a bimanual pick-and-place task that encourages fulfillment of a kinematic constraint for success. We study how three factors affect trained policies: dataset size, dataset quality, and manifold curvature. Our experiments show diffusion policies learn a coarse approximation of the constraint manifold with learning affected negatively by decreases in both dataset size and quality. On the other hand, the curvature of the constraint manifold showed inconclusive correlations with both constraint satisfaction and task success. A hardware evaluation verifies the applicability of our results in the real world. Project website with additional results and visuals: https://diffusion-learns-kinematic.github.io
comment: Under review. 8 pages, 3 figures, 3 tables. Additional results available at https://diffusion-learns-kinematic.github.io
Beyond Collision Cones: Dynamic Obstacle Avoidance for Nonholonomic Robots via Dynamic Parabolic Control Barrier Functions
Control Barrier Functions (CBFs) are a powerful tool for ensuring the safety of autonomous systems, yet applying them to nonholonomic robots in cluttered, dynamic environments remains an open challenge. State-of-the-art methods often rely on collision-cone or velocity-obstacle constraints which, by only considering the angle of the relative velocity, are inherently conservative and can render the CBF-based quadratic program infeasible, particularly in dense scenarios. To address this issue, we propose a Dynamic Parabolic Control Barrier Function (DPCBF) that defines the safe set using a parabolic boundary. The parabola's vertex and curvature dynamically adapt based on both the distance to an obstacle and the magnitude of the relative velocity, creating a less restrictive safety constraint. We prove that the proposed DPCBF is valid for a kinematic bicycle model subject to input constraints. Extensive comparative simulations demonstrate that our DPCBF-based controller significantly enhances navigation success rates and QP feasibility compared to baseline methods. Our approach successfully navigates through dense environments with up to 100 dynamic obstacles, scenarios where collision cone-based methods fail due to infeasibility.
comment: The first two authors contributed equally to this work. Project page: https://www.taekyung.me/dpcbf
INSIGHT: INference-time Sequence Introspection for Generating Help Triggers in Vision-Language-Action Models
Recent Vision-Language-Action (VLA) models show strong generalization capabilities, yet they lack introspective mechanisms for anticipating failures and requesting help from a human supervisor. We present \textbf{INSIGHT}, a learning framework for leveraging token-level uncertainty signals to predict when a VLA should request help. Using $\pi_0$-FAST as the underlying model, we extract per-token \emph{entropy}, \emph{log-probability}, and Dirichlet-based estimates of \emph{aleatoric and epistemic uncertainty}, and train compact transformer classifiers to map these sequences to help triggers. We explore supervision regimes for strong or weak supervision, and extensively compare them across in-distribution and out-of-distribution tasks. Our results show a trade-off: strong labels enable models to capture fine-grained uncertainty dynamics for reliable help detection, while weak labels, though noisier, still support competitive introspection when training and evaluation are aligned, offering a scalable path when dense annotation is impractical. Crucially, we find that modeling the temporal evolution of token-level uncertainty signals with transformers provides far greater predictive power than static sequence-level scores. This study provides the first systematic evaluation of uncertainty-based introspection in VLAs, opening future avenues for active learning and for real-time error mitigation through selective human intervention.
VENTURA: Adapting Image Diffusion Models for Unified Task Conditioned Navigation
Robots must adapt to diverse human instructions and operate safely in unstructured, open-world environments. Recent Vision-Language models (VLMs) offer strong priors for grounding language and perception, but remain difficult to steer for navigation due to differences in action spaces and pretraining objectives that hamper transferability to robotics tasks. Towards addressing this, we introduce VENTURA, a vision-language navigation system that finetunes internet-pretrained image diffusion models for path planning. Instead of directly predicting low-level actions, VENTURA generates a path mask (i.e. a visual plan) in image space that captures fine-grained, context-aware navigation behaviors. A lightweight behavior-cloning policy grounds these visual plans into executable trajectories, yielding an interface that follows natural language instructions to generate diverse robot behaviors. To scale training, we supervise on path masks derived from self-supervised tracking models paired with VLM-augmented captions, avoiding manual pixel-level annotation or highly engineered data collection setups. In extensive real-world evaluations, VENTURA outperforms state-of-the-art foundation model baselines on object reaching, obstacle avoidance, and terrain preference tasks, improving success rates by 33% and reducing collisions by 54% across both seen and unseen scenarios. Notably, we find that VENTURA generalizes to unseen combinations of distinct tasks, revealing emergent compositional capabilities. Videos, code, and additional materials: https://venturapath.github.io
comment: 9 pages, 6 figures, 3 tables
A Stochastic Framework for Continuous-Time State Estimation of Continuum Robots
State estimation techniques for continuum robots (CRs) typically involve using computationally complex dynamic models, simplistic shape approximations, or are limited to quasi-static methods. These limitations can be sensitive to unmodelled disturbances acting on the robot. Inspired by a factor-graph optimization paradigm, this work introduces a continuous-time stochastic state estimation framework for continuum robots. We introduce factors based on continuous-time kinematics that are corrupted by a white noise Gaussian process (GP). By using a simple robot model paired with high-rate sensing, we show adaptability to unmodelled external forces and data dropout. The result contains an estimate of the mean and covariance for the robot's pose, velocity, and strain, each of which can be interpolated continuously in time or space. This same interpolation scheme can be used during estimation, allowing for inclusion of measurements on states that are not explicitly estimated. Our method's inherent sparsity leads to a linear solve complexity with respect to time and interpolation queries in constant time. We demonstrate our method on a CR with gyroscope and pose sensors, highlighting its versatility in real-world systems.
comment: 17 pages, 11 figures. Submitted to IEEE Transactions on Robotics
Safe Motion Planning and Control Using Predictive and Adaptive Barrier Methods for Autonomous Surface Vessels IROS 2025
Safe motion planning is essential for autonomous vessel operations, especially in challenging spaces such as narrow inland waterways. However, conventional motion planning approaches are often computationally intensive or overly conservative. This paper proposes a safe motion planning strategy combining Model Predictive Control (MPC) and Control Barrier Functions (CBFs). We introduce a time-varying inflated ellipse obstacle representation, where the inflation radius is adjusted depending on the relative position and attitude between the vessel and the obstacle. The proposed adaptive inflation reduces the conservativeness of the controller compared to traditional fixed-ellipsoid obstacle formulations. The MPC solution provides an approximate motion plan, and high-order CBFs ensure the vessel's safety using the varying inflation radius. Simulation and real-world experiments demonstrate that the proposed strategy enables the fully-actuated autonomous robot vessel to navigate through narrow spaces in real time and resolve potential deadlocks, all while ensuring safety.
comment: IROS 2025
Kilometer-Scale GNSS-Denied UAV Navigation via Heightmap Gradients: A Winning System from the SPRIN-D Challenge
Reliable long-range flight of unmanned aerial vehicles (UAVs) in GNSS-denied environments is challenging: integrating odometry leads to drift, loop closures are unavailable in previously unseen areas and embedded platforms provide limited computational power. We present a fully onboard UAV system developed for the SPRIN-D Funke Fully Autonomous Flight Challenge, which required 9 km long-range waypoint navigation below 25 m AGL (Above Ground Level) without GNSS or prior dense mapping. The system integrates perception, mapping, planning, and control with a lightweight drift-correction method that matches LiDAR-derived local heightmaps to a prior geo-data heightmap via gradient-template matching and fuses the evidence with odometry in a clustered particle filter. Deployed during the competition, the system executed kilometer-scale flights across urban, forest, and open-field terrain and reduced drift substantially relative to raw odometry, while running in real time on CPU-only hardware. We describe the system architecture, the localization pipeline, and the competition evaluation, and we report practical insights from field deployment that inform the design of GNSS-denied UAV autonomy.
comment: 8 pages
Real-Time Trajectory Generation and Hybrid Lyapunov-Based Control for Hopping Robots
The advent of rotor-based hopping robots has created very capable hopping platforms with high agility and efficiency, and similar controllability, as compared to their purely flying quadrotor counterparts. Advances in robot performance have increased the hopping height to greater than 4 meters and opened up the possibility for more complex aerial trajectories (i.e., behaviors). However, currently hopping robots do not directly control their aerial trajectory or transition to flight, eliminating the efficiency benefits of a hopping system. Here we show a real-time, computationally efficiency, non-linear drag compensated, trajectory generation methodology and accompanying Lyapunov-based controller. The combined system can create and follow complex aerial trajectories from liftoff to touchdown on horizontal and vertical surfaces, while maintaining strick control over the orientation at touchdown. The computational efficiency provides broad applicability across all size scales of hopping robots while maintaining applicability to quadrotors in general.
comment: 7 pages, 4 figures, 4 tables
Strategic Fusion of Vision Language Models: Shapley-Credited Context-Aware Dawid-Skene for Multi-Label Tasks in Autonomous Driving
Large vision-language models (VLMs) are increasingly used in autonomous-vehicle (AV) stacks, but hallucination limits their reliability in safety-critical pipelines. We present Shapley-credited Context-Aware Dawid-Skene with Agreement, a game-theoretic fusion method for multi-label understanding of ego-view dashcam video. It learns per-model, per-label, context-conditioned reliabilities from labelled history and, at inference, converts each model's report into an agreement-guardrailed log-likelihood ratio that is combined with a contextual prior and a public reputation state updated via Shapley-based team credit. The result is calibrated, thresholdable posteriors that (i) amplify agreement among reliable models, (ii) preserve uniquely correct single-model signals, and (iii) adapt to drift. To specialise general VLMs, we curate 1,000 real-world dashcam clips with structured annotations (scene description, manoeuvre recommendation, rationale) via an automatic pipeline that fuses HDD ground truth, vehicle kinematics, and YOLOv11 + BoT-SORT tracking, guided by a three-step chain-of-thought prompt; three heterogeneous VLMs are then fine-tuned with LoRA. We evaluate with Hamming distance, Micro-Macro-F1, and average per-video latency. Empirically, the proposed method achieves a 23% reduction in Hamming distance, 55% improvement in Macro-F1, and 47% improvement in Micro-F1 when comparing with the best single model, supporting VLM fusion as a calibrated, interpretable, and robust decision-support component for AV pipelines.
comment: 8 pages
Compose Your Policies! Improving Diffusion-based or Flow-based Robot Policies via Test-time Distribution-level Composition
Diffusion-based models for robotic control, including vision-language-action (VLA) and vision-action (VA) policies, have demonstrated significant capabilities. Yet their advancement is constrained by the high cost of acquiring large-scale interaction datasets. This work introduces an alternative paradigm for enhancing policy performance without additional model training. Perhaps surprisingly, we demonstrate that the composed policies can exceed the performance of either parent policy. Our contribution is threefold. First, we establish a theoretical foundation showing that the convex composition of distributional scores from multiple diffusion models can yield a superior one-step functional objective compared to any individual score. A Gr\"onwall-type bound is then used to show that this single-step improvement propagates through entire generation trajectories, leading to systemic performance gains. Second, motivated by these results, we propose General Policy Composition (GPC), a training-free method that enhances performance by combining the distributional scores of multiple pre-trained policies via a convex combination and test-time search. GPC is versatile, allowing for the plug-and-play composition of heterogeneous policies, including VA and VLA models, as well as those based on diffusion or flow-matching, irrespective of their input visual modalities. Third, we provide extensive empirical validation. Experiments on Robomimic, PushT, and RoboTwin benchmarks, alongside real-world robotic evaluations, confirm that GPC consistently improves performance and adaptability across a diverse set of tasks. Further analysis of alternative composition operators and weighting strategies offers insights into the mechanisms underlying the success of GPC. These results establish GPC as a simple yet effective method for improving control performance by leveraging existing policies.
comment: Project Page: https://sagecao1125.github.io/GPC-Site/
Predictive Control Barrier Functions for Discrete-Time Linear Systems with Unmodeled Delays
This paper introduces a predictive control barrier function (PCBF) framework for enforcing state constraints in discrete-time systems with unknown relative degree, which can be caused by input delays or unmodeled input dynamics. Existing discrete-time CBF formulations typically require the construction of auxiliary barrier functions when the relative degree is greater than one, which complicates implementation and may yield conservative safe sets. The proposed PCBF framework addresses this challenge by extending the prediction horizon to construct a CBF for an associated system with relative degree one. As a result, the superlevel set of the PCBF coincides with the safe set, simplifying constraint enforcement and eliminating the need for auxiliary functions. The effectiveness of the proposed method is demonstrated on a discrete-time double integrator with input delay and a bicopter system with position constraints.
comment: 8 pages, 7 figures, submitted to ACC 2026
KeySG: Hierarchical Keyframe-Based 3D Scene Graphs
In recent years, 3D scene graphs have emerged as a powerful world representation, offering both geometric accuracy and semantic richness. Combining 3D scene graphs with large language models enables robots to reason, plan, and navigate in complex human-centered environments. However, current approaches for constructing 3D scene graphs are semantically limited to a predefined set of relationships, and their serialization in large environments can easily exceed an LLM's context window. We introduce KeySG, a framework that represents 3D scenes as a hierarchical graph consisting of floors, rooms, objects, and functional elements, where nodes are augmented with multi-modal information extracted from keyframes selected to optimize geometric and visual coverage. The keyframes allow us to efficiently leverage VLM to extract scene information, alleviating the need to explicitly model relationship edges between objects, enabling more general, task-agnostic reasoning and planning. Our approach can process complex and ambiguous queries while mitigating the scalability issues associated with large scene graphs by utilizing a hierarchical retrieval-augmented generation (RAG) pipeline to extract relevant context from the graph. Evaluated across four distinct benchmarks -- including 3D object segmentation and complex query retrieval -- KeySG outperforms prior approaches on most metrics, demonstrating its superior semantic richness and efficiency.
ROSplane 2.0: A Fixed-Wing Autopilot for Research
Unmanned aerial vehicle (UAV) research requires the integration of cutting-edge technology into existing autopilot frameworks. This process can be arduous, requiring extensive resources, time, and detailed knowledge of the existing system. ROSplane is a lean, open-source fixed-wing autonomy stack built by researchers for researchers. It is designed to accelerate research by providing clearly defined interfaces with an easily modifiable framework. Powered by ROS 2, ROSplane allows for rapid integration of low or high-level control, path planning, or estimation algorithms. A focus on lean, easily understood code and extensive documentation lowers the barrier to entry for researchers. Recent developments to ROSplane improve its capacity to accelerate UAV research, including the transition from ROS 1 to ROS 2, enhanced estimation and control algorithms, increased modularity, and an improved aerodynamic modeling pipeline. This aerodynamic modeling pipeline significantly reduces the effort of transitioning from simulation to real-world testing without requiring expensive system identification or computational fluid dynamics tools. ROSplane's architecture reduces the effort required to integrate new research tools and methods, expediting hardware experimentation.
Prometheus: Universal, Open-Source Mocap-Based Teleoperation System with Force Feedback for Dataset Collection in Robot Learning
This paper presents a novel teleoperation system with force feedback, utilizing consumer-grade HTC Vive Trackers 2.0. The system integrates a custom-built controller, a UR3 robotic arm, and a Robotiq gripper equipped with custom-designed fingers to ensure uniform pressure distribution on an embedded force sensor. Real-time compression force data is transmitted to the controller, enabling operators to perceive the gripping force applied to objects. Experimental results demonstrate that the system enhances task success rates and provides a low-cost solution for large-scale imitation learning data collection without compromising affordability.
ROSflight 2.0: Lean ROS 2-Based Autopilot for Unmanned Aerial Vehicles
ROSflight is a lean, open-source autopilot ecosystem for unmanned aerial vehicles (UAVs). Designed by researchers for researchers, it is built to lower the barrier to entry to UAV research and accelerate the transition from simulation to hardware experiments by maintaining a lean (not full-featured), well-documented, and modular codebase. This publication builds on previous treatments and describes significant additions to the architecture that improve the modularity and usability of ROSflight, including the transition from ROS 1 to ROS 2, supported hardware, low-level actuator mixing, and the simulation environment. We believe that these changes improve the usability of ROSflight and enable ROSflight to accelerate research in areas like advanced-air mobility. Hardware results are provided, showing that ROSflight is able to control a multirotor over a serial connection at 400 Hz while closing all control loops on the companion computer.
comment: To be submitted to the 2026 IEEE International Conference on Robotics and Automation in Vienna, Austria
Non-submodular Visual Attention for Robot Navigation
This paper presents a task-oriented computational framework to enhance Visual-Inertial Navigation (VIN) in robots, addressing challenges such as limited time and energy resources. The framework strategically selects visual features using a Mean Squared Error (MSE)-based, non-submodular objective function and a simplified dynamic anticipation model. To address the NP-hardness of this problem, we introduce four polynomial-time approximation algorithms: a classic greedy method with constant-factor guarantees; a low-rank greedy variant that significantly reduces computational complexity; a randomized greedy sampler that balances efficiency and solution quality; and a linearization-based selector based on a first-order Taylor expansion for near-constant-time execution. We establish rigorous performance bounds by leveraging submodularity ratios, curvature, and element-wise curvature analyses. Extensive experiments on both standardized benchmarks and a custom control-aware platform validate our theoretical results, demonstrating that these methods achieve strong approximation guarantees while enabling real-time deployment.
comment: 22 pages; Accepted to appear in IEEE Transactions on Robotics (T-RO)
Product-oriented Product-Process-Resource Asset Network and its Representation in AutomationML for Asset Administration Shell
Current products, especially in the automotive sector, pose complex technical systems having a multi-disciplinary mechatronic nature. Industrial standards supporting system engineering and production typically (i) address the production phase only, but do not cover the complete product life cycle, and (ii) focus on production processes and resources rather than the products themselves. The presented approach is motivated by incorporating impacts of end-of-life phase of the product life cycle into the engineering phase. This paper proposes a modelling approach coming up from the Product-Process-Resource (PPR) modeling paradigm. It combines requirements on (i) respecting the product structure as a basis for the model, and (ii) it incorporates repairing, remanufacturing, or upcycling within cyber-physical production systems. The proposed model called PoPAN should accompany the product during the entire life cycle as a digital shadow encapsulated within the Asset Administration Shell of a product. To facilitate the adoption of the proposed paradigm, the paper also proposes serialization of the model in the AutomationML data format. The model is demonstrated on a use-case for disassembling electric vehicle batteries to support their remanufacturing for stationary battery applications.
comment: This work has been submitted to the IEEE for possible publication. 8 pages, 6 figures
RTFF: Random-to-Target Fabric Flattening Policy using Dual-Arm Manipulator
Robotic fabric manipulation in garment production for sewing, cutting, and ironing requires reliable flattening and alignment, yet remains challenging due to fabric deformability, effectively infinite degrees of freedom, and frequent occlusions from wrinkles, folds, and the manipulator's End-Effector (EE) and arm. To address these issues, this paper proposes the first Random-to-Target Fabric Flattening (RTFF) policy, which aligns a random wrinkled fabric state to an arbitrary wrinkle-free target state. The proposed policy adopts a hybrid Imitation Learning-Visual Servoing (IL-VS) framework, where IL learns with explicit fabric models for coarse alignment of the wrinkled fabric toward a wrinkle-free state near the target, and VS ensures fine alignment to the target. Central to this framework is a template-based mesh that offers precise target state representation, wrinkle-aware geometry prediction, and consistent vertex correspondence across RTFF manipulation steps, enabling robust manipulation and seamless IL-VS switching. Leveraging the power of mesh, a novel IL solution for RTFF-Mesh Action Chunking Transformer (MACT)-is then proposed by conditioning the mesh information into a Transformer-based policy. The RTFF policy is validated on a real dual-arm tele-operation system, showing zero-shot alignment to different targets, high accuracy, and strong generalization across fabrics and scales. Project website: https://kaitang98.github.io/RTFF_Policy/
comment: 9 pages, 6 figures, conference
Semantic Visual Simultaneous Localization and Mapping: A Survey on State of the Art, Challenges, and Future Directions
Semantic Simultaneous Localization and Mapping (SLAM) is a critical area of research within robotics and computer vision, focusing on the simultaneous localization of robotic systems and associating semantic information to construct the most accurate and complete comprehensive model of the surrounding environment. Since the first foundational work in Semantic SLAM appeared more than two decades ago, this field has received increasing attention across various scientific communities. Despite its significance, the field lacks comprehensive surveys encompassing recent advances and persistent challenges. In response, this study provides a thorough examination of the state-of-the-art of Semantic SLAM techniques, with the aim of illuminating current trends and key obstacles. Beginning with an in-depth exploration of the evolution of visual SLAM, this study outlines its strengths and unique characteristics, while also critically assessing previous survey literature. Subsequently, a unified problem formulation and evaluation of the modular solution framework is proposed, which divides the problem into discrete stages, including visual localization, semantic feature extraction, mapping, data association, and loop closure optimization. Moreover, this study investigates alternative methodologies such as deep learning and the utilization of large language models, alongside a review of relevant research about contemporary SLAM datasets. Concluding with a discussion on potential future research directions, this study serves as a comprehensive resource for researchers seeking to navigate the complex landscape of Semantic SLAM.
Tele-rehabilitation with online skill transfer and adaptation in $\mathbb{R}^3 \times \mathit{S}^3$
This paper proposes a tele-teaching framework for the domain of robot-assisted tele-rehabilitation. The system connects two robotic manipulators on therapist and patient side via bilateral teleoperation, enabling a therapist to remotely demonstrate rehabilitation exercises that are executed by the patient-side robot. A 6-DoF Dynamical Movement Primitives formulation is employed to jointly encode translational and rotational motions in $\mathbb{R}^3 \times \mathit{S}^3$ space, ensuring accurate trajectory reproduction. The framework supports smooth transitions between therapist-led guidance and patient passive training, while allowing adaptive adjustment of motion. Experiments with 7-DoF manipulators demonstrate the feasibility of the approach, highlighting its potential for personalized and remotely supervised rehabilitation.
CroSTAta: Cross-State Transition Attention Transformer for Robotic Manipulation
Learning robotic manipulation policies through supervised learning from demonstrations remains challenging when policies encounter execution variations not explicitly covered during training. While incorporating historical context through attention mechanisms can improve robustness, standard approaches process all past states in a sequence without explicitly modeling the temporal structure that demonstrations may include, such as failure and recovery patterns. We propose a Cross-State Transition Attention Transformer that employs a novel State Transition Attention (STA) mechanism to modulate standard attention weights based on learned state evolution patterns, enabling policies to better adapt their behavior based on execution history. Our approach combines this structured attention with temporal masking during training, where visual information is randomly removed from recent timesteps to encourage temporal reasoning from historical context. Evaluation in simulation shows that STA consistently outperforms standard cross-attention and temporal modeling approaches like TCN and LSTM networks across all tasks, achieving more than 2x improvement over cross-attention on precision-critical tasks.
comment: Code and data available at https://github.com/iit-DLSLab/croSTAta
MultiPhysio-HRC: Multimodal Physiological Signals Dataset for industrial Human-Robot Collaboration
Human-robot collaboration (HRC) is a key focus of Industry 5.0, aiming to enhance worker productivity while ensuring well-being. The ability to perceive human psycho-physical states, such as stress and cognitive load, is crucial for adaptive and human-aware robotics. This paper introduces MultiPhysio-HRC, a multimodal dataset containing physiological, audio, and facial data collected during real-world HRC scenarios. The dataset includes electroencephalography (EEG), electrocardiography (ECG), electrodermal activity (EDA), respiration (RESP), electromyography (EMG), voice recordings, and facial action units. The dataset integrates controlled cognitive tasks, immersive virtual reality experiences, and industrial disassembly activities performed manually and with robotic assistance, to capture a holistic view of the participants' mental states. Rich ground truth annotations were obtained using validated psychological self-assessment questionnaires. Baseline models were evaluated for stress and cognitive load classification, demonstrating the dataset's potential for affective computing and human-aware robotics research. MultiPhysio-HRC is publicly available to support research in human-centered automation, workplace well-being, and intelligent robotic systems.
Shared Object Manipulation with a Team of Collaborative Quadrupeds
Utilizing teams of multiple robots is advantageous for handling bulky objects. Many related works focus on multi-manipulator systems, which are limited by workspace constraints. In this paper, we extend a classical hybrid motion-force controller to a team of legged manipulator systems, enabling collaborative loco-manipulation of rigid objects with a force-closed grasp. Our novel approach allows the robots to flexibly coordinate their movements, achieving efficient and stable object co-manipulation and transport, validated through extensive simulations and real-world experiments.
comment: 8 pages, 9 figures, submitted to The 2026 American Control Conference
Enabling High-Frequency Cross-Modality Visual Positioning Service for Accurate Drone Landing
After years of growth, drone-based delivery is transforming logistics. At its core, real-time 6-DoF drone pose tracking enables precise flight control and accurate drone landing. With the widespread availability of urban 3D maps, the Visual Positioning Service (VPS), a mobile pose estimation system, has been adapted to enhance drone pose tracking during the landing phase, as conventional systems like GPS are unreliable in urban environments due to signal attenuation and multi-path propagation. However, deploying the current VPS on drones faces limitations in both estimation accuracy and efficiency. In this work, we redesign drone-oriented VPS with the event camera and introduce EV-Pose to enable accurate, high-frequency 6-DoF pose tracking for accurate drone landing. EV-Pose introduces a spatio-temporal feature-instructed pose estimation module that extracts a temporal distance field to enable 3D point map matching for pose estimation; and a motion-aware hierarchical fusion and optimization scheme to enhance the above estimation in accuracy and efficiency, by utilizing drone motion in the \textit{early stage} of event filtering and the \textit{later stage} of pose optimization. Evaluation shows that EV-Pose achieves a rotation accuracy of 1.34$\degree$ and a translation accuracy of 6.9$mm$ with a tracking latency of 10.08$ms$, outperforming baselines by $>$50\%, \tmcrevise{thus enabling accurate drone landings.} Demo: https://ev-pose.github.io/
comment: 15 pages, 23 figures
Trajectory Based Observer Design: A Framework for Lightweight Sensor Fusion
Efficient observer design and accurate sensor fusion are key in state estimation. This work proposes an optimization-based methodology, termed Trajectory Based Optimization Design (TBOD), allowing the user to easily design observers for general nonlinear systems and multi-sensor setups. Starting from parametrized observer dynamics, the proposed method considers a finite set of pre-recorded measurement trajectories from the nominal plant and exploits them to tune the observer parameters through numerical optimization. This research hinges on the classic observer's theory and Moving Horizon Estimators methodology. Optimization is exploited to ease the observer's design, providing the user with a lightweight, general-purpose sensor fusion methodology. TBOD's main characteristics are the capability to handle general sensors efficiently and in a modular way and, most importantly, its straightforward tuning procedure. The TBOD's performance is tested on a terrestrial rover localization problem, combining IMU and ranging sensors provided by Ultra Wide Band antennas, and validated through a motion-capture system. Comparison with an Extended Kalman Filter is also provided, matching its position estimation accuracy and significantly improving in the orientation.
What Did I Learn? Operational Competence Assessment for AI-Based Trajectory Planners
Automated driving functions increasingly rely on machine learning for tasks like perception and trajectory planning, requiring large, relevant datasets. The performance of these algorithms depends on how closely the training data matches the task. To ensure reliable functioning, it is crucial to know what is included in the dataset to assess the trained model's operational risk. We aim to enhance the safe use of machine learning in automated driving by developing a method to recognize situations that an automated vehicle has not been sufficiently trained on. This method also improves explainability by describing the dataset at a human-understandable level. We propose modeling driving data as knowledge graphs, representing driving scenes with entities and their relationships. These graphs are queried for specific sub-scene configurations to check their occurrence in the dataset. We estimate a vehicle's competence in a driving scene by considering the coverage and complexity of sub-scene configurations in the training set. Higher complexity scenes require greater coverage for high competence. We apply this method to the NuPlan dataset, modeling it with knowledge graphs and analyzing the coverage of specific driving scenes. This approach helps monitor the competence of machine learning models trained on the dataset, which is essential for trustworthy AI to be deployed in automated driving.
comment: Accepted for publication in proceedings of the 2025 IEEE International Automated Vehicle Validation Conference
Hybrid Training for Vision-Language-Action Models
Using Large Language Models to produce intermediate thoughts, a.k.a. Chain-of-thought (CoT), before providing an answer has been a successful recipe for solving complex language tasks. In robotics, similar embodied CoT strategies, generating thoughts before actions, have also been shown to lead to improved performance when using Vision-Language-Action models (VLAs). As these techniques increase the length of the model's generated outputs to include the thoughts, the inference time is negatively affected. Delaying an agent's actions in real-world executions, as in robotic manipulation settings, strongly affects the usability of a method, as tasks require long sequences of actions. However, is the generation of long chains-of-thought a strong prerequisite for achieving performance improvements? In this work, we explore the idea of Hybrid Training (HyT), a framework that enables VLAs to learn from thoughts and benefit from the associated performance gains, while enabling the possibility to leave out CoT generation during inference. Furthermore, by learning to conditionally predict a diverse set of outputs, HyT supports flexibility at inference time, enabling the model to either predict actions directly, generate thoughts or follow instructions. We evaluate the proposed method in a series of simulated benchmarks and real-world experiments.
GRITS: A Spillage-Aware Guided Diffusion Policy for Robot Food Scooping Tasks
Robotic food scooping is a critical manipulation skill for food preparation and service robots. However, existing robot learning algorithms, especially learn-from-demonstration methods, still struggle to handle diverse and dynamic food states, which often results in spillage and reduced reliability. In this work, we introduce GRITS: A Spillage-Aware Guided Diffusion Policy for Robot Food Scooping Tasks. This framework leverages guided diffusion policy to minimize food spillage during scooping and to ensure reliable transfer of food items from the initial to the target location. Specifically, we design a spillage predictor that estimates the probability of spillage given current observation and action rollout. The predictor is trained on a simulated dataset with food spillage scenarios, constructed from four primitive shapes (spheres, cubes, cones, and cylinders) with varied physical properties such as mass, friction, and particle size. At inference time, the predictor serves as a differentiable guidance signal, steering the diffusion sampling process toward safer trajectories while preserving task success. We validate GRITS on a real-world robotic food scooping platform. GRITS is trained on six food categories and evaluated on ten unseen categories with different shapes and quantities. GRITS achieves an 82% task success rate and a 4% spillage rate, reducing spillage by over 40% compared to baselines without guidance, thereby demonstrating its effectiveness.
Two stage GNSS outlier detection for factor graph optimization based GNSS-RTK/INS/odometer fusion
Reliable GNSS positioning in complex environments remains a critical challenge due to non-line-of-sight (NLOS) propagation, multipath effects, and frequent signal blockages. These effects can easily introduce large outliers into the raw pseudo-range measurements, which significantly degrade the performance of global navigation satellite system (GNSS) real-time kinematic (RTK) positioning and limit the effectiveness of tightly coupled GNSS-based integrated navigation system. To address this issue, we propose a two-stage outlier detection method and apply the method in a tightly coupled GNSS-RTK, inertial navigation system (INS), and odometer integration based on factor graph optimization (FGO). In the first stage, Doppler measurements are employed to detect pseudo-range outliers in a GNSS-only manner, since Doppler is less sensitive to multipath and NLOS effects compared with pseudo-range, making it a more stable reference for detecting sudden inconsistencies. In the second stage, pre-integrated inertial measurement units (IMU) and odometer constraints are used to generate predicted double-difference pseudo-range measurements, which enable a more refined identification and rejection of remaining outliers. By combining these two complementary stages, the system achieves improved robustness against both gross pseudo-range errors and degraded satellite measuring quality. The experimental results demonstrate that the two-stage detection framework significantly reduces the impact of pseudo-range outliers, and leads to improved positioning accuracy and consistency compared with representative baseline approaches. In the deep urban canyon test, the outlier mitigation method has limits the RMSE of GNSS-RTK/INS/odometer fusion from 0.52 m to 0.30 m, with 42.3% improvement.
From Human Hands to Robot Arms: Manipulation Skills Transfer via Trajectory Alignment
Learning diverse manipulation skills for real-world robots is severely bottlenecked by the reliance on costly and hard-to-scale teleoperated demonstrations. While human videos offer a scalable alternative, effectively transferring manipulation knowledge is fundamentally hindered by the significant morphological gap between human and robotic embodiments. To address this challenge and facilitate skill transfer from human to robot, we introduce Traj2Action,a novel framework that bridges this embodiment gap by using the 3D trajectory of the operational endpoint as a unified intermediate representation, and then transfers the manipulation knowledge embedded in this trajectory to the robot's actions. Our policy first learns to generate a coarse trajectory, which forms an high-level motion plan by leveraging both human and robot data. This plan then conditions the synthesis of precise, robot-specific actions (e.g., orientation and gripper state) within a co-denoising framework. Extensive real-world experiments on a Franka robot demonstrate that Traj2Action boosts the performance by up to 27% and 22.25% over $\pi_0$ baseline on short- and long-horizon real-world tasks, and achieves significant gains as human data scales in robot policy learning. Our project website, featuring code and video demonstrations, is available at https://anonymous.4open.science/w/Traj2Action-4A45/.
Integrating Offline Pre-Training with Online Fine-Tuning: A Reinforcement Learning Approach for Robot Social Navigation
Offline reinforcement learning (RL) has emerged as a promising framework for addressing robot social navigation challenges. However, inherent uncertainties in pedestrian behavior and limited environmental interaction during training often lead to suboptimal exploration and distributional shifts between offline training and online deployment. To overcome these limitations, this paper proposes a novel offline-to-online fine-tuning RL algorithm for robot social navigation by integrating Return-to-Go (RTG) prediction into a causal Transformer architecture. Our algorithm features a spatiotem-poral fusion model designed to precisely estimate RTG values in real-time by jointly encoding temporal pedestrian motion patterns and spatial crowd dynamics. This RTG prediction framework mitigates distribution shift by aligning offline policy training with online environmental interactions. Furthermore, a hybrid offline-online experience sampling mechanism is built to stabilize policy updates during fine-tuning, ensuring balanced integration of pre-trained knowledge and real-time adaptation. Extensive experiments in simulated social navigation environments demonstrate that our method achieves a higher success rate and lower collision rate compared to state-of-the-art baselines. These results underscore the efficacy of our algorithm in enhancing navigation policy robustness and adaptability. This work paves the way for more reliable and adaptive robotic navigation systems in real-world applications.
Seeing through Uncertainty: Robust Task-Oriented Optimization in Visual Navigation
Visual navigation is a fundamental problem in embodied AI, yet practical deployments demand long-horizon planning capabilities to address multi-objective tasks. A major bottleneck is data scarcity: policies learned from limited data often overfit and fail to generalize OOD. Existing neural network-based agents typically increase architectural complexity that paradoxically become counterproductive in the small-sample regime. This paper introduce NeuRO, a integrated learning-to-optimize framework that tightly couples perception networks with downstream task-level robust optimization. Specifically, NeuRO addresses core difficulties in this integration: (i) it transforms noisy visual predictions under data scarcity into convex uncertainty sets using Partially Input Convex Neural Networks (PICNNs) with conformal calibration, which directly parameterize the optimization constraints; and (ii) it reformulates planning under partial observability as a robust optimization problem, enabling uncertainty-aware policies that transfer across environments. Extensive experiments on both unordered and sequential multi-object navigation tasks demonstrate that NeuRO establishes SoTA performance, particularly in generalization to unseen environments. Our work thus presents a significant advancement for developing robust, generalizable autonomous agents.
Conflict-Based Search as a Protocol: A Multi-Agent Motion Planning Protocol for Heterogeneous Agents, Solvers, and Independent Tasks
Imagine the future construction site, hospital, office, or even sophisticated household with dozens of robots bought from different manufacturers. How can we enable these different systems to effectively move in a shared environment, given that each robot may have its own independent motion planning system? This work shows how we can get efficient collision-free movements between algorithmically heterogeneous agents by using Conflict-Based Search (Sharon et al. 2015) as a protocol. At its core, the CBS Protocol requires one specific single-agent motion planning API; finding a collision-free path that satisfies certain space-time constraints. Given such an API, CBS uses a central planner to find collision-free paths - independent of how the API is implemented. We show how this protocol enables multi-agent motion planning for a heterogeneous team of agents completing independent tasks with a variety of single-agent planners including: Heuristic Search (e.g., A*), Sampling Based Search (e.g., RRT), Optimization (e.g., Direct Collocation), Diffusion, and Reinforcement Learning.
comment: Project webpage: https://rishi-v.github.io/CBS-Protocol/
VLA-RFT: Vision-Language-Action Reinforcement Fine-tuning with Verified Rewards in World Simulators
Vision-Language-Action (VLA) models enable embodied decision-making but rely heavily on imitation learning, leading to compounding errors and poor robustness under distribution shift. Reinforcement learning (RL) can mitigate these issues yet typically demands costly real-world interactions or suffers from sim-to-real gaps. We introduce VLA-RFT, a reinforcement fine-tuning framework that leverages a data-driven world model as a controllable simulator. Trained from real interaction data, the simulator predicts future visual observations conditioned on actions, allowing policy rollouts with dense, trajectory-level rewards derived from goal-achieving references. This design delivers an efficient and action-aligned learning signal, drastically lowering sample requirements. With fewer than 400 fine-tuning steps, VLA-RFT surpasses strong supervised baselines and achieves greater efficiency than simulator-based RL. Moreover, it exhibits strong robustness under perturbed conditions, sustaining stable task execution. Our results establish world-model-based RFT as a practical post-training paradigm to enhance the generalization and robustness of VLA models. For more details, please refer to https://vla-rft.github.io/.
EgoTraj-Bench: Towards Robust Trajectory Prediction Under Ego-view Noisy Observations
Reliable trajectory prediction from an ego-centric perspective is crucial for robotic navigation in human-centric environments. However, existing methods typically assume idealized observation histories, failing to account for the perceptual artifacts inherent in first-person vision, such as occlusions, ID switches, and tracking drift. This discrepancy between training assumptions and deployment reality severely limits model robustness. To bridge this gap, we introduce EgoTraj-Bench, the first real-world benchmark that grounds noisy, first-person visual histories in clean, bird's-eye-view future trajectories, enabling robust learning under realistic perceptual constraints. Building on this benchmark, we propose BiFlow, a dual-stream flow matching model that concurrently denoises historical observations and forecasts future motion by leveraging a shared latent representation. To better model agent intent, BiFlow incorporates our EgoAnchor mechanism, which conditions the prediction decoder on distilled historical features via feature modulation. Extensive experiments show that BiFlow achieves state-of-the-art performance, reducing minADE and minFDE by 10-15% on average and demonstrating superior robustness. We anticipate that our benchmark and model will provide a critical foundation for developing trajectory forecasting systems truly resilient to the challenges of real-world, ego-centric perception.
Physics-Informed Neural Controlled Differential Equations for Scalable Long Horizon Multi-Agent Motion Forecasting
Long-horizon motion forecasting for multiple autonomous robots is challenging due to non-linear agent interactions, compounding prediction errors, and continuous-time evolution of dynamics. Learned dynamics of such a system can be useful in various applications such as travel time prediction, prediction-guided planning and generative simulation. In this work, we aim to develop an efficient trajectory forecasting model conditioned on multi-agent goals. Motivated by the recent success of physics-guided deep learning for partially known dynamical systems, we develop a model based on neural Controlled Differential Equations (CDEs) for long-horizon motion forecasting. Unlike discrete-time methods such as RNNs and transformers, neural CDEs operate in continuous time, allowing us to combine physics-informed constraints and biases to jointly model multi-robot dynamics. Our approach, named PINCoDE (Physics-Informed Neural Controlled Differential Equations), learns differential equation parameters that can be used to predict the trajectories of a multi-agent system starting from an initial condition. PINCoDE is conditioned on future goals and enforces physics constraints for robot motion over extended periods of time. We adopt a strategy that scales our model from 10 robots to 100 robots without the need for additional model parameters, while producing predictions with an average ADE below 0.5 m for a 1-minute horizon. Furthermore, progressive training with curriculum learning for our PINCoDE model results in a 2.7X reduction of forecasted pose error over 4 minute horizons compared to analytical models.
State Estimation for Compliant and Morphologically Adaptive Robots
Locomotion robots with active or passive compliance can show robustness to uncertain scenarios, which can be promising for agricultural, research and environmental industries. However, state estimation for these robots is challenging due to the lack of rigid-body assumptions and kinematic changes from morphing. We propose a method to estimate typical rigid-body states alongside compliance-related states, such as soft robot shape in different morphologies and locomotion modes. Our neural network-based state estimator uses a history of states and a mechanism to directly influence unreliable sensors. We test our framework on the GOAT platform, a robot capable of passive compliance and active morphing for extreme outdoor terrain. The network is trained on motion capture data in a novel compliance-centric frame that accounts for morphing-related states. Our method predicts shape-related measurements within 4.2% of the robot's size, velocities within 6.3% and 2.4% of the top linear and angular speeds, respectively, and orientation within 1.5 degrees. We also demonstrate a 300% increase in travel range during a motor malfunction when using our estimator for closed-loop autonomous outdoor operation.
comment: 8 pages, 10 figures, 1 table, preprint
Beyond Needle(s) in the Embodied Haystack: Environment, Architecture, and Training Considerations for Long Context Reasoning
We introduce $\infty$-THOR, a new framework for long-horizon embodied tasks that advances long-context understanding in embodied AI. $\infty$-THOR provides: (1) a generation framework for synthesizing scalable, reproducible, and unlimited long-horizon trajectories; (2) a novel embodied QA task, Needle(s) in the Embodied Haystack, where multiple scattered clues across extended trajectories test agents' long-context reasoning ability; and (3) a long-horizon dataset and benchmark suite featuring complex tasks that span hundreds of environment steps, each paired with ground-truth action sequences. To enable this capability, we explore architectural adaptations, including interleaved Goal-State-Action modeling, context extension techniques, and Context Parallelism, to equip LLM-based agents for extreme long-context reasoning and interaction. Experimental results and analyses highlight the challenges posed by our benchmark and provide insights into training strategies and model behaviors under long-horizon conditions. Our work provides a foundation for the next generation of embodied AI systems capable of robust, long-term reasoning and planning.
CogVLA: Cognition-Aligned Vision-Language-Action Model via Instruction-Driven Routing & Sparsification NeurIPS 2025
Recent Vision-Language-Action (VLA) models built on pre-trained Vision-Language Models (VLMs) require extensive post-training, resulting in high computational overhead that limits scalability and deployment.We propose CogVLA, a Cognition-Aligned Vision-Language-Action framework that leverages instruction-driven routing and sparsification to improve both efficiency and performance. CogVLA draws inspiration from human multimodal coordination and introduces a 3-stage progressive architecture. 1) Encoder-FiLM based Aggregation Routing (EFA-Routing) injects instruction information into the vision encoder to selectively aggregate and compress dual-stream visual tokens, forming a instruction-aware latent representation. 2) Building upon this compact visual encoding, LLM-FiLM based Pruning Routing (LFP-Routing) introduces action intent into the language model by pruning instruction-irrelevant visually grounded tokens, thereby achieving token-level sparsity. 3) To ensure that compressed perception inputs can still support accurate and coherent action generation, we introduce V-L-A Coupled Attention (CAtten), which combines causal vision-language attention with bidirectional action parallel decoding. Extensive experiments on the LIBERO benchmark and real-world robotic tasks demonstrate that CogVLA achieves state-of-the-art performance with success rates of 97.4% and 70.0%, respectively, while reducing training costs by 2.5-fold and decreasing inference latency by 2.8-fold compared to OpenVLA. CogVLA is open-sourced and publicly available at https://github.com/JiuTian-VL/CogVLA.
comment: Accepted to NeurIPS 2025, Project Page: https://jiutian-vl.github.io/CogVLA-page
Act to See, See to Act: Diffusion-Driven Perception-Action Interplay for Adaptive Policies NeurIPS 2025
Existing imitation learning methods decouple perception and action, which overlooks the causal reciprocity between sensory representations and action execution that humans naturally leverage for adaptive behaviors. To bridge this gap, we introduce Action-Guided Diffusion Policy (DP-AG), a unified representation learning that explicitly models a dynamic interplay between perception and action through probabilistic latent dynamics. DP-AG encodes latent observations into a Gaussian posterior via variational inference and evolves them using an action-guided SDE, where the Vector-Jacobian Product (VJP) of the diffusion policy's noise predictions serves as a structured stochastic force driving latent updates. To promote bidirectional learning between perception and action, we introduce a cycle-consistent contrastive loss that organizes the gradient flow of the noise predictor into a coherent perception-action loop, enforcing mutually consistent transitions in both latent updates and action refinements. Theoretically, we derive a variational lower bound for the action-guided SDE, and prove that the contrastive objective enhances continuity in both latent and action trajectories. Empirically, DP-AG significantly outperforms state-of-the-art methods across simulation benchmarks and real-world UR5 manipulation tasks. As a result, our DP-AG offers a promising step toward bridging biological adaptability and artificial policy learning.
comment: 39th Conference on Neural Information Processing Systems (NeurIPS 2025)
LLM-guided Task and Motion Planning using Knowledge-based Reasoning
Performing complex manipulation tasks in dynamic environments requires efficient Task and Motion Planning (TAMP) approaches that combine high-level symbolic plans with low-level motion control. Advances in Large Language Models (LLMs), such as GPT-4, are transforming task planning by offering natural language as an intuitive and flexible way to describe tasks, generate symbolic plans, and reason. However, the effectiveness of LLM-based TAMP approaches is limited due to static and template-based prompting, which limits adaptability to dynamic environments and complex task contexts. To address these limitations, this work proposes a novel Onto-LLM-TAMP framework that employs knowledge-based reasoning to refine and expand user prompts with task-contextual reasoning and knowledge-based environment state descriptions. Integrating domain-specific knowledge into the prompt ensures semantically accurate and context-aware task plans. The proposed framework demonstrates its effectiveness by resolving semantic errors in symbolic plan generation, such as maintaining logical temporal goal ordering in scenarios involving hierarchical object placement. The proposed framework is validated through both simulation and real-world scenarios, demonstrating significant improvements over the baseline approach in terms of adaptability to dynamic environments and the generation of semantically correct task plans.
comment: Submitted to knowledge based systems
Opt2Skill: Imitating Dynamically-feasible Whole-Body Trajectories for Versatile Humanoid Loco-Manipulation
Humanoid robots are designed to perform diverse loco-manipulation tasks. However, they face challenges due to their high-dimensional and unstable dynamics, as well as the complex contact-rich nature of the tasks. Model-based optimal control methods offer flexibility to define precise motion but are limited by high computational complexity and accurate contact sensing. On the other hand, reinforcement learning (RL) handles high-dimensional spaces with strong robustness but suffers from inefficient learning, unnatural motion, and sim-to-real gaps. To address these challenges, we introduce Opt2Skill, an end-to-end pipeline that combines model-based trajectory optimization with RL to achieve robust whole-body loco-manipulation. Opt2Skill generates dynamic feasible and contact-consistent reference motions for the Digit humanoid robot using differential dynamic programming (DDP) and trains RL policies to track these optimal trajectories. Our results demonstrate that Opt2Skill outperforms baselines that rely on human demonstrations and inverse kinematics-based references, both in motion tracking and task success rates. Furthermore, we show that incorporating trajectories with torque information improves contact force tracking in contact-involved tasks, such as wiping a table. We have successfully transferred our approach to real-world applications.
DBF-MA: A Differential Bayesian Filtering Planner for Multi-Agent Autonomous Racing Overtakes
A significant challenge in autonomous racing is to generate overtaking maneuvers. Racing agents must execute these maneuvers on complex racetracks with little room for error. Optimization techniques and graph-based methods have been proposed, but these methods often rely on oversimplified assumptions for collision-avoidance and dynamic constraints. In this work, we present an approach to trajectory synthesis based on an extension of the Differential Bayesian Filtering framework. Our approach for collision-free trajectory synthesis frames the problem as one of Bayesian Inference over the space of Composite Bezier Curves. Our method is derivative-free, does not require a spherical approximation of the vehicle footprint, linearization of constraints, or simplifying upper bounds on collision avoidance. We conduct a closed-loop analysis of DBF-MA and find it successfully overtakes an opponent in 87% of tested scenarios, outperforming existing methods in autonomous overtaking.
comment: This work has been submitted to the IEEE for possible publication
Probabilistic Collision Risk Estimation through Gauss-Legendre Cubature and Non-Homogeneous Poisson Processes
Overtaking in high-speed autonomous racing demands precise, real-time estimation of collision risk; particularly in wheel-to-wheel scenarios where safety margins are minimal. Existing methods for collision risk estimation either rely on simplified geometric approximations, like bounding circles, or perform Monte Carlo sampling which leads to overly conservative motion planning behavior at racing speeds. We introduce the Gauss-Legendre Rectangle (GLR) algorithm, a principled two-stage integration method that estimates collision risk by combining Gauss-Legendre with a non-homogeneous Poisson process over time. GLR produces accurate risk estimates that account for vehicle geometry and trajectory uncertainty. In experiments across 446 overtaking scenarios in a high-fidelity Formula One racing simulation, GLR outperforms five state-of-the-art baselines achieving an average error reduction of 77% and surpassing the next-best method by 52%, all while running at 1000 Hz. The framework is general and applicable to broader motion planning contexts beyond autonomous racing.
comment: This work has been submitted to the IEEE for possible publication
HetSwarm: Cooperative Navigation of Heterogeneous Swarm in Dynamic and Dense Environments through Impedance-based Guidance
With the growing demand for efficient logistics and warehouse management, unmanned aerial vehicles (UAVs) are emerging as a valuable complement to automated guided vehicles (AGVs). UAVs enhance efficiency by navigating dense environments and operating at varying altitudes. However, their limited flight time, battery life, and payload capacity necessitate a supporting ground station. To address these challenges, we propose HetSwarm, a heterogeneous multi-robot system that combines a UAV and a mobile ground robot for collaborative navigation in cluttered and dynamic conditions. Our approach employs an artificial potential field (APF)-based path planner for the UAV, allowing it to dynamically adjust its trajectory in real time. The ground robot follows this path while maintaining connectivity through impedance links, ensuring stable coordination. Additionally, the ground robot establishes temporal impedance links with low-height ground obstacles to avoid local collisions, as these obstacles do not interfere with the UAV's flight. Experimental validation of HetSwarm in diverse environmental conditions demonstrated a 90% success rate across 30 test cases. The ground robot exhibited an average deviation of 45 cm near obstacles, confirming effective collision avoidance. Extensive simulations in the Gym PyBullet environment further validated the robustness of our system for real-world applications, demonstrating its potential for dynamic, real-time task execution in cluttered environments.
comment: Manuscript has been accepted by ICUAS-2025
Adaptive Diffusion Constrained Sampling for Bimanual Robot Manipulation
Coordinated multi-arm manipulation requires satisfying multiple simultaneous geometric constraints across high-dimensional configuration spaces, which poses a significant challenge for traditional planning and control methods. In this work, we propose Adaptive Diffusion Constrained Sampling (ADCS), a generative framework that flexibly integrates both equality (e.g., relative and absolute pose constraints) and structured inequality constraints (e.g., proximity to object surfaces) into an energy-based diffusion model. Equality constraints are modeled using dedicated energy networks trained on pose differences in Lie algebra space, while inequality constraints are represented via Signed Distance Functions (SDFs) and encoded into learned constraint embeddings, allowing the model to reason about complex spatial regions. A key innovation of our method is a Transformer-based architecture that learns to weight constraint-specific energy functions at inference time, enabling flexible and context-aware constraint integration. Moreover, we adopt a two-phase sampling strategy that improves precision and sample diversity by combining Langevin dynamics with resampling and density-aware re-weighting. Experimental results on dual-arm manipulation tasks show that ADCS significantly improves sample diversity and generalization across settings demanding precise coordination and adaptive constraint handling.
ImpedanceGPT: VLM-driven Impedance Control of Swarm of Mini-drones for Intelligent Navigation in Dynamic Environment IROS 2025
Swarm robotics plays a crucial role in enabling autonomous operations in dynamic and unpredictable environments. However, a major challenge remains ensuring safe and efficient navigation in environments filled with both dynamic alive (e.g., humans) and dynamic inanimate (e.g., non-living objects) obstacles. In this paper, we propose ImpedanceGPT, a novel system that combines a Vision-Language Model (VLM) with retrieval-augmented generation (RAG) to enable real-time reasoning for adaptive navigation of mini-drone swarms in complex environments. The key innovation of ImpedanceGPT lies in the integration of VLM and RAG, which provides the drones with enhanced semantic understanding of their surroundings. This enables the system to dynamically adjust impedance control parameters in response to obstacle types and environmental conditions. Our approach not only ensures safe and precise navigation but also improves coordination between drones in the swarm. Experimental evaluations demonstrate the effectiveness of the system. The VLM-RAG framework achieved an obstacle detection and retrieval accuracy of 80 % under optimal lighting. In static environments, drones navigated dynamic inanimate obstacles at 1.4 m/s but slowed to 0.7 m/s with increased separation around humans. In dynamic environments, speed adjusted to 1.0 m/s near hard obstacles, while reducing to 0.6 m/s with higher deflection to safely avoid moving humans.
comment: Accepted in IROS 2025
On the Application of Model Predictive Control to a Weighted Coverage Path Planning Problem
This paper considers the application of Model Predictive Control (MPC) to a weighted coverage path planning (WCPP) problem. The problem appears in a wide range of practical applications, including search and rescue (SAR) missions. The basic setup is that one (or multiple) agents can move around a given search space and collect rewards from a given spatial distribution. Unlike an artificial potential field, each reward can only be collected once. In contrast to a Traveling Salesman Problem (TSP), the agent moves in a continuous space. Moreover, he is not obliged to cover all locations and/or may return to previously visited locations. The WCPP problem is tackled by a new Model Predictive Control (MPC) formulation with so-called Coverage Constraints (CCs). It is shown that the solution becomes more effective if the solver is initialized with a TSP-based heuristic. With and without this initialization, the proposed MPC approach clearly outperforms a naive MPC formulation, as demonstrated in a small simulation study.
Learning Hierarchical Domain Models Through Environment-Grounded Interaction
Domain models enable autonomous agents to solve long-horizon tasks by producing interpretable plans. However, in open-world environments, a single general domain model cannot capture the variety of tasks, so agents must generate suitable task-specific models on the fly. Large Language Models (LLMs), with their implicit common knowledge, can generate such domains, but suffer from high error rates that limit their applicability. Hence, related work relies on extensive human feed-back or prior knowledge, which undermines autonomous, open-world deployment. In this work, we propose LODGE, a framework for autonomous domain learning from LLMs and environment grounding. LODGE builds on hierarchical abstractions and automated simulations to identify and correct inconsistencies between abstraction layers and between the model and environment. Our framework is task-agnostic, as it generates predicates, operators, and their preconditions and effects, while only assuming access to a simulator and a set of generic, executable low-level skills. Experiments on two International Planning Competition ( IPC) domains and a robotic assembly domain show that LODGE yields more accurate domain models and higher task success than existing methods, requiring remarkably few environment interactions and no human feedback or demonstrations.
HR-INR: Continuous Space-Time Video Super-Resolution via Event Camera
Continuous space-time video super-resolution (C-STVSR) aims to simultaneously enhance video resolution and frame rate at an arbitrary scale. Recently, implicit neural representation (INR) has been applied to video restoration, representing videos as implicit fields that can be decoded at an arbitrary scale. However, existing INR-based C-STVSR methods typically rely on only two frames as input, leading to insufficient inter-frame motion information. Consequently, they struggle to capture fast, complex motion and long-term dependencies (spanning more than three frames), hindering their performance in dynamic scenes. In this paper, we propose a novel C-STVSR framework, named HR-INR, which captures both holistic dependencies and regional motions based on INR. It is assisted by an event camera -- a novel sensor renowned for its high temporal resolution and low latency. To fully utilize the rich temporal information from events, we design a feature extraction consisting of (1) a regional event feature extractor -- taking events as inputs via the proposed event temporal pyramid representation to capture the regional nonlinear motion and (2) a holistic event-frame feature extractor for long-term dependence and continuity motion. We then propose a novel INR-based decoder with spatiotemporal embeddings to capture long-term dependencies with a larger temporal perception field. We validate the effectiveness and generalization of our method on four datasets (both simulated and real data), showing the superiority of our method. The project page is available at https://github.com/yunfanLu/HR-INR
comment: Project page: https://github.com/yunfanLu/HR-INR
RoVerFly: Robust and Versatile Implicit Hybrid Control of Quadrotor-Payload Systems
Designing robust controllers for precise trajectory tracking with quadrotors is challenging due to nonlinear dynamics and underactuation, and becomes harder with flexible cable-suspended payloads that add degrees of freedom and hybrid dynamics. Classical model-based methods offer stability guarantees but require extensive tuning and often fail to adapt when the configuration changes-when a payload is added or removed, or when its mass or cable length varies. We present RoVerFly, a unified learning-based control framework where a single reinforcement learning (RL) policy functions as an implicit hybrid controller, managing complex dynamics without explicit mode detection or controller switching. Trained with task and domain randomization, the controller is resilient to disturbances and varying dynamics. It achieves strong zero-shot generalization across payload settings-including no payload as well as varying mass and cable length-without re-tuning, while retaining the interpretability and structure of a feedback tracking controller. Code and supplementary materials are available at https://github.com/mintaeshkim/roverfly.
comment: 8 pages, 5 figures
Sampling-Based Global Optimal Control and Estimation via Semidefinite Programming
Global optimization has gained attraction over the past decades, thanks to the development of both theoretical foundations and efficient numerical routines. Among recent advances, Kernel Sum of Squares (KernelSOS) provides a powerful theoretical framework, combining the expressivity of kernel methods with the guarantees of SOS optimization. In this paper, we take KernelSOS from theory to practice and demonstrate its use on challenging control and robotics problems. We identify and address the practical considerations required to make the method work in applied settings: restarting strategies, systematic calibration of hyperparameters, methods for recovering minimizers, and the combination with fast local solvers. As a proof of concept, the application of KernelSOS to robot localization highlights its competitiveness with existing SOS approaches that rely on heuristics and handcrafted reformulations to render the problem polynomial. Even in the high-dimensional, non-parametric setting of trajectory optimization with simulators treated as black boxes, we demonstrate how KernelSOS can be combined with fast local solvers to uncover higher-quality solutions without compromising overall runtimes.
Heterogeneous Predictor-based Risk-Aware Planning with Conformal Prediction in Dense, Uncertain Environments
Real-time planning among many uncertain, dynamic obstacles is challenging because predicting every agent with high fidelity is both unnecessary and computationally expensive. We present Heterogeneous Predictor-based Risk-Aware Planning (H-PRAP), a framework that allocates prediction effort to where it matters. H-PRAP introduces the Probability-based Collision Risk Index (P-CRI), a closed-form, horizon-level collision index obtained by calibrating a Gaussian surrogate with conformal prediction. P-CRI drives a router that assigns high-risk obstacles to accurate but expensive predictors and low-risk obstacles to lightweight predictors, while preserving distribution-free coverage across heterogeneous predictors through conformal prediction. The selected predictions and their conformal radii are embedded in a chance-constrained model predictive control (MPC) problem, yielding receding-horizon policies with explicit safety margins. We analyze the safety-efficiency trade-off under prediction compute budget: more portion of low-fidelity predictions reduce residual risk from dropped obstacles, but in the same time induces larger conformal radii and degrades trajectory efficiency and shrinks MPC feasibility. Extensive numerical simulations in dense, uncertain environments validate that H-PRAP attains best balance between trajectory success rate (i.e., no collisions) and the time to reach the goal (i.e., trajectory efficiency) compared to single prediction architectures.
Poutine: Vision-Language-Trajectory Pre-Training and Reinforcement Learning Post-Training Enable Robust End-to-End Autonomous Driving
We present Poutine, a 3B-parameter vision-language model (VLM) tailored for end-to-end autonomous driving in long-tail driving scenarios. Poutine is trained in two stages. To obtain strong base driving capabilities, we train Poutine-Base in a self-supervised vision-language-trajectory (VLT) next-token prediction fashion on 83 hours of CoVLA nominal driving and 11 hours of Waymo long-tail driving. Accompanying language annotations are auto-generated with a 72B-parameter VLM. Poutine is obtained by fine-tuning Poutine-Base with Group Relative Policy Optimization (GRPO) using less than 500 preference-labeled frames from the Waymo validation set. We show that both VLT pretraining and RL fine-tuning are critical to attain strong driving performance in the long-tail. Poutine-Base achieves a rater-feedback score (RFS) of 8.12 on the validation set, nearly matching Waymo's expert ground-truth RFS. The final Poutine model achieves an RFS of 7.99 on the official Waymo test set, placing 1st in the 2025 Waymo Vision-Based End-to-End Driving Challenge by a significant margin. These results highlight the promise of scalable VLT pre-training and lightweight RL fine-tuning to enable robust and generalizable autonomy.
Are All Marine Species Created Equal? Performance Disparities in Underwater Object Detection
Underwater object detection is critical for monitoring marine ecosystems but poses unique challenges, including degraded image quality, imbalanced class distribution, and distinct visual characteristics. Not every species is detected equally well, yet underlying causes remain unclear. We address two key research questions: 1) What factors beyond data quantity drive class-specific performance disparities? 2) How can we systematically improve detection of under-performing marine species? We manipulate the DUO and RUOD datasets to separate the object detection task into localization and classification and investigate the under-performance of the scallop class. Localization analysis using YOLO11 and TIDE finds that foreground-background discrimination is the most problematic stage regardless of data quantity. Classification experiments reveal persistent precision gaps even with balanced data, indicating intrinsic feature-based challenges beyond data scarcity and inter-class dependencies. We recommend imbalanced distributions when prioritizing precision, and balanced distributions when prioritizing recall. Improving under-performing classes should focus on algorithmic advances, especially within localization modules. We publicly release our code and datasets.
comment: 10 pages for main paper, 4 pages for supplementary material
Certifiably Optimal Estimation and Calibration in Robotics via Trace-Constrained Semi-Definite Programming
Many nonconvex problems in robotics can be relaxed into convex formulations via Semi-Definite Programming (SDP) that can be solved to global optimality. The practical quality of these solutions, however, critically depends on rounding them to rank-1 matrices, a condition that can be challenging to achieve. In this work, we focus on trace-constrained SDPs (TCSDPs), where the decision variables are Positive Semi-Definite (PSD) matrices with fixed trace values. We show that the latter can be used to design a gradient-based refinement procedure that projects relaxed SDP solutions toward rank-1, low-cost candidates. We also provide fixed-trace SDP relaxations for common robotic quantities, such as rotations and translations, and a modular virtual robot abstraction that simplifies modeling across different problem settings. We demonstrate that our trace-constrained SDP framework can be applied to many robotics tasks, and we showcase its effectiveness through simulations in Perspective-n-Point (PnP) estimation, hand-eye calibration, and dual-robot system calibration.
comment: Manuscript submitted to American Control Conference (ACC) 2026
Curriculum Imitation Learning of Distributed Multi-Robot Policies
Learning control policies for multi-robot systems (MRS) remains a major challenge due to long-term coordination and the difficulty of obtaining realistic training data. In this work, we address both limitations within an imitation learning framework. First, we shift the typical role of Curriculum Learning in MRS, from scalability with the number of robots, to focus on improving long-term coordination. We propose a curriculum strategy that gradually increases the length of expert trajectories during training, stabilizing learning and enhancing the accuracy of long-term behaviors. Second, we introduce a method to approximate the egocentric perception of each robot using only third-person global state demonstrations. Our approach transforms idealized trajectories into locally available observations by filtering neighbors, converting reference frames, and simulating onboard sensor variability. Both contributions are integrated into a physics-informed technique to produce scalable, distributed policies from observations. We conduct experiments across two tasks with varying team sizes and noise levels. Results show that our curriculum improves long-term accuracy, while our perceptual estimation method yields policies that are robust to realistic uncertainty. Together, these strategies enable the learning of robust, distributed controllers from global demonstrations, even in the absence of expert actions or onboard measurements.
comment: Accepted and presented at the Eight Iberian Robotics Conference, 2025
Data-Driven Distributionally Robust Optimal Control with State-Dependent Noise
Distributionally Robust Optimal Control (DROC) is a framework that enables robust control in a stochastic setting where the true disturbance distribution is unknown. Traditional DROC approaches require given ambiguity sets and KL divergence bounds to represent the distributional uncertainty; however, these quantities are often unavailable a priori or require manual specification. To overcome this limitation, we propose a data-driven approach that jointly estimates the uncertainty distribution and the corresponding KL divergence bound, which we refer to as $\mathrm{D}^3\mathrm{ROC}$. To evaluate the effectiveness of our approach, we consider a car-like robot navigation task with unknown noise distributions. The experimental results show that $\mathrm{D}^3\mathrm{ROC}$ yields robust and effective control policies, outperforming iterative Linear Quadratic Gaussian (iLQG) control and demonstrating strong adaptability to varying noise distributions.
Hierarchical place recognition with omnidirectional images and curriculum learning-based loss functions
This paper addresses Visual Place Recognition (VPR), which is essential for the safe navigation of mobile robots. The solution we propose employs panoramic images and deep learning models, which are fine-tuned with triplet loss functions that integrate curriculum learning strategies. By progressively presenting more challenging examples during training, these loss functions enable the model to learn more discriminative and robust feature representations, overcoming the limitations of conventional contrastive loss functions. After training, VPR is tackled in two steps: coarse (room retrieval) and fine (position estimation). The results demonstrate that the curriculum-based triplet losses consistently outperform standard contrastive loss functions, particularly under challenging perceptual conditions. To thoroughly assess the robustness and generalization capabilities of the proposed method, it is evaluated in a variety of indoor and outdoor environments. The approach is tested against common challenges in real operation conditions, including severe illumination changes, the presence of dynamic visual effects such as noise and occlusions, and scenarios with limited training data. The results show that the proposed framework performs competitively in all these situations, achieving high recognition accuracy and demonstrating its potential as a reliable solution for real-world robotic applications. The code used in the experiments is available at https://github.com/MarcosAlfaro/TripletNetworksIndoorLocalization.git.
CRAFT: Coaching Reinforcement Learning Autonomously using Foundation Models for Multi-Robot Coordination Tasks
Multi-Agent Reinforcement Learning (MARL) provides a powerful framework for learning coordination in multi-agent systems. However, applying MARL to robotics still remains challenging due to high-dimensional continuous joint action spaces, complex reward design, and non-stationary transitions inherent to decentralized settings. On the other hand, humans learn complex coordination through staged curricula, where long-horizon behaviors are progressively built upon simpler skills. Motivated by this, we propose CRAFT: Coaching Reinforcement learning Autonomously using Foundation models for multi-robot coordination Tasks, a framework that leverages the reasoning capabilities of foundation models to act as a "coach" for multi-robot coordination. CRAFT automatically decomposes long-horizon coordination tasks into sequences of subtasks using the planning capability of Large Language Models (LLMs). In what follows, CRAFT trains each subtask using reward functions generated by LLM, and refines them through a Vision Language Model (VLM)-guided reward-refinement loop. We evaluate CRAFT on multi-quadruped navigation and bimanual manipulation tasks, demonstrating its capability to learn complex coordination behaviors. In addition, we validate the multi-quadruped navigation policy in real hardware experiments.
ReactEMG: Zero-Shot, Low-Latency Intent Detection via sEMG
Surface electromyography (sEMG) signals show promise for effective human-computer interfaces, particularly in rehabilitation and prosthetics. However, challenges remain in developing systems that respond quickly and reliably to user intent, across different subjects and without requiring time-consuming calibration. In this work, we propose a framework for EMG-based intent detection that addresses these challenges. Unlike traditional gesture recognition models that wait until a gesture is completed before classifying it, our approach uses a segmentation strategy to assign intent labels at every timestep as the gesture unfolds. We introduce a novel masked modeling strategy that aligns muscle activations with their corresponding user intents, enabling rapid onset detection and stable tracking of ongoing gestures. In evaluations against baseline methods, considering both accuracy and stability for device control, our approach surpasses state-of-the-art performance in zero-shot transfer conditions, demonstrating its potential for wearable robotics and next-generation prosthetic systems. Our project page is available at: https://reactemg.github.io
Grasp Pre-shape Selection by Synthetic Training: Eye-in-hand Shared Control on the Hannes Prosthesis IROS 2022
We consider the task of object grasping with a prosthetic hand capable of multiple grasp types. In this setting, communicating the intended grasp type often requires a high user cognitive load which can be reduced adopting shared autonomy frameworks. Among these, so-called eye-in-hand systems automatically control the hand pre-shaping before the grasp, based on visual input coming from a camera on the wrist. In this paper, we present an eye-in-hand learning-based approach for hand pre-shape classification from RGB sequences. Differently from previous work, we design the system to support the possibility to grasp each considered object part with a different grasp type. In order to overcome the lack of data of this kind and reduce the need for tedious data collection sessions for training the system, we devise a pipeline for rendering synthetic visual sequences of hand trajectories. We develop a sensorized setup to acquire real human grasping sequences for benchmarking and show that, compared on practical use cases, models trained with our synthetic dataset achieve better generalization performance than models trained on real data. We finally integrate our model on the Hannes prosthetic hand and show its practical effectiveness. We make publicly available the code and dataset to reproduce the presented results.
comment: Accepted to IROS 2022
Continuous Wrist Control on the Hannes Prosthesis: a Vision-based Shared Autonomy Framework ICRA 2025
Most control techniques for prosthetic grasping focus on dexterous fingers control, but overlook the wrist motion. This forces the user to perform compensatory movements with the elbow, shoulder and hip to adapt the wrist for grasping. We propose a computer vision-based system that leverages the collaboration between the user and an automatic system in a shared autonomy framework, to perform continuous control of the wrist degrees of freedom in a prosthetic arm, promoting a more natural approach-to-grasp motion. Our pipeline allows to seamlessly control the prosthetic wrist to follow the target object and finally orient it for grasping according to the user intent. We assess the effectiveness of each system component through quantitative analysis and finally deploy our method on the Hannes prosthetic arm. Code and videos: https://hsp-iit.github.io/hannes-wrist-control.
comment: Accepted to ICRA 2025. Project website: https://hsp-iit.github.io/hannes-wrist-control
One Policy to Run Them All: an End-to-end Learning Approach to Multi-Embodiment Locomotion
Deep Reinforcement Learning techniques are achieving state-of-the-art results in robust legged locomotion. While there exists a wide variety of legged platforms such as quadruped, humanoids, and hexapods, the field is still missing a single learning framework that can control all these different embodiments easily and effectively and possibly transfer, zero or few-shot, to unseen robot embodiments. We introduce URMA, the Unified Robot Morphology Architecture, to close this gap. Our framework brings the end-to-end Multi-Task Reinforcement Learning approach to the realm of legged robots, enabling the learned policy to control any type of robot morphology. The key idea of our method is to allow the network to learn an abstract locomotion controller that can be seamlessly shared between embodiments thanks to our morphology-agnostic encoders and decoders. This flexible architecture can be seen as a potential first step in building a foundation model for legged robot locomotion. Our experiments show that URMA can learn a locomotion policy on multiple embodiments that can be easily transferred to unseen robot platforms in simulation and the real world.
Systems and Control (CS)
Probabilistic Control Barrier Functions: Safety in Probability for Discrete-Time Stochastic Systems
Control systems operating in the real world face countless sources of unpredictable uncertainties. These random disturbances can render deterministic guarantees inapplicable and cause catastrophic safety failures. To overcome this, this paper proposes a method for designing safe controllers for discrete-time stochastic systems that retain probabilistic guarantees of safety. To do this we modify the traditional notion of a control barrier function (CBF) to explicitly account for these stochastic uncertainties and call these new modified functions probabilistic CBFs. We show that probabilistic CBFs can be used to design controllers that guarantee safety over a finite number of time steps with a prescribed probability. Next, by leveraging various uncertainty quantification methods, such as concentration inequalities, the scenario approach, and conformal prediction, we provide a variety of sufficient conditions that result in computationally tractable controllers with tunable probabilistic guarantees across a plethora of practical scenarios. Finally, we showcase the applicability of our results in simulation and hardware for the control of a quadruped robot.
Off-Policy Reinforcement Learning with Anytime Safety Guarantees via Robust Safe Gradient Flow
This paper considers the problem of solving constrained reinforcement learning (RL) problems with anytime guarantees, meaning that the algorithmic solution must yield a constraint-satisfying policy at every iteration of its evolution. Our design is based on a discretization of the Robust Safe Gradient Flow (RSGF), a continuous-time dynamics for anytime constrained optimization whose forward invariance and stability properties we formally characterize. The proposed strategy, termed RSGF-RL, is an off-policy algorithm which uses episodic data to estimate the value functions and their gradients and updates the policy parameters by solving a convex quadratically constrained quadratic program. Our technical analysis combines statistical analysis, the theory of stochastic approximation, and convex analysis to determine the number of episodes sufficient to ensure that safe policies are updated to safe policies and to recover from an unsafe policy, both with an arbitrary user-specified probability, and to establish the asymptotic convergence to the set of KKT points of the RL problem almost surely. Simulations on a navigation example and the cart-pole system illustrate the superior performance of RSGF-RL with respect to the state of the art.
A Robust Neural Control Design for Multi-drone Slung Payload Manipulation with Control Contraction Metrics
This paper presents a robust neural control design for a three-drone slung payload transportation system to track a reference path under external disturbances. The control contraction metric (CCM) is used to generate a neural exponentially converging baseline controller while complying with control input saturation constraints. We also incorporate the uncertainty and disturbance estimator (UDE) technique to dynamically compensate for persistent disturbances. The proposed framework yields a modularized design, allowing the controller and estimator to perform their individual tasks and achieve a zero trajectory tracking error if the disturbances meet certain assumptions. The stability and robustness of the complete system, incorporating both the CCM controller and the UDE compensator, are presented. Simulations are conducted to demonstrate the capability of the proposed control design to follow complicated trajectories under external disturbances.
comment: Submit to the 2026 American Control Conference (ACC)
Pose Estimation of a Thruster-Driven Bioinspired Multi-Link Robot
This work demonstrates pose (position and shape) estimation for a free-floating, bioinspired multi-link robot with unactuated joints, link-mounted thrusters for control, and a single gyroscope per link, resulting in an underactuated, minimally sensed platform. Through a proof-of-concept hardware experiment and offline Kalman filter analysis, we show that the robot's pose can be reliably estimated. State estimation is performed using an unscented Kalman filter augmented with Gaussian process residual learning to compensate for non-zero-mean, non-Gaussian noise. We further show that a filter trained on a multi-gait dataset (forward, backward, left, right, and turning) performs comparably to one trained on a larger forward-gait-only dataset when both are evaluated on the same forward-gait test trajectory. These results reveal overlap in the gait input space, which can be exploited to reduce training data requirements while enhancing the filter's generalizability across multiple gaits.
comment: 8 pages, 8 figures, 3 tables
Adversarial Social Influence: Modeling Persuasion in Contested Social Networks
We present the Social Influence Game (SIG), a framework for modeling adversarial persuasion in social networks with an arbitrary number of competing players. Our goal is to provide a tractable and interpretable model of contested influence that scales to large systems while capturing the structural leverage points of networks. Each player allocates influence from a fixed budget to steer opinions that evolve under DeGroot dynamics, and we prove that the resulting optimization problem is a difference-of-convex program. To enable scalability, we develop an Iterated Linear (IL) solver that approximates player objectives with linear programs. In experiments on random and archetypical networks, IL achieves solutions within 7% of nonlinear solvers while being over 10x faster, scaling to large social networks. This paper lays a foundation for asymptotic analysis of contested influence in complex networks.
Density-Ratio Weighted Behavioral Cloning: Learning Control Policies from Corrupted Datasets
Offline reinforcement learning (RL) enables policy optimization from fixed datasets, making it suitable for safety-critical applications where online exploration is infeasible. However, these datasets are often contaminated by adversarial poisoning, system errors, or low-quality samples, leading to degraded policy performance in standard behavioral cloning (BC) and offline RL methods. This paper introduces Density-Ratio Weighted Behavioral Cloning (Weighted BC), a robust imitation learning approach that uses a small, verified clean reference set to estimate trajectory-level density ratios via a binary discriminator. These ratios are clipped and used as weights in the BC objective to prioritize clean expert behavior while down-weighting or discarding corrupted data, without requiring knowledge of the contamination mechanism. We establish theoretical guarantees showing convergence to the clean expert policy with finite-sample bounds that are independent of the contamination rate. A comprehensive evaluation framework is established, which incorporates various poisoning protocols (reward, state, transition, and action) on continuous control benchmarks. Experiments demonstrate that Weighted BC maintains near-optimal performance even at high contamination ratios outperforming baselines such as traditional BC, batch-constrained Q-learning (BCQ) and behavior regularized actor-critic (BRAC).
Comparative Field Deployment of Reinforcement Learning and Model Predictive Control for Residential HVAC
Advanced control strategies like Model Predictive Control (MPC) offer significant energy savings for HVAC systems but often require substantial engineering effort, limiting scalability. Reinforcement Learning (RL) promises greater automation and adaptability, yet its practical application in real-world residential settings remains largely undemonstrated, facing challenges related to safety, interpretability, and sample efficiency. To investigate these practical issues, we performed a direct comparison of an MPC and a model-based RL controller, with each controller deployed for a one-month period in an occupied house with a heat pump system in West Lafayette, Indiana. This investigation aimed to explore scalability of the chosen RL and MPC implementations while ensuring safety and comparability. The advanced controllers were evaluated against each other and against the existing controller. RL achieved substantial energy savings (22\% relative to the existing controller), slightly exceeding MPC's savings (20\%), albeit with modestly higher occupant discomfort. However, when energy savings were normalized for the level of comfort provided, MPC demonstrated superior performance. This study's empirical results show that while RL reduces engineering overhead, it introduces practical trade-offs in model accuracy and operational robustness. The key lessons learned concern the difficulties of safe controller initialization, navigating the mismatch between control actions and their practical implementation, and maintaining the integrity of online learning in a live environment. These insights pinpoint the essential research directions needed to advance RL from a promising concept to a truly scalable HVAC control solution.
comment: 27 pages, 11 figures, 4 tables. Under review for Applied Energy
Touching the tumor boundary: A pilot study on ultrasound based virtual fixtures for breast-conserving surgery
Purpose: Delineating tumor boundaries during breast-conserving surgery is challenging as tumors are often highly mobile, non-palpable, and have irregularly shaped borders. To address these challenges, we introduce a cooperative robotic guidance system that applies haptic feedback for tumor localization. In this pilot study, we aim to assess if and how this system can be successfully integrated into breast cancer care. Methods: A small haptic robot is retrofitted with an electrocautery blade to operate as a cooperatively controlled surgical tool. Ultrasound and electromagnetic navigation are used to identify the tumor boundaries and position. A forbidden region virtual fixture is imposed when the surgical tool collides with the tumor boundary. We conducted a study where users were asked to resect tumors from breast simulants both with and without the haptic guidance. We then assess the results of these simulated resections both qualitatively and quantitatively. Results: Virtual fixture guidance is shown to improve resection margins. On average, users find the task to be less mentally demanding, frustrating, and effort intensive when haptic feedback is available. We also discovered some unanticipated impacts on surgical workflow that will guide design adjustments and training protocol moving forward. Conclusion: Our results suggest that virtual fixtures can help localize tumor boundaries in simulated breast-conserving surgery. Future work will include an extensive user study to further validate these results and fine-tune our guidance system.
A New Partial State-Feedback IDA-PBC for Two-Dimensional Nonlinear Systems: Application to Power Converters with Experimental Results
In this paper we propose a variation of the widely popular Interconnection-and-Damping-Assigment Passivity-Based Control (IDA-PBC) based on Poincare's Lemma to design output feedback globally stabilizing controllers for two dimensional systems. The procedure is constructive and, in comparison with the classical IDA-PBC, whose application is often stymied by the need to solve the (infamous) matching partial differential equation (PDE), in this new method the PDE is replaced by an ordinary differential equation, whose solution is far simpler. The procedure is then applied for the design of voltage-feedback controllers for the three most typical DC-to-DC power converter topologies: the Buck, Boost and Buck-Boost. It is assumed that these converters feed an uncertain load, which is characterized by a static relation between its voltage and current. In the case when the load consists of the parallel connection of a resistive term and a constant power load we propose an adaptive version of the design, adding an identification scheme for the load parameters. This allows the controller to regulate the converter output when the load varies-that is a typical scenario in these applications. Extensive numerical simulations and experimental results validate the approach.
Robust Data-Driven Control for Nonlinear Systems Using their Digital Twins and Quadratic Funnels
This paper examines a robust data-driven approach for the safe deployment of systems with nonlinear dynamics using their imperfect digital twins. Our contribution involves proposing a method that fuses the digital twin's nominal trajectory with online, data-driven uncertainty quantification to synthesize robust tracking controllers. Specifically, we derive data-driven bounds to capture the deviations of the actual system from its prescribed nominal trajectory informed via its digital twin. Subsequently, the dataset is used in the synthesis of quadratic funnels -- robust positive invariant tubes around the nominal trajectory -- via linear matrix inequalities built on the time-series data. The resulting controller guarantees constraint satisfaction while adapting to the true system behavior through a segmented learning strategy, where each segment's controller is synthesized using uncertainty information from the previous segment. This work establishes a systematic framework for obtaining safety certificates in learning-based control of nonlinear systems with imperfect models.
comment: 8 pages, 3 figures
Beyond Collision Cones: Dynamic Obstacle Avoidance for Nonholonomic Robots via Dynamic Parabolic Control Barrier Functions
Control Barrier Functions (CBFs) are a powerful tool for ensuring the safety of autonomous systems, yet applying them to nonholonomic robots in cluttered, dynamic environments remains an open challenge. State-of-the-art methods often rely on collision-cone or velocity-obstacle constraints which, by only considering the angle of the relative velocity, are inherently conservative and can render the CBF-based quadratic program infeasible, particularly in dense scenarios. To address this issue, we propose a Dynamic Parabolic Control Barrier Function (DPCBF) that defines the safe set using a parabolic boundary. The parabola's vertex and curvature dynamically adapt based on both the distance to an obstacle and the magnitude of the relative velocity, creating a less restrictive safety constraint. We prove that the proposed DPCBF is valid for a kinematic bicycle model subject to input constraints. Extensive comparative simulations demonstrate that our DPCBF-based controller significantly enhances navigation success rates and QP feasibility compared to baseline methods. Our approach successfully navigates through dense environments with up to 100 dynamic obstacles, scenarios where collision cone-based methods fail due to infeasibility.
comment: The first two authors contributed equally to this work. Project page: https://www.taekyung.me/dpcbf
DeMuon: A Decentralized Muon for Matrix Optimization over Graphs
In this paper, we propose DeMuon, a method for decentralized matrix optimization over a given communication topology. DeMuon incorporates matrix orthogonalization via Newton-Schulz iterations-a technique inherited from its centralized predecessor, Muon-and employs gradient tracking to mitigate heterogeneity among local functions. Under heavy-tailed noise conditions and additional mild assumptions, we establish the iteration complexity of DeMuon for reaching an approximate stochastic stationary point. This complexity result matches the best-known complexity bounds of centralized algorithms in terms of dependence on the target tolerance. To the best of our knowledge, DeMuon is the first direct extension of Muon to decentralized optimization over graphs with provable complexity guarantees. We conduct preliminary numerical experiments on decentralized transformer pretraining over graphs with varying degrees of connectivity. Our numerical results demonstrate a clear margin of improvement of DeMuon over other popular decentralized algorithms across different network topologies.
A Control Theory inspired Exploration Method for a Linear Bandit driven by a Linear Gaussian Dynamical System
The paper introduces a linear bandit environment where the reward is the output of a known Linear Gaussian Dynamical System (LGDS). In this environment, we address the fundamental challenge of balancing exploration -- gathering information about the environment -- and exploitation -- selecting to the action with the highest predicted reward. We propose two algorithms, Kalman filter Upper Confidence Bound (Kalman-UCB) and Information filter Directed Exploration Action-selection (IDEA). Kalman-UCB uses the principle of optimism in the face of uncertainty. IDEA selects actions that maximize the combination of the predicted reward and a term that quantifies how much an action minimizes the error of the Kalman filter state prediction, which depends on the LGDS property called observability. IDEA is motivated by applications such as hyperparameter optimization in machine learning. A major problem encountered in hyperparameter optimization is the large action spaces, which hinder the performance of methods inspired by principle of optimism in the face of uncertainty as they need to explore each action to lower reward prediction uncertainty. To predict if either Kalman-UCB or IDEA will perform better, a metric based on the LGDS properties is provided. This metric is validated with numerical results across a variety of randomly generated environments.
Integrated Security Mechanisms for Weight Protection in Memristive Crossbar Arrays
Memristive crossbar arrays enable in-memory computing by performing parallel analog computations directly within memory, making them well-suited for machine learning, neural networks, and neuromorphic systems. However, despite their advantages, non-volatile memristors are vulnerable to security threats (such as adversarial extraction of stored weights when the hardware is compromised. Protecting these weights is essential since they represent valuable intellectual property resulting from lengthy and costly training processes using large, often proprietary, datasets. As a solution we propose two security mechanisms: Keyed Permutor and Watermark Protection Columns; where both safeguard critical weights and establish verifiable ownership (even in cases of data leakage). Our approach integrates efficiently with existing memristive crossbar architectures without significant design modifications. Simulations across 45nm, 22nm, and 7nm CMOS nodes, using a realistic interconnect model and a large RF dataset, show that both mechanisms offer robust protection with under 10% overhead in area, delay and power. We also present initial experiments employing the widely known MNIST dataset; further highlighting the feasibility of securing memristive in-memory computing systems with minimal performance trade-offs.
comment: 2 pages, 2 figures
Safety-Critical Control via Recurrent Tracking Functions
This paper addresses the challenge of synthesizing safety-critical controllers for high-order nonlinear systems, where constructing valid Control Barrier Functions (CBFs) remains computationally intractable. Leveraging layered control, we design CBFs in reduced-order models (RoMs) while regulating full-order models' (FoMs) dynamics at the same time. Traditional Lyapunov tracking functions are required to decrease monotonically, but systematic synthesis methods for such functions exist only for fully-actuated systems. To overcome this limitation, we introduce Recurrent Tracking Functions (RTFs), which replace the monotonic decay requirement with a weaker finite-time recurrence condition. This relaxation permits transient deviations of tracking errors while ensuring safety. By augmenting CBFs for RoMs with RTFs, we construct recurrent CBFs (RCBFs) whose zero-superlevel set is control $\tau$-recurrent, and guarantee safety for all initial states in such a set when RTFs are satisfied. We establish theoretical safety guarantees and validate the approach through numerical experiments, demonstrating RTFs' effectiveness and the safety of FoMs.
comment: 7 Pages, 2 Figures
Partial Resilient Leader-Follower Consensus in Time-Varying Graphs
This work studies resilient leader-follower consensus with a bounded number of adversaries. Existing approaches typically require robustness conditions of the entire network to guarantee resilient consensus. However, the behavior of such systems when these conditions are not fully met remains unexplored. To address this gap, we introduce the notion of partial leader-follower consensus, in which a subset of non-adversarial followers successfully tracks the leader's reference state despite insufficient robustness. We propose a novel distributed algorithm - the Bootstrap Percolation and Mean Subsequence Reduced (BP-MSR) algorithm - and establish sufficient conditions for individual followers to achieve consensus via the BP-MSR algorithm in arbitrary time-varying graphs. We validate our findings through simulations, demonstrating that our method guarantees partial leader-follower consensus, even when standard resilient consensus algorithms fail.
comment: 8 pages, 3 figures, Submitted to American Control Conference (ACC) 2026
Vulnerability Analysis Evaluating Bilevel Optimal Power Flow Approaches for Multiple Load Cases
This work presents two methodologies to enhance vulnerability assessment in power systems using bilevel attacker-defender network interdiction models. First, we introduce a systematic evaluation procedure for comparing different optimal power flow formulations in the lower-level problem. We demonstrate the procedure for a comparison of the widely used DC approximation and a linearized AC optimal power flow model. Second, we propose a novel scoring methodology to identify and prioritize critical attack vectors across diverse load and generation scenarios. Both methodologies go beyond traditional worst-case analysis. Case studies on a SimBench high-voltage test grid show that the DC approach fails to detect a significant portion of critical vulnerabilities. The scoring methodology further demonstrates the dependency of vulnerabilities on the considered load case and time step, highlighting the importance of assessing multiple scenarios and going beyond worst-case solutions. The proposed methodologies enhance power system vulnerability assessment and can support the effective development of robust defense strategies for future power systems.
Networked Control and Mean Field Problems Under Diagonal Dominance: Decentralized and Social Optimality
In this article, we employ an input-output approach to expand the study of cooperative multi-agent control and optimization problems characterized by mean-field interactions that admit decentralized and selfish solutions. The setting involves $n$ independent agents that interact solely through a shared cost function, which penalizes deviations of each agent from the group's average collective behavior. Building on our earlier results established for homogeneous agents, we extend the framework to nonidentical agents and show that, under a diagonal dominant interaction of the collective dynamics, with bounded local open-loop dynamics, the optimal controller for $H_\infty$ and $H_2$ norm minimization remains decentralized and selfish in the limit as the number of agents $n$ grows to infinity.
Predictive Control Barrier Functions for Discrete-Time Linear Systems with Unmodeled Delays
This paper introduces a predictive control barrier function (PCBF) framework for enforcing state constraints in discrete-time systems with unknown relative degree, which can be caused by input delays or unmodeled input dynamics. Existing discrete-time CBF formulations typically require the construction of auxiliary barrier functions when the relative degree is greater than one, which complicates implementation and may yield conservative safe sets. The proposed PCBF framework addresses this challenge by extending the prediction horizon to construct a CBF for an associated system with relative degree one. As a result, the superlevel set of the PCBF coincides with the safe set, simplifying constraint enforcement and eliminating the need for auxiliary functions. The effectiveness of the proposed method is demonstrated on a discrete-time double integrator with input delay and a bicopter system with position constraints.
comment: 8 pages, 7 figures, submitted to ACC 2026
Grid Frequency Stability Support Potential of Data Center: A Quantitative Assessment of Flexibility
The rapid expansion of data center infrastructure is reshaping power system dynamics by significantly increasing electricity demand while also offering potential for fast and controllable flexibility. To ensure reliable operation under such conditions, the frequency secured unit commitment problem must be solved with enhanced modeling of demand side frequency response. In this work, we propose a data-driven linearization framework based on decision tree based constraint learning to embed nonlinear nadir frequency constraints into mixed-integer linear programming. This approach enables tractable optimization of generation schedules and fast frequency response from data centers. Through case studies on both a benchmark system and a 2030 future scenario with higher DC penetration, we demonstrate that increasing the proportion of flexible DC load consistently improves system cost efficiency and supports renewable integration. However, this benefit exhibits diminishing marginal returns, motivating the introduction of the Marginal Flexibility Value metric to quantify the economic value of additional flexibility. The results highlight that as DCs become a larger share of system load, their active participation in frequency response will be increasingly indispensable for maintaining both economic and secure system operations.
Gain-Scheduled Passive Fault-Tolerant Control Design for Dual-System UAV Transition Flight
Dual-system UAVs with vertical take-off and landing capabilities have become increasingly popular in recent years. As a safety-critical system, it is important that a dual-system UAV can maintain safe flight after faults/failures occur. This paper proposes a gain-scheduled passive fault-tolerant control (PFTC) method for the transition flight of dual-system UAVs. In this novel FTC design method, the model uncertainties arising from the loss of control effectiveness caused by actuator faults/failures, for the first time, are treated as model input uncertainty, allowing us to use multiplicative uncertainty descriptions to represent it. The advantages of the proposed method consist in significantly reducing the number of design points, thereby simplifying the control synthesis process and improving the efficiency of designing the FTC system for dual-system UAV transition flight compared with the existing FTC design methods. As a general method, it can be applied to the design of FTC systems with multiple uncertain parameters and multiple channels. The developed passive FTC system is validated on a nonlinear six-degree-of-freedom simulator. The simulation results demonstrate that the gain-scheduled structured H infinity (GS SHIF) PFTC system provides superior fault tolerance performance compared with the LQR and structured H infinity control systems, thereby showcasing the effectiveness and the advantages of the proposed GS SHIF PFTC approach.
comment: 25 pages
ROSplane 2.0: A Fixed-Wing Autopilot for Research
Unmanned aerial vehicle (UAV) research requires the integration of cutting-edge technology into existing autopilot frameworks. This process can be arduous, requiring extensive resources, time, and detailed knowledge of the existing system. ROSplane is a lean, open-source fixed-wing autonomy stack built by researchers for researchers. It is designed to accelerate research by providing clearly defined interfaces with an easily modifiable framework. Powered by ROS 2, ROSplane allows for rapid integration of low or high-level control, path planning, or estimation algorithms. A focus on lean, easily understood code and extensive documentation lowers the barrier to entry for researchers. Recent developments to ROSplane improve its capacity to accelerate UAV research, including the transition from ROS 1 to ROS 2, enhanced estimation and control algorithms, increased modularity, and an improved aerodynamic modeling pipeline. This aerodynamic modeling pipeline significantly reduces the effort of transitioning from simulation to real-world testing without requiring expensive system identification or computational fluid dynamics tools. ROSplane's architecture reduces the effort required to integrate new research tools and methods, expediting hardware experimentation.
A Model-Based Extended State Observer for Discrete-Time Linear Multivariable Systems
A model-based extended state observer (MB-ESO) and its variant are proposed for discrete-time linear multivariable systems, where multiple disturbances are defined as an extended state vector in the same manner as in the original formulation of ESO. The variant MB-ESO extends the MB-ESO to address cases where the disturbance gain matrix is non-diagonal. Leveraging the connection between the variant MB-ESO and the well-known unknown input observer (UIO), the condition for the existence of a MB-ESO and its variant in multivariable systems is established, for the first time, i.e., no invariant zeros exist between the disturbances and the plant outputs. It is shown that, with the observer eigenvalues all placed at the origin and the subsystems decoupled, the variant MB-ESO produces the identical disturbance estimation as that of UIO. Moreover, the error characteristics of MB-ESO and its variant are analyzed and the transfer functions associated with the disturbance estimation errors are derived. It is demonstrated both mathematically and in simulations that the disturbance estimation error of MB-ESO decreases monotonically with respect to both the observer eigenvalues and time.
ROSflight 2.0: Lean ROS 2-Based Autopilot for Unmanned Aerial Vehicles
ROSflight is a lean, open-source autopilot ecosystem for unmanned aerial vehicles (UAVs). Designed by researchers for researchers, it is built to lower the barrier to entry to UAV research and accelerate the transition from simulation to hardware experiments by maintaining a lean (not full-featured), well-documented, and modular codebase. This publication builds on previous treatments and describes significant additions to the architecture that improve the modularity and usability of ROSflight, including the transition from ROS 1 to ROS 2, supported hardware, low-level actuator mixing, and the simulation environment. We believe that these changes improve the usability of ROSflight and enable ROSflight to accelerate research in areas like advanced-air mobility. Hardware results are provided, showing that ROSflight is able to control a multirotor over a serial connection at 400 Hz while closing all control loops on the companion computer.
comment: To be submitted to the 2026 IEEE International Conference on Robotics and Automation in Vienna, Austria
Optimal Pricing of Electric Vehicle Charging on Coupled Power-Transportation Network based on Generalized Sensitivity Analysis
In the last decade, charging service providers are emerging along with the prevalence of electric vehicles. These providers need to strategically optimize their charging prices to improve the profits considering operation conditions of the coupled power-transportation network. However, the optimal pricing problem generally involves the user equilibrium model, which leads to a mathematical program with equilibrium constraints. As a result, the pricing problem is non-convex and computationally intractable especially for large-scale network. To address this challenge, we propose a generalized sensitivity analysis approach for optimal pricing of electric vehicle charging on coupled power-transportation network. Specifically, we adopt a sensitivity analysis to capture the best response of charging demand to charging price in the gradient form. Consequently, charging service providers can make pricing decisions based on the gradient information instead of the conventional KKT conditions of the user equilibrium model. We then propose a tailored gradient descent algorithm to solve the whole pricing problem. The mathematical proof of validity is given and the time complexity of the proposed algorithm is theoretically polynomial. Numerical experiments on different scales of networks verify the computational efficiency of the proposed algorithm, indicating its potential in evaluating the impact of the optimal pricing on the operational performance of large-scale coupled power-transportation network.
Real-time Operation of Electric Autonomous Mobility-on-Demand System Considering Power System Regulation
Electric autonomous mobility-on-demand (EAMoD) systems are emerging all over the world. However, their potential swarm charging in depots may deteriorate operation of the power system, further in turn affecting EAMoD system's optimal operation. To prevent this latent risk, we develop a real-time coordination framework for the EAMoD system and the power system. First, the temporal-spatial characteristics of EAMoD fleets are fully described based on a Markov decision process model, including serving trips, repositioning, and charging. Second, charger accessibility of EAMoD depot charging is well modeled as real-world configuration, wherein fast and slow charge piles are both included. Third, the power system regulation model provides real-time charging regulation constraints for EAMoD systems to prevent potential overload and undervoltage. To address the poor solution quality attributed to the complex decision space of the EAMoD system, this paper proposes a piecewise linear-based approximate dynamic programming algorithm combined with model predictive control. Numerical experiments in the Manhattan and a 14-node power distribution network validate the effectiveness of the proposed algorithm and underscore the necessity of system coordination.
Structuring Automotive Data for Systems Engineering: A Taxonomy-Based Approach
Vehicle data is essential for advancing data-driven development throughout the automotive lifecycle, including requirements engineering, design, verification, and validation, and post-deployment optimization. Developers currently collect data in a decentralized and fragmented manner across simulations, test benches, and real-world driving, resulting in data silos, inconsistent formats, and limited interoperability. This leads to redundant efforts, inefficient integration, and suboptimal use of data. This fragmentation results in data silos, inconsistent storage structures, and limited interoperability, leading to redundant data collection, inefficient integration, and suboptimal application. To address these challenges, this article presents a structured literature review and develops an inductive taxonomy for automotive data. This taxonomy categorizes data according to its sources and applications, improving data accessibility and utilization. The analysis reveals a growing emphasis on real-world driving and machine learning applications while highlighting a critical gap in data availability for requirements engineering. By providing a systematic framework for structuring automotive data, this research contributes to more efficient data management and improved decision-making in the automotive industry.
A Neuro-Fuzzy System for Interpretable Long-Term Stock Market Forecasting
In the complex landscape of multivariate time series forecasting, achieving both accuracy and interpretability remains a significant challenge. This paper introduces the Fuzzy Transformer (Fuzzformer), a novel recurrent neural network architecture combined with multi-head self-attention and fuzzy inference systems to analyze multivariate stock market data and conduct long-term time series forecasting. The method leverages LSTM networks and temporal attention to condense multivariate data into interpretable features suitable for fuzzy inference systems. The resulting architecture offers comparable forecasting performance to conventional models such as ARIMA and LSTM while providing meaningful information flow within the network. The method was examined on the real world stock market index S\&P500. Initial results show potential for interpretable forecasting and identify current performance tradeoffs, suggesting practical application in understanding and forecasting stock market behavior.
comment: Published in: ERK 2025 -- 34th International Electrotechnical and Computer Science Conference, Portoro\v{z}, Slovenia, Sept. 25--26, 2025. Proceedings published by Dru\v{s}tvo Slovenska sekcija IEEE. ISSN: 2591-0442 (online). 4 pages, 2 figures
Accurate Small-Signal Modeling of Digitally Controlled Buck Converters with ADC-PWM Synchronization
Digital control has become increasingly widespread in modern power electronic converters. When acquiring feedback signals such as the inductor current, synchronizing the analog-to-digital converter (ADC) with the digital pulse-width modulator (DPWM) is commonly employed to accurately track their steady-state average. However, the small-signal implications of such synchronization have not been investigated. This paper presents an exact small-signal model for digitally controlled buck converters operating in forced continuous-conduction mode (FCCM) under constant-frequency current-mode control, explicitly accounting for DPWM-ADC synchronization. Using a sampled-data framework, the proposed model captures all sideband effects introduced by the sampling process, yielding precise predictions of both analog and digital loop gains, even at frequencies beyond the switching and sampling frequencies. Both asymmetrical and symmetrical carrier modulations are considered. Furthermore, the digital loop gain is derived in closed form using the modified z-transform, enabling low-complexity compensator design and stability assessment. Within this framework, the analog loop gain can be directly obtained from the digital loop gain, thereby eliminating the need for computationally intensive infinite series evaluations. The validity of the proposed model is confirmed through both simulation and experimental results.
Non-submodular Visual Attention for Robot Navigation
This paper presents a task-oriented computational framework to enhance Visual-Inertial Navigation (VIN) in robots, addressing challenges such as limited time and energy resources. The framework strategically selects visual features using a Mean Squared Error (MSE)-based, non-submodular objective function and a simplified dynamic anticipation model. To address the NP-hardness of this problem, we introduce four polynomial-time approximation algorithms: a classic greedy method with constant-factor guarantees; a low-rank greedy variant that significantly reduces computational complexity; a randomized greedy sampler that balances efficiency and solution quality; and a linearization-based selector based on a first-order Taylor expansion for near-constant-time execution. We establish rigorous performance bounds by leveraging submodularity ratios, curvature, and element-wise curvature analyses. Extensive experiments on both standardized benchmarks and a custom control-aware platform validate our theoretical results, demonstrating that these methods achieve strong approximation guarantees while enabling real-time deployment.
comment: 22 pages; Accepted to appear in IEEE Transactions on Robotics (T-RO)
Enhancing Urban VANETs Stability: A Single-Hop Clustering Strategy in Metropolitan Environments
Vehicular Ad-hoc Networks (VANETs), a subclass of Mobile Ad-hoc Networks (MANETs), are expected to play a crucial role in the future of intelligent transportation systems (ITSs). A key objective of VANETs is to enable efficient and cost-effective communication among vehicles while supporting a large number of network participants and minimizing infrastructure dependency. However, the highly dynamic nature of vehicular networks poses significant challenges to their deployment. Clustering techniques are employed to address these challenges, with a strong emphasis on stability, as they directly influence the routing process and enhance the quality of service (QoS). This paper explores the feasibility of reducing reliance on roadside units (RSUs) in metropolitan areas while improving cluster stability. We propose an efficient clustering algorithm tailored for urban environments, leveraging existing metropolitan infrastructure to compensate for the absence of RSUs. Our approach designates public transportation buses as primary cluster heads (CHs), minimizing reliance on additional infrastructure, while stand-alone vehicles (SAVs) dynamically select additional CHs. Through comprehensive case studies and comparative analysis with existing algorithms, our results demonstrate the superior performance of the proposed method across different transmission ranges (TRs).
comment: 10 pages, 6 figures, 5 tables, Journal
Product-oriented Product-Process-Resource Asset Network and its Representation in AutomationML for Asset Administration Shell
Current products, especially in the automotive sector, pose complex technical systems having a multi-disciplinary mechatronic nature. Industrial standards supporting system engineering and production typically (i) address the production phase only, but do not cover the complete product life cycle, and (ii) focus on production processes and resources rather than the products themselves. The presented approach is motivated by incorporating impacts of end-of-life phase of the product life cycle into the engineering phase. This paper proposes a modelling approach coming up from the Product-Process-Resource (PPR) modeling paradigm. It combines requirements on (i) respecting the product structure as a basis for the model, and (ii) it incorporates repairing, remanufacturing, or upcycling within cyber-physical production systems. The proposed model called PoPAN should accompany the product during the entire life cycle as a digital shadow encapsulated within the Asset Administration Shell of a product. To facilitate the adoption of the proposed paradigm, the paper also proposes serialization of the model in the AutomationML data format. The model is demonstrated on a use-case for disassembling electric vehicle batteries to support their remanufacturing for stationary battery applications.
comment: This work has been submitted to the IEEE for possible publication. 8 pages, 6 figures
TubeDAgger: Reducing the Number of Expert Interventions with Stochastic Reach-Tubes
Interactive Imitation Learning deals with training a novice policy from expert demonstrations in an online fashion. The established DAgger algorithm trains a robust novice policy by alternating between interacting with the environment and retraining of the network. Many variants thereof exist, that differ in the method of discerning whether to allow the novice to act or return control to the expert. We propose the use of stochastic reachtubes - common in verification of dynamical systems - as a novel method for estimating the necessity of expert intervention. Our approach does not require fine-tuning of decision thresholds per environment and effectively reduces the number of expert interventions, especially when compared with related approaches that make use of a doubt classification model.
Uncertainty-Aware Flexibility of Buildings: From Quantification to Provision
Buildings represent a promising flexibility source to support the integration of renewable energy sources, as they may shift their heating energy consumption over time without impacting users' comfort. However, a building's predicted flexibility potential is based on uncertain ambient weather forecasts and a typically inaccurate building thermal model. Hence, this paper presents an uncertainty-aware flexibility quantifier using a chance-constrained formulation. Because such a quantifier may be conservative, we additionally model real-time feedback in the quantification, in the form of affine feedback policies. Such adaptation can take the form of intra-day trades or rebound around the flexibility provision period. To assess the flexibility quantification formulations, we further assume that flexible buildings participate in secondary frequency control markets. The results show some increase in flexibility and revenues when introducing affine feedback policies. Additionally, it is demonstrated that accounting for uncertainties in the flexibility quantification is necessary, especially when intra-day trades are not available. Even though an uncertainty-ignorant potential may seem financially profitable in secondary frequency control markets, it comes at the cost of significant thermal discomfort for inhabitants. Hence, we suggest a comfort-preserving approach, aiming to truly reflect thermal discomfort on the economic flexibility revenue, to obtain a fairer comparison.
comment: Accepted in IEEE Transactions on Smart Grids
Global convergence of Oja's component flow for general square matrices and its applications
This paper establishes the global convergence properties of the Oja flow, a continuous-time algorithm for principal component extraction, for general square matrices. The Oja flow is a matrix differential equation on the Stiefel manifold designed to extract a dominant subspace. While its analysis has traditionally been restricted to symmetric positive-definite matrices, where it acts as a gradient flow, recent applications have extended its use to general matrices. In this non-symmetric case, the flow extracts the invariant subspace corresponding to the eigenvalues with the largest real parts. However, prior convergence results have been purely local, leaving the global behavior as an open problem. This paper fills this gap by providing a comprehensive global convergence analysis, establishing that the flow converges exponentially for almost all initial conditions. We also propose a modification to the algorithm that enhances its numerical stability. As an application of this theory, we develop novel methods for the model reduction of linear dynamical systems and the synthesis of low-rank stabilizing controllers.
comment: 15 pages, 6 figures
Shared Object Manipulation with a Team of Collaborative Quadrupeds
Utilizing teams of multiple robots is advantageous for handling bulky objects. Many related works focus on multi-manipulator systems, which are limited by workspace constraints. In this paper, we extend a classical hybrid motion-force controller to a team of legged manipulator systems, enabling collaborative loco-manipulation of rigid objects with a force-closed grasp. Our novel approach allows the robots to flexibly coordinate their movements, achieving efficient and stable object co-manipulation and transport, validated through extensive simulations and real-world experiments.
comment: 8 pages, 9 figures, submitted to The 2026 American Control Conference
Formation Control via Rotation Symmetry Constraints
We present a distributed formation control strategy for multi-agent systems based only on rotation symmetry constraints. We propose a potential function that enforces inter-agent \textbf{rotational} symmetries, with its gradient defining the control law driving the agents toward a desired symmetric and planar configuration. We show that only $(n-1)$ edges, the minimal connectivity requirement, are sufficient to implement the control strategy, where $n$ is the number of agents. We further augment the design to address the \textbf{maneuvering problem}, enabling the formation to undergo coordinated translations, rotations, and scalings along a predefined virtual trajectory. Numerical simulations demonstrate the effectiveness and flexibility of the proposed method.
Multi-Agent Stage-wise Conservative Linear Bandits
In many real-world applications such as recommendation systems, multiple learning agents must balance exploration and exploitation while maintaining safety guarantees to avoid catastrophic failures. We study the stochastic linear bandit problem in a multi-agent networked setting where agents must satisfy stage-wise conservative constraints. A network of $N$ agents collaboratively maximizes cumulative reward while ensuring that the expected reward at every round is no less than $(1-\alpha)$ times that of a baseline policy. Each agent observes local rewards with unknown parameters, but the network optimizes for the global parameter (average of local parameters). Agents communicate only with immediate neighbors, and each communication round incurs additional regret. We propose MA-SCLUCB (Multi-Agent Stage-wise Conservative Linear UCB), an episodic algorithm alternating between action selection and consensus-building phases. We prove that MA-SCLUCB achieves regret $\tilde{O}\left(\frac{d}{\sqrt{N}}\sqrt{T}\cdot\frac{\log(NT)}{\sqrt{\log(1/|\lambda_2|)}}\right)$ with high probability, where $d$ is the dimension, $T$ is the horizon, and $|\lambda_2|$ is the network's second largest eigenvalue magnitude. Our analysis shows: (i) collaboration yields $\frac{1}{\sqrt{N}}$ improvement despite local communication, (ii) communication overhead grows only logarithmically for well-connected networks, and (iii) stage-wise safety adds only lower-order regret. Thus, distributed learning with safety guarantees achieves near-optimal performance in reasonably connected networks.
Development and Field Validation of a Fully Customised Vehicle Scanning System on Two Full-Scale Bridges
Ensuring the structural integrity of bridges is essential for maintaining infrastructure safety and promoting long-term sustainability. In this context, Indirect Structural Health Monitoring (ISHM) through drive-by bridge inspection emerges as a promising alternative to traditional inspection methods, offering a cost-effective and scalable solution by using vehicle-mounted sensors to assess the condition of bridges without requiring direct instrumentation. This study introduces the first purpose-built electric inspection vehicle specifically designed for drive-by bridge inspection. The autonomous platform is capable of maintaining a constant low speed and offers customisable operational parameters to maximise the accuracy and repeatability of indirect sensing, capabilities not achieved in previous studies. The vehicle is deployed within an ISHM framework and tested on two full-scale bridges to evaluate its effectiveness in capturing structural dynamic responses. Two unsupervised frameworks are then employed to analyse the collected data to identify features indicative of bridge properties and structural condition. The promising findings from this study demonstrate the practical feasibility of the approach. The study also shows the potential of ISHM as a viable tool for efficient bridge monitoring, contributing to the development of next-generation structural health monitoring systems that can enhance safety, optimise maintenance strategies, and support the longevity of critical infrastructure.
Annealed Ensemble Kalman Inversion for Constrained Nonlinear Model Predictive Control: An ADMM Approach
This work proposes a novel Alternating Direction Method of Multipliers (ADMM)-based Ensemble Kalman Inversion (EKI) algorithm for solving constrained nonlinear model predictive control (NMPC) problems. First, the stage-wise nonlinear inequality constraints in the NMPC problem are embedded via an augmented Lagrangian with nonnegative slack variables. We then show that the unconstrained augmented Lagrangian formulation of the NMPC admits a Bayesian interpretation: under a Gaussian observation model, its minimizers coincide with MAP estimators, enabling solution via EKI. However, since the nonnegativity constraint on the slacks cannot be enforced via Gaussian noise, our proposed algorithm results in a two-block ADMM that alternates between (i) a primal step that minimizes the unconstrained augmented Lagrangian, (ii) a nonnegativity projection for the slacks, and (iii) a dual ascent step. To balance exploration and convergence, an annealing schedule tempers covariances and penalty weights, thereby encouraging global search early and precise constraint satisfaction later. To demonstrate the performance of the proposed method, we compare it with another iterative sampling-based approach based on Model Predictive Path Integral (MPPI) control, called DIAL-MPC.
An Interpolation-based Scheme for Rapid Frequency-Domain System Identification
We present a frequency-domain system identification scheme based on barycentric interpolation and weight optimization. The scheme is related to the Adaptive Antoulas-Anderson (AAA) algorithm for model reduction, but uses an adaptive algorithm for selection of frequency points for interrogating the system response, as would be required in identification versus model reduction. The scheme is particularly suited for systems in which any one sinusoidal response run is long or expensive, and thus there is an incentive to reduce the total number of such runs. Two key features of our algorithm are the use of transient data in sinusoidal runs to both optimize the barycentric weights, and automated next-frequency selection on an adaptive grid. Both are done with error criteria that are proxies for a system's $H^2$ and $H^\infty$ norms respectively. Furthermore, the optimization problem we formulate is convex, and can optionally guarantee stability of the identified system. Computational results on a high-order, lightly damped structural system highlights the efficacy of this scheme.
comment: 7 pages, 5 figures Submitted to IEEE American Control Conference 2026
Wireless Laser Power Transfer for Low-altitude Uncrewed Aerial Vehicle-assisted Internet of Things: Paradigms, Challenges, and Solutions
Low-altitude uncrewed aerial vehicles (UAVs) have become integral enablers for the Internet of Things (IoT) by offering enhanced coverage, improved connectivity and access to remote areas. A critical challenge limiting their operational capacity lies in the energy constraints of both aerial platforms and ground-based sensors. This paper explores WLPT as a transformative solution for sustainable energy provisioning in UAV-assisted IoT networks. We first systematically investigate the fundamental principles of WLPT and analysis the comparative advantages. Then, we introduce three operational paradigms for system integration, identify key challenges, and discuss corresponding potential solutions. In case study, we propose a multi-agent reinforcement learning framework to address the coordination and optimization challenges in WLPT-enabled UAV-assisted IoT data collection. Simulation results demonstrate that our framework significantly improves energy sustainability and data freshness. Finally, we discuss some future directions.
comment: This paper has been submitted to IEEE Internet of Things Magazine
Four-Port Probe Stations and SOLR Calibration Standard Design up to 125 GHz on 28 nm CMOS
This paper presents two innovative four-port probe stations developed by FormFactor Incorporated (FFI) and MPI Corporation (MPI), and a four-port calibration standard design up to 125 GHz for the probe stations. True four-port probing at mmWave and beyond does not yet exist, but is anticipated for future multi-band wireless devices using several antennas and RF chains. The four-port probe stations are housed in the THz measurement facility at NYU and allow simultaneous probing from East, West, North, and South orientations, which presents challenges for calibration. An on-chip Short-Open-Load-Reciprocal (SOLR) calibration (cal) standard is designed leveraging UMC's 28 nm CMOS process. S/O/L standard S-parameters are extracted using a virtual multiline Thru-Reflect-Line (mTRL) cal and used to validate SOLR cal performance via simulations up to 125 GHz. The novel probing solutions from MPI and FFI, along with the SOLR cal, open up considerable opportunities for precise RF characterization across wide frequency ranges.
comment: 3 pages, 5 figures, Asia-Pacific Microwave Conference 2025
Modeling and Mixed-Integer Nonlinear MPC of Positive-Negative Pressure Pneumatic Systems
Positive-negative pressure regulation is critical to soft robotic actuators, enabling large motion ranges and versatile actuation modes. However, it remains challenging due to complex nonlinearities, oscillations, and direction-dependent, piecewise dynamics introduced by affordable pneumatic valves and the bidirectional architecture. We present a model-based control framework that couples a physics-grounded switched nonlinear plant model (inflation/deflation modes) with a mixed-integer nonlinear model predictive controller (MI-NMPC). The controller co-optimizes mode scheduling and PWM inputs to realize accurate reference tracking while enforcing input constraints and penalizing energy consumption and excessive switching. To make discrete mode decisions tractable, we employ a Combinatorial Integral Approximation that relaxes binary mode variables to continuous surrogates within the valve-scheduling layer. With parameters identified from the physical system, simulations with step and sinusoidal references validate the proposed MI-NMPC, showing a consistently favorable trade-off among accuracy, control effort, and switching, and outperforming conventional PID and NMPC with heuristic mode selection.
comment: Has been submitted to conference
MM-LMPC: Multi-Modal Learning Model Predictive Control via Bandit-Based Mode Selection
Learning Model Predictive Control (LMPC) improves performance on iterative tasks by leveraging data from previous executions. At each iteration, LMPC constructs a sampled safe set from past trajectories and uses it as a terminal constraint, with a terminal cost given by the corresponding cost-to-go. While effective, LMPC heavily depends on the initial trajectories: states with high cost-to-go are rarely selected as terminal candidates in later iterations, leaving parts of the state space unexplored and potentially missing better solutions. For example, in a reach-avoid task with two possible routes, LMPC may keep refining the initially shorter path while neglecting the alternative path that could lead to a globally better solution. To overcome this limitation, we propose Multi-Modal LMPC (MM-LMPC), which clusters past trajectories into modes and maintains mode-specific terminal sets and value functions. A bandit-based meta-controller with a Lower Confidence Bound (LCB) policy balances exploration and exploitation across modes, enabling systematic refinement of all modes. This allows MM-LMPC to escape high-cost local optima and discover globally superior solutions. We establish recursive feasibility, closed-loop stability, asymptotic convergence to the best mode, and a logarithmic regret bound. Simulations on obstacle-avoidance tasks validate the performance improvements of the proposed method.
comment: This paper is submitted to 2026 American Control Conference (ACC)
Learning Passive Continuous-Time Dynamics with Multistep Port-Hamiltonian Gaussian Processes
We propose the multistep port-Hamiltonian Gaussian process (MS-PHS GP) to learn physically consistent continuous-time dynamics and a posterior over the Hamiltonian from noisy, irregularly-sampled trajectories. By placing a GP prior on the Hamiltonian surface $H$ and encoding variable-step multistep integrator constraints as finite linear functionals, MS-PHS GP enables closed-form conditioning of both the vector field and the Hamiltonian surface without latent states, while enforcing energy balance and passivity by design. We state a finite-sample vector-field bound that separates the estimation and variable-step discretization terms. Lastly, we demonstrate improved vector-field recovery and well-calibrated Hamiltonian uncertainty on mass-spring, Van der Pol, and Duffing benchmarks.
Combining Large Language Models and Gradient-Free Optimization for Automatic Control Policy Synthesis
Large Language models (LLMs) have shown promise as generators of symbolic control policies, producing interpretable program-like representations through iterative search. However, these models are not capable of separating the functional structure of a policy from the numerical values it is parametrized by, thus making the search process slow and inefficient. We propose a hybrid approach that decouples structural synthesis from parameter optimization by introducing an additional optimization layer for local parameter search. In our method, the numerical parameters of LLM-generated programs are extracted and optimized numerically to maximize task performance. With this integration, an LLM iterates over the functional structure of programs, while a separate optimization loop is used to find a locally optimal set of parameters accompanying candidate programs. We evaluate our method on a set of control tasks, showing that it achieves higher returns and improved sample efficiency compared to purely LLM-guided search. We show that combining symbolic program synthesis with numerical optimization yields interpretable yet high-performing policies, bridging the gap between language-model-guided design and classical control tuning. Our code is available at https://sites.google.com/berkeley.edu/colmo.
comment: 8 pages, 7 figures
Machine Learning Detection of Lithium Plating in Lithium-ion Cells: A Gaussian Process Approach
Lithium plating during fast charging is a critical degradation mechanism that accelerates capacity fade and can trigger catastrophic safety failures. Recent work has identified a distinctive dQ/dV peak above 4.0 V as a reliable signature of plating onset; however, conventional methods for computing dQ/dV rely on finite differencing with filtering, which amplifies sensor noise and introduces bias in peak location. In this paper, we propose a Gaussian Process (GP) framework for lithium plating detection by directly modeling the charge-voltage relationship Q(V) as a stochastic process with calibrated uncertainty. Leveraging the property that derivatives of GPs remain GPs, we infer dQ/dV analytically and probabilistically from the posterior, enabling robust detection without ad hoc smoothing. The framework provides three key benefits: (i) noise-aware inference with hyperparameters learned from data, (ii) closed-form derivatives with credible intervals for uncertainty quantification, and (iii) scalability to online variants suitable for embedded BMS. Experimental validation on Li-ion coin cells across a range of C-rates (0.2C-1C) and temperatures (0-40\deg C) demonstrates that the GP-based method reliably detects plating peaks under low-temperature, high-rate charging, while correctly reporting no peaks in baseline cases. The concurrence of GP-identified differential peaks, reduced charge throughput, and capacity fade measured via reference performance tests confirms the method's accuracy and robustness, establishing a practical pathway for real-time lithium plating detection.
comment: Submitted to American Control Conference 2026 - ACC 2026
Event-Based Control via Sparsity-Promoting Regularization: A Rollout Approach with Performance Guarantees
This paper presents a controller design framework aiming to balance control performance and actuation rate. Control performance is evaluated by an infinite-horizon average cost, and the number of control actions is penalized via sparsity-promoting regularization. Since the formulated optimal control problem has a combinatorial nature, we employ a rollout algorithm to obtain a tractable suboptimal solution. In the proposed scheme, actuation timings are determined through a multistage minimization procedure based on a receding-horizon approach, and the corresponding control inputs are computed online. We establish theoretical performance guarantees with respect to periodic control and prove the stability of the closed-loop system. The effectiveness of the proposed method is demonstrated through a numerical example.
comment: 14 pages
Multi-Agent Guided Policy Search for Non-Cooperative Dynamic Games
Multi-agent reinforcement learning (MARL) optimizes strategic interactions in non-cooperative dynamic games, where agents have misaligned objectives. However, data-driven methods such as multi-agent policy gradients (MA-PG) often suffer from instability and limit-cycle behaviors. Prior stabilization techniques typically rely on entropy-based exploration, which slows learning and increases variance. We propose a model-based approach that incorporates approximate priors into the reward function as regularization. In linear quadratic (LQ) games, we prove that such priors stabilize policy gradients and guarantee local exponential convergence to an approximate Nash equilibrium. We then extend this idea to infinite-horizon nonlinear games by introducing Multi-agent Guided Policy Search (MA-GPS), which constructs short-horizon local LQ approximations from trajectories of current policies to guide training. Experiments on nonlinear vehicle platooning and a six-player strategic basketball formation show that MA-GPS achieves faster convergence and more stable learning than existing MARL methods.
comment: We fix a typo in equation (7), where we accidentally typed an extra symbol \nabla before w(\pi)
Almost Sure Convergence of Networked Policy Gradient over Time-Varying Networks in Markov Potential Games
We propose networked policy gradient play for solving Markov potential games with continuous and/or discrete state-action pairs. During the game, agents use parametrized and differentiable policies that depend on the current state and the policy parameters of other agents. During training, agents update their policy parameters following stochastic gradients. The gradient estimation involves two consecutive episodes, generating unbiased estimators of reward and policy score functions. In addition, it involves keeping estimates of others' parameters using consensus steps given local estimates received through a time-varying communication network. In Markov potential games, there exists a potential value function among agents with gradients corresponding to the gradients of local value functions. Using this structure, we prove almost sure convergence to a stationary point of the potential value function with rate $O(1/\epsilon^2)$. Compared to previous works, our results do not require bounded policy gradients or initial agreement on the values of individual policy parameters. Numerical experiments on a dynamic multi-agent newsvendor problem verify the convergence of local beliefs and gradients. It further shows that networked policy gradient play converges as fast as independent policy gradient updates, while collecting higher rewards.
comment: 15 pages, extended journal version
Vulnerability-Based Optimal Grid Defense Strategies for Enhancing Cyber-Physical Energy System Resilience
An approach is proposed to identify optimal asset protection strategies based on vulnerability assessment outcomes. Traditional bilevel attacker-defender models emphasize worst-case scenarios but offer limited defensive guidance. In contrast, trilevel models introduce high computational complexity and rely on fixed network configurations. The proposed critical-components method leverages vulnerability assessment results to determine protection strategies, effectively outsourcing the upper-level defense decision. This enables adaptability to diverse network topologies, assessment techniques, and cyber-physical energy systems without the overhead of multi-level optimization. Case studies demonstrate the potential for improved system resilience across varying operational conditions.
comment: Accepted at UPEC 2025. Camera-ready Accepted Author Manuscript (AAM). Version of Record will be available on IEEE Xplore (DOI to be added)
On the Application of Model Predictive Control to a Weighted Coverage Path Planning Problem
This paper considers the application of Model Predictive Control (MPC) to a weighted coverage path planning (WCPP) problem. The problem appears in a wide range of practical applications, including search and rescue (SAR) missions. The basic setup is that one (or multiple) agents can move around a given search space and collect rewards from a given spatial distribution. Unlike an artificial potential field, each reward can only be collected once. In contrast to a Traveling Salesman Problem (TSP), the agent moves in a continuous space. Moreover, he is not obliged to cover all locations and/or may return to previously visited locations. The WCPP problem is tackled by a new Model Predictive Control (MPC) formulation with so-called Coverage Constraints (CCs). It is shown that the solution becomes more effective if the solver is initialized with a TSP-based heuristic. With and without this initialization, the proposed MPC approach clearly outperforms a naive MPC formulation, as demonstrated in a small simulation study.
TaxSolver: A methodology to realize optimal income tax reform
Across the globe there are growing calls to streamline and improve ever more complex income tax codes. Executing reform has proven difficult. Even when the desired outcomes are clear, the tools to design fitting reforms are lacking. To remedy this, we developed \texttt{TaxSolver}: a methodology to help policymakers realize optimal tax reform. \texttt{TaxSolver} allows policymakers to focus solely on what they aim to achieve with a reform -- like redistributing wealth, incentivizing labor market participation or reducing complexity -- and the guarantees within which reform is acceptable -- like limited fluctuations in taxpayer incomes or shocks to overall tax revenue. Given these goals and fiscal guarantees, \texttt{TaxSolver} finds the optimal set of tax rules that satisfies all the criteria or shows that the set of demands are not mathematically feasible. We illustrate \texttt{TaxSolver} by reforming various simulated examples of tax codes, including some that reflect the complexity and size of a real-world tax system.
comment: 47 pages, 14 figures, 4 tables
Decentralized Online Regularized Learning Over Random Time-Varying Graphs
We study the decentralized online regularized linear regression algorithm over random time-varying graphs. At each time step, every node runs an online estimation algorithm consisting of an innovation term processing its own new measurement, a consensus term taking a weighted sum of estimations of its own and its neighbors with additive and multiplicative communication noises and a regularization term preventing over-fitting. It is not required that the regression matrices and graphs satisfy special statistical assumptions such as mutual independence, spatio-temporal independence or stationarity. We develop the nonnegative supermartingale inequality of the estimation error, and prove that the estimations of all nodes converge to the unknown true parameter vector almost surely if the algorithm gains, graphs and regression matrices jointly satisfy the sample path spatio-temporal persistence of excitation condition. Especially, this condition holds by choosing appropriate algorithm gains if the graphs are uniformly conditionally jointly connected and conditionally balanced, and the regression models of all nodes are uniformly conditionally spatio-temporally jointly observable, under which the algorithm converges in mean square and almost surely. In addition, we prove that the regret upper bound is $O(T^{1-\tau}\ln T)$, where $\tau\in (0.5,1)$ is a constant depending on the algorithm gains.
Heterogeneous Predictor-based Risk-Aware Planning with Conformal Prediction in Dense, Uncertain Environments
Real-time planning among many uncertain, dynamic obstacles is challenging because predicting every agent with high fidelity is both unnecessary and computationally expensive. We present Heterogeneous Predictor-based Risk-Aware Planning (H-PRAP), a framework that allocates prediction effort to where it matters. H-PRAP introduces the Probability-based Collision Risk Index (P-CRI), a closed-form, horizon-level collision index obtained by calibrating a Gaussian surrogate with conformal prediction. P-CRI drives a router that assigns high-risk obstacles to accurate but expensive predictors and low-risk obstacles to lightweight predictors, while preserving distribution-free coverage across heterogeneous predictors through conformal prediction. The selected predictions and their conformal radii are embedded in a chance-constrained model predictive control (MPC) problem, yielding receding-horizon policies with explicit safety margins. We analyze the safety-efficiency trade-off under prediction compute budget: more portion of low-fidelity predictions reduce residual risk from dropped obstacles, but in the same time induces larger conformal radii and degrades trajectory efficiency and shrinks MPC feasibility. Extensive numerical simulations in dense, uncertain environments validate that H-PRAP attains best balance between trajectory success rate (i.e., no collisions) and the time to reach the goal (i.e., trajectory efficiency) compared to single prediction architectures.
Graphon Particle Systems, Part II: Dynamics of Distributed Stochastic Continuum Optimization
We study the distributed optimization problem over a graphon with a continuum of nodes, which is regarded as the limit of the distributed networked optimization as the number of nodes goes to infinity. Each node has a private local cost function. The global cost function, which all nodes cooperatively minimize, is the integral of the local cost functions on the node set. We propose stochastic gradient descent and gradient tracking algorithms over the graphon. We establish a general lemma for the upper bound estimation related to a class of time-varying differential inequalities with negative linear terms, based upon which, we prove that for both kinds of algorithms, the second moments of the nodes' states are uniformly bounded. Especially, for the stochastic gradient tracking algorithm, we transform the convergence analysis into the asymptotic property of coupled nonlinear differential inequalities with time-varying coefficients and develop a decoupling method. For both kinds of algorithms, we show that by choosing the time-varying algorithm gains properly, all nodes' states achieve $\mathcal{L}^{\infty}$-consensus for a connected graphon. Furthermore, if the local cost functions are strongly convex, then all nodes' states converge to the minimizer of the global cost function and the auxiliary states in the stochastic gradient tracking algorithm converge to the gradient value of the global cost function at the minimizer uniformly in mean square.
Graphon Particle Systems, Part I: Spatio-Temporal Approximation and Law of Large Numbers
We study a class of graphon particle systems with time-varying random coefficients. In a graphon particle system, the interactions among particles are characterized by the coupled mean field terms through an underlying graphon and the randomness of the coefficients comes from exogenous stochastic processes. By constructing two-level approximated sequences converging in 2-Wasserstein distance, we prove the existence and uniqueness of the solution to the system. Besides, by constructing two-level approximated functions converging to the graphon mean field terms, we establish the law of large numbers, which reveals that if the number of particles tends to infinity and the discretization step tends to zero, then the discrete-time interacting particle system over a large-scale network converges to the graphon particle system. As a byproduct, we discover that the graphon particle system can describe the limiting dynamics of the distributed stochastic gradient descent algorithm over the large-scale network and prove that if the gradients of the local cost functions are Lipschitz continuous, then the graphon particle system can be regarded as the spatio-temporal approximation of the discrete-time distributed stochastic gradient descent algorithm as the number of network nodes tends to infinity and the algorithm step size tends to zero.
Transient Stability Analysis of a Hybrid Grid-Forming and Grid-Following RES System Considering Multi-Mode Control Switching
The inherent control switching of renewable energy sources (RESs) during intricate transient processes introduces complexity to the dynamic behavior of modern power systems. This paper reveals the dynamic coupling between grid-forming (GFM)/grid-following (GFL)-based RES and dominant instability modes of the hybrid system. First, six control combinations are systematically investigated by pairing the two GFM-RES modes, normal control (NC) and current saturation (CS), with the three GFL-RES modes: normal control, low voltage ride-through (LVRT), and high voltage ride-through (HVRT). Based on switching system theory, the coupled power flow and dynamic motion models are developed considering multi-mode switching characteristics. It is revealed that the hybrid system exhibits two distinct instability modes when the GFM-RES and GFL-RES exceed their P-f and V-f desynchronization boundaries, respectively. The two-dimensional spatiotemporal damping characteristics of GFL-RES induced by GFM-RES are also uncovered for the first time. A novel criterion is proposed to quantify the impact of GFM-RES on GFL-RES dynamics, capturing both its stabilizing and destabilizing effects under different control combinations. High-fidelity electromagnetic transient simulations validate the correctness of the analysis framework.
Learning linear dynamical systems under convex constraints
We consider the problem of finite-time identification of linear dynamical systems from $T$ samples of a single trajectory. Recent results have predominantly focused on the setup where either no structural assumption is made on the system matrix $A^* \in \mathbb{R}^{n \times n}$, or specific structural assumptions (e.g. sparsity) are made on $A^*$. We assume prior structural information on $A^*$ is available, which can be captured in the form of a convex set $\mathcal{K}$ containing $A^*$. For the solution of the ensuing constrained least squares estimator, we derive non-asymptotic error bounds in the Frobenius norm that depend on the local size of $\mathcal{K}$ at $A^*$. To illustrate the usefulness of these results, we instantiate them for four examples, namely when (i) $A^*$ is sparse and $\mathcal{K}$ is a suitably scaled $\ell_1$ ball; (ii) $\mathcal{K}$ is a subspace; (iii) $\mathcal{K}$ consists of matrices each of which is formed by sampling a bivariate convex function on a uniform $n \times n$ grid (convex regression); (iv) $\mathcal{K}$ consists of matrices each row of which is formed by uniform sampling (with step size $1/T$) of a univariate Lipschitz function. In all these situations, we show that $A^*$ can be reliably estimated for values of $T$ much smaller than what is needed for the unconstrained setting.
comment: 29 pages; corrected minor typos throughout; added additional explanations at certain places; statements of main results unchanged
Diffusion Model-based Parameter Estimation in Dynamic Power Systems
Parameter estimation, which represents a classical inverse problem, is often ill-posed as different parameter combinations can yield identical outputs. This non-uniqueness poses a critical barrier to accurate and unique identification. This work introduces a novel parameter estimation framework to address such limits: the Joint Conditional Diffusion Model-based Inverse Problem Solver (JCDI). By leveraging the stochasticity of diffusion models, JCDI produces possible solutions revealing underlying distributions. Joint conditioning on multiple observations further narrows the posterior distributions of non-identifiable parameters. For the challenging task in dynamic power systems: composite load model parameterization, JCDI achieves a 58.6% reduction in parameter estimation error compared to the single-condition model. It also accurately replicates system's dynamic responses under various electrical faults, with root mean square errors below 4*10^(-3), outperforming existing deep-reinforcement-learning and supervised learning approaches. Given its data-driven nature, JCDI provides a universal framework for parameter estimation while effectively mitigating the non-uniqueness challenge across scientific domains.
Safe Event-triggered Gaussian Process Learning for Barrier-Constrained Control
While control barrier functions (CBFs) are employed in addressing safety, control synthesis methods based on them generally rely on accurate system dynamics. This is a critical limitation, since the dynamics of complex systems are often not fully known. Supervised machine learning techniques hold great promise for alleviating this weakness by inferring models from data. We propose a novel \revision{approach for safe event-triggered learning of Gaussian process models in CBF-based continuous-time control for unknown control-affine systems. By applying a finite excitation at triggering times, our approach ensures a sufficient information gain to maintain the feasibility of the CBF-based safety condition with high probability. Our approach probabilistically guarantees safety based on a suitable GP prior and rules out} Zeno behavior in the triggering scheme. The effectiveness of the proposed approach and theory is demonstrated in simulations.
comment: The first two authors contributed equally to the work
Joint Price and Power MPC for Peak Power Reduction at Workplace EV Charging Stations
Demand charge often constitutes a significant portion of electricity costs for commercial electric vehicle (EV) charging station operators. This paper explores control methods to reduce peak power consumption at workplace EV charging stations in a joint price and power optimization framework. We optimize a menu of price options to incentivize users to select controllable charging service. Using this framework, we propose a model predictive control approach to reduce both demand charge and overall operator costs. Through a Monte Carlo simulation, we find that our algorithm outperforms a state-of-the-art benchmark optimization strategy and can significantly reduce station operator costs.
An Iterative Bayesian Approach for System Identification based on Linear Gaussian Models
We tackle the problem of system identification, where we select inputs, observe the corresponding outputs from the true system, and optimize the parameters of our model to best fit the data. We propose a practical and computationally tractable methodology that is compatible with any system and parametric family of models. Our approach only requires input-output data from the system and first-order information of the model with respect to the parameters. Our approach consists of two modules. First, we formulate the problem of system identification from a Bayesian perspective and use a linear Gaussian model approximation to iteratively optimize the model's parameters. In each iteration, we propose to use the input-output data to tune the covariance of the linear Gaussian model. This online covariance calibration stabilizes fitting and signals model inaccuracy. Secondly, we define a Gaussian-based uncertainty measure for the model parameters, which we can then minimize with respect to the next selected input. We test our method with linear and nonlinear dynamics.
comment: Submitted to the 2026 American Control Conference
Stochastically Structured Reservoir Computers for Financial and Economic System Identification
This paper introduces a methodology for identifying and simulating financial and economic systems using stochastically structured reservoir computers (SSRCs). The framework combines structure-preserving embeddings with graph-informed coupling matrices to model inter-agent dynamics while enhancing interpretability. A constrained optimization scheme guarantees compliance with both stochastic and structural constraints. Two empirical case studies, a nonlinear stochastic dynamic model and regional inflation network dynamics, demonstrate the effectiveness of the approach in capturing complex nonlinear patterns and enabling interpretable predictive analysis under uncertainty.
On the Gaussian Limit of the Output of IIR Filters
We study the asymptotic distribution of the output of a stable Linear Time-Invariant (LTI) system driven by a non-Gaussian stochastic input. Motivated by longstanding heuristics in the stochastic describing function method, we rigorously characterize when the output process becomes approximately Gaussian, even when the input is not. Using the Wasserstein-1 distance as a quantitative measure of non-Gaussianity, we derive upper bounds on the distance between the appropriately scaled output and a standard normal distribution. These bounds are obtained via Stein's method and depend explicitly on the system's impulse response and the dependence structure of the input process. We show that when the dominant pole of the system approaches the edge of stability and the input satisfies one of the following conditions: (i) independence, (ii) positive correlation with a real and positive dominant pole, or (iii) sufficient correlation decay, the output converges to a standard normal distribution at rate $O(1/\sqrt{t})$. We also present counterexamples where convergence fails, thereby motivating the stated assumptions. Our results provide a rigorous foundation for the widespread observation that outputs of low-pass LTI systems tend to be approximately Gaussian.
comment: 8 pages, 1 figure, accepted for publication IEEE Conference on Decision and Control 2025
On the Standard Performance Criteria for Applied Control Design: PID, MPC or Machine Learning Controller?
The traditional control theory and its application to basic and complex systems have reached an advanced level of maturity. This includes aerial, marine, and ground vehicles, as well as robotics, chemical, transportation, and electrical systems widely used in our daily lives. The emerging era of data-driven methods, Large Language Models (LLMs), and AI-based controllers does not indicate a weakness in well-established control theory. Instead, it aims to reduce dependence on models and uncertainties, address increasingly complex systems, and potentially achieve decision-making capabilities comparable to human-level performance. This revolution integrates knowledge from computer science, machine learning, biology, and classical control, producing promising algorithms that are yet to demonstrate widespread real-world applicability. Despite the maturity of control theory and the presence of various performance criteria, there is still a lack of standardised metrics for testing, evaluation, Verification and Validation ($V\&V$) of algorithms. This gap can lead to algorithms that, while optimal in certain aspects, may fall short of practical implementation, sparking debates within the literature. For a controller to succeed in real-world applications, it must satisfy three key categories of performance metrics: tracking quality, control effort (energy consumption), and robustness. This paper rather takes an applied perspective, proposing and consolidating standard performance criteria for testing and analysing control systems, intended for researchers and students. The proposed framework ensures the post-design applicability of a black-box algorithm, aligning with modern data analysis and $V\&V$ perspectives to prevent resource allocation to systems with limited impact or imprecise claims.
Reconstructing Brain Causal Dynamics for Subject and Task Fingerprints using fMRI Time-series Data
Purpose: Recently, there has been a revived interest in system neuroscience causation models, driven by their unique capability to unravel complex relationships in multi-scale brain networks. In this paper, we present a novel method that leverages causal dynamics to achieve effective fMRI-based subject and task fingerprinting. Methods: By applying an implicit-explicit discretization scheme, we develop a two-timescale linear state-space model. Through data-driven identification of its parameters, the model captures causal signatures, including directed interactions among brain regions from a spatial perspective, and disentangled fast and slow dynamic modes of brain activity from a temporal perspective. These causal signatures are then integrated with: (i) a modal decomposition and projection method for model-based subject identification, and (ii) a Graph Neural Network (GNN) framework for learning-based task classification. Furthermore, we introduce the concept of the brain reachability landscape as a novel visualization tool, which quantitatively characterizes the maximum possible activation levels of brain regions under various fMRI tasks. Results: We evaluate the proposed approach using the Human Connectome Project dataset and demonstrate its advantage over non-causality-based methods. The obtained causal signatures are visualized and demonstrate clear biological relevance with established understandings of brain function. Conclusion: We verified the feasibility and effectiveness of utilizing brain causal signatures for subject and task fingerprinting. Additionally, our work paves the way for further studies on causal fingerprints with potential applications in both healthy controls and neurodegenerative diseases.
Continuous Wrist Control on the Hannes Prosthesis: a Vision-based Shared Autonomy Framework ICRA 2025
Most control techniques for prosthetic grasping focus on dexterous fingers control, but overlook the wrist motion. This forces the user to perform compensatory movements with the elbow, shoulder and hip to adapt the wrist for grasping. We propose a computer vision-based system that leverages the collaboration between the user and an automatic system in a shared autonomy framework, to perform continuous control of the wrist degrees of freedom in a prosthetic arm, promoting a more natural approach-to-grasp motion. Our pipeline allows to seamlessly control the prosthetic wrist to follow the target object and finally orient it for grasping according to the user intent. We assess the effectiveness of each system component through quantitative analysis and finally deploy our method on the Hannes prosthetic arm. Code and videos: https://hsp-iit.github.io/hannes-wrist-control.
comment: Accepted to ICRA 2025. Project website: https://hsp-iit.github.io/hannes-wrist-control
Data-Driven Stabilization of Unknown Linear-Threshold Network Dynamics
This paper studies the data-driven control of unknown linear-threshold network dynamics to stabilize the state to a reference value. We consider two types of controllers: (i) a state feedback controller with feed-forward reference input and (ii) an augmented feedback controller with error integration. The first controller features a simpler structure and is easier to design, while the second offers improved performance in the presence of system parameter changes and disturbances. Our design strategy employs state-input datasets to construct data-based representations of the closed-loop dynamics. Since these representations involve linear threshold functions, we rewrite them as switched linear systems, and formulate the design problem as that of finding a common controller for all the resulting modes. This gives rise to a set of linear matrix inequalities (LMIs) whose solutions corresponds to the controller gain matrices. We analyze the computational complexity of solving the LMIs and propose a simplified, sufficient set of conditions that scales linearly with the system state. Simulations on two case studies involving regulation of firing rate dynamics in rodent brains and of arousal level dynamics in humans demonstrate the effectiveness of the controller designs.
comment: 14 pages
Path Integral Bottleneck: An Algorithm-Agnostic Framework of Computation and Control
Executing a control sequence requires computation. While this is a simple observation, developing a framework that relates a controller's required computation to its ability to successfully control a system (e.g. lower control cost) is challenging, especially when the controller appears on alternative compute platforms (e.g. biological neural networks). More specifically, we want a framework where, given an observed closed-loop trajectory, we can quantify the computation effort needed to produce that trajectory. To enable effective comparisons of closed-loop systems across alternative compute platforms, we present the Path Integral Bottleneck (PI-IB), a method to produce an analytical, algorithm-agnostic description of the compute-control relationship. With the PI-IB framework, we can plot tradeoffs between performance and computation effort for any given plant description and control cost function. Simulations of the cart-pole reveal fundamental control-compute tradeoffs, exposing regions where the task performance-per-compute is higher than others.
comment: 8 pages, 8 Figures, Under Review
Kapitza-Inspired Stabilization of Non-Foster Circuits via Time Modulations
With his formal analysis in 1951, the physicist Pyotr Kapitza demonstrated that an inverted pendulum with an externally vibrating base can be stable in its upper position, thus overcoming the force of gravity. Kapitza's work is an example that an originally unstable system can become stable after a minor perturbation of its properties or initial conditions is applied. Inspired by his ideas, we show how non-Foster circuits can be stabilized with the application of external \textit{electrical vibration}, i.e., time modulations. Non-Foster circuits are highly appreciated in the engineering community since their bandwidth characteristics are not limited by passive-circuits bounds. Unfortunately, non-Foster circuits are usually unstable and they must be stabilized prior to operation. Here, we focus on the study of non-Foster $L(t)C$ circuits with time-varying inductors and time-invariant negative capacitors. We find an intrinsic connection between Kapitza's inverted pendulum and non-Foster $L(t)C$ resonators. Moreover, we show how positive time-varying modulations of $L(t)>0$ can overcome and stabilize non-Foster negative capacitances $C<0$. These findings open up an alternative manner of stabilizing electric circuits with the use of time modulations, and lay the groundwork for application of, what we coin \textit{Vibrational Electromagnetics}, in more complex media.
comment: 9 pages (7 pages main text, 2 pages supplementary materials), 4 figures; a minor issue in Fig. 3(a) is corrected. The accepted version is uploaded now
Stochastic Approximation in a Markovian Framework Revisited: Lipschitz Continuity of the Poisson Equation
In this paper we revisit a fundamental technical issue within the theory of stochastic approximation (SA) in a Markovian framework, first proposed in the book by Djereveckii and Fradkov (1981), and further developed in much detail in the book by Benveniste, M{\'e}tivier, and Priouret (1990). This theory is instrumental in many application areas such as the statistical analysis of Hidden Markov Models arising in telecommunication, quantized linear stochastic systems, and more recently in active learning and reinforcement learning. The problem at hand is the verification of the existence, uniqueness and Lipschitz-continuity of the solution of a parameter-dependent Poisson equation, in an appropriate weighted sup-norm, associated with a collection of Markov chains on general state spaces. Verification of the above facts is vital in the analysis of SA processes presented in (Benveniste et al., 1990) via the ODE (ordinary differential equations) method, requiring substantial technical effort. The motivation and focus of the paper is to address this technical issue, by presenting a simple set of conditions, under which the above properties of the Poisson equation at hand can be conveniently established. The starting point of our work is an intricate result of Hairer and Mattingly (2011) proving that by tilting standard conditions of mainstream stability theory for Markov chains, the transition kernels prove to be contractions in the space of differences of probability measures in a suitable metric. To demonstrate the applicability of our results, the proposed conditions are verified for a class of queuing system with open-loop control.
Systems and Control (EESS)
Probabilistic Control Barrier Functions: Safety in Probability for Discrete-Time Stochastic Systems
Control systems operating in the real world face countless sources of unpredictable uncertainties. These random disturbances can render deterministic guarantees inapplicable and cause catastrophic safety failures. To overcome this, this paper proposes a method for designing safe controllers for discrete-time stochastic systems that retain probabilistic guarantees of safety. To do this we modify the traditional notion of a control barrier function (CBF) to explicitly account for these stochastic uncertainties and call these new modified functions probabilistic CBFs. We show that probabilistic CBFs can be used to design controllers that guarantee safety over a finite number of time steps with a prescribed probability. Next, by leveraging various uncertainty quantification methods, such as concentration inequalities, the scenario approach, and conformal prediction, we provide a variety of sufficient conditions that result in computationally tractable controllers with tunable probabilistic guarantees across a plethora of practical scenarios. Finally, we showcase the applicability of our results in simulation and hardware for the control of a quadruped robot.
Off-Policy Reinforcement Learning with Anytime Safety Guarantees via Robust Safe Gradient Flow
This paper considers the problem of solving constrained reinforcement learning (RL) problems with anytime guarantees, meaning that the algorithmic solution must yield a constraint-satisfying policy at every iteration of its evolution. Our design is based on a discretization of the Robust Safe Gradient Flow (RSGF), a continuous-time dynamics for anytime constrained optimization whose forward invariance and stability properties we formally characterize. The proposed strategy, termed RSGF-RL, is an off-policy algorithm which uses episodic data to estimate the value functions and their gradients and updates the policy parameters by solving a convex quadratically constrained quadratic program. Our technical analysis combines statistical analysis, the theory of stochastic approximation, and convex analysis to determine the number of episodes sufficient to ensure that safe policies are updated to safe policies and to recover from an unsafe policy, both with an arbitrary user-specified probability, and to establish the asymptotic convergence to the set of KKT points of the RL problem almost surely. Simulations on a navigation example and the cart-pole system illustrate the superior performance of RSGF-RL with respect to the state of the art.
A Robust Neural Control Design for Multi-drone Slung Payload Manipulation with Control Contraction Metrics
This paper presents a robust neural control design for a three-drone slung payload transportation system to track a reference path under external disturbances. The control contraction metric (CCM) is used to generate a neural exponentially converging baseline controller while complying with control input saturation constraints. We also incorporate the uncertainty and disturbance estimator (UDE) technique to dynamically compensate for persistent disturbances. The proposed framework yields a modularized design, allowing the controller and estimator to perform their individual tasks and achieve a zero trajectory tracking error if the disturbances meet certain assumptions. The stability and robustness of the complete system, incorporating both the CCM controller and the UDE compensator, are presented. Simulations are conducted to demonstrate the capability of the proposed control design to follow complicated trajectories under external disturbances.
comment: Submit to the 2026 American Control Conference (ACC)
Pose Estimation of a Thruster-Driven Bioinspired Multi-Link Robot
This work demonstrates pose (position and shape) estimation for a free-floating, bioinspired multi-link robot with unactuated joints, link-mounted thrusters for control, and a single gyroscope per link, resulting in an underactuated, minimally sensed platform. Through a proof-of-concept hardware experiment and offline Kalman filter analysis, we show that the robot's pose can be reliably estimated. State estimation is performed using an unscented Kalman filter augmented with Gaussian process residual learning to compensate for non-zero-mean, non-Gaussian noise. We further show that a filter trained on a multi-gait dataset (forward, backward, left, right, and turning) performs comparably to one trained on a larger forward-gait-only dataset when both are evaluated on the same forward-gait test trajectory. These results reveal overlap in the gait input space, which can be exploited to reduce training data requirements while enhancing the filter's generalizability across multiple gaits.
comment: 8 pages, 8 figures, 3 tables
Adversarial Social Influence: Modeling Persuasion in Contested Social Networks
We present the Social Influence Game (SIG), a framework for modeling adversarial persuasion in social networks with an arbitrary number of competing players. Our goal is to provide a tractable and interpretable model of contested influence that scales to large systems while capturing the structural leverage points of networks. Each player allocates influence from a fixed budget to steer opinions that evolve under DeGroot dynamics, and we prove that the resulting optimization problem is a difference-of-convex program. To enable scalability, we develop an Iterated Linear (IL) solver that approximates player objectives with linear programs. In experiments on random and archetypical networks, IL achieves solutions within 7% of nonlinear solvers while being over 10x faster, scaling to large social networks. This paper lays a foundation for asymptotic analysis of contested influence in complex networks.
Density-Ratio Weighted Behavioral Cloning: Learning Control Policies from Corrupted Datasets
Offline reinforcement learning (RL) enables policy optimization from fixed datasets, making it suitable for safety-critical applications where online exploration is infeasible. However, these datasets are often contaminated by adversarial poisoning, system errors, or low-quality samples, leading to degraded policy performance in standard behavioral cloning (BC) and offline RL methods. This paper introduces Density-Ratio Weighted Behavioral Cloning (Weighted BC), a robust imitation learning approach that uses a small, verified clean reference set to estimate trajectory-level density ratios via a binary discriminator. These ratios are clipped and used as weights in the BC objective to prioritize clean expert behavior while down-weighting or discarding corrupted data, without requiring knowledge of the contamination mechanism. We establish theoretical guarantees showing convergence to the clean expert policy with finite-sample bounds that are independent of the contamination rate. A comprehensive evaluation framework is established, which incorporates various poisoning protocols (reward, state, transition, and action) on continuous control benchmarks. Experiments demonstrate that Weighted BC maintains near-optimal performance even at high contamination ratios outperforming baselines such as traditional BC, batch-constrained Q-learning (BCQ) and behavior regularized actor-critic (BRAC).
Comparative Field Deployment of Reinforcement Learning and Model Predictive Control for Residential HVAC
Advanced control strategies like Model Predictive Control (MPC) offer significant energy savings for HVAC systems but often require substantial engineering effort, limiting scalability. Reinforcement Learning (RL) promises greater automation and adaptability, yet its practical application in real-world residential settings remains largely undemonstrated, facing challenges related to safety, interpretability, and sample efficiency. To investigate these practical issues, we performed a direct comparison of an MPC and a model-based RL controller, with each controller deployed for a one-month period in an occupied house with a heat pump system in West Lafayette, Indiana. This investigation aimed to explore scalability of the chosen RL and MPC implementations while ensuring safety and comparability. The advanced controllers were evaluated against each other and against the existing controller. RL achieved substantial energy savings (22\% relative to the existing controller), slightly exceeding MPC's savings (20\%), albeit with modestly higher occupant discomfort. However, when energy savings were normalized for the level of comfort provided, MPC demonstrated superior performance. This study's empirical results show that while RL reduces engineering overhead, it introduces practical trade-offs in model accuracy and operational robustness. The key lessons learned concern the difficulties of safe controller initialization, navigating the mismatch between control actions and their practical implementation, and maintaining the integrity of online learning in a live environment. These insights pinpoint the essential research directions needed to advance RL from a promising concept to a truly scalable HVAC control solution.
comment: 27 pages, 11 figures, 4 tables. Under review for Applied Energy
Touching the tumor boundary: A pilot study on ultrasound based virtual fixtures for breast-conserving surgery
Purpose: Delineating tumor boundaries during breast-conserving surgery is challenging as tumors are often highly mobile, non-palpable, and have irregularly shaped borders. To address these challenges, we introduce a cooperative robotic guidance system that applies haptic feedback for tumor localization. In this pilot study, we aim to assess if and how this system can be successfully integrated into breast cancer care. Methods: A small haptic robot is retrofitted with an electrocautery blade to operate as a cooperatively controlled surgical tool. Ultrasound and electromagnetic navigation are used to identify the tumor boundaries and position. A forbidden region virtual fixture is imposed when the surgical tool collides with the tumor boundary. We conducted a study where users were asked to resect tumors from breast simulants both with and without the haptic guidance. We then assess the results of these simulated resections both qualitatively and quantitatively. Results: Virtual fixture guidance is shown to improve resection margins. On average, users find the task to be less mentally demanding, frustrating, and effort intensive when haptic feedback is available. We also discovered some unanticipated impacts on surgical workflow that will guide design adjustments and training protocol moving forward. Conclusion: Our results suggest that virtual fixtures can help localize tumor boundaries in simulated breast-conserving surgery. Future work will include an extensive user study to further validate these results and fine-tune our guidance system.
A New Partial State-Feedback IDA-PBC for Two-Dimensional Nonlinear Systems: Application to Power Converters with Experimental Results
In this paper we propose a variation of the widely popular Interconnection-and-Damping-Assigment Passivity-Based Control (IDA-PBC) based on Poincare's Lemma to design output feedback globally stabilizing controllers for two dimensional systems. The procedure is constructive and, in comparison with the classical IDA-PBC, whose application is often stymied by the need to solve the (infamous) matching partial differential equation (PDE), in this new method the PDE is replaced by an ordinary differential equation, whose solution is far simpler. The procedure is then applied for the design of voltage-feedback controllers for the three most typical DC-to-DC power converter topologies: the Buck, Boost and Buck-Boost. It is assumed that these converters feed an uncertain load, which is characterized by a static relation between its voltage and current. In the case when the load consists of the parallel connection of a resistive term and a constant power load we propose an adaptive version of the design, adding an identification scheme for the load parameters. This allows the controller to regulate the converter output when the load varies-that is a typical scenario in these applications. Extensive numerical simulations and experimental results validate the approach.
Robust Data-Driven Control for Nonlinear Systems Using their Digital Twins and Quadratic Funnels
This paper examines a robust data-driven approach for the safe deployment of systems with nonlinear dynamics using their imperfect digital twins. Our contribution involves proposing a method that fuses the digital twin's nominal trajectory with online, data-driven uncertainty quantification to synthesize robust tracking controllers. Specifically, we derive data-driven bounds to capture the deviations of the actual system from its prescribed nominal trajectory informed via its digital twin. Subsequently, the dataset is used in the synthesis of quadratic funnels -- robust positive invariant tubes around the nominal trajectory -- via linear matrix inequalities built on the time-series data. The resulting controller guarantees constraint satisfaction while adapting to the true system behavior through a segmented learning strategy, where each segment's controller is synthesized using uncertainty information from the previous segment. This work establishes a systematic framework for obtaining safety certificates in learning-based control of nonlinear systems with imperfect models.
comment: 8 pages, 3 figures
Beyond Collision Cones: Dynamic Obstacle Avoidance for Nonholonomic Robots via Dynamic Parabolic Control Barrier Functions
Control Barrier Functions (CBFs) are a powerful tool for ensuring the safety of autonomous systems, yet applying them to nonholonomic robots in cluttered, dynamic environments remains an open challenge. State-of-the-art methods often rely on collision-cone or velocity-obstacle constraints which, by only considering the angle of the relative velocity, are inherently conservative and can render the CBF-based quadratic program infeasible, particularly in dense scenarios. To address this issue, we propose a Dynamic Parabolic Control Barrier Function (DPCBF) that defines the safe set using a parabolic boundary. The parabola's vertex and curvature dynamically adapt based on both the distance to an obstacle and the magnitude of the relative velocity, creating a less restrictive safety constraint. We prove that the proposed DPCBF is valid for a kinematic bicycle model subject to input constraints. Extensive comparative simulations demonstrate that our DPCBF-based controller significantly enhances navigation success rates and QP feasibility compared to baseline methods. Our approach successfully navigates through dense environments with up to 100 dynamic obstacles, scenarios where collision cone-based methods fail due to infeasibility.
comment: The first two authors contributed equally to this work. Project page: https://www.taekyung.me/dpcbf
DeMuon: A Decentralized Muon for Matrix Optimization over Graphs
In this paper, we propose DeMuon, a method for decentralized matrix optimization over a given communication topology. DeMuon incorporates matrix orthogonalization via Newton-Schulz iterations-a technique inherited from its centralized predecessor, Muon-and employs gradient tracking to mitigate heterogeneity among local functions. Under heavy-tailed noise conditions and additional mild assumptions, we establish the iteration complexity of DeMuon for reaching an approximate stochastic stationary point. This complexity result matches the best-known complexity bounds of centralized algorithms in terms of dependence on the target tolerance. To the best of our knowledge, DeMuon is the first direct extension of Muon to decentralized optimization over graphs with provable complexity guarantees. We conduct preliminary numerical experiments on decentralized transformer pretraining over graphs with varying degrees of connectivity. Our numerical results demonstrate a clear margin of improvement of DeMuon over other popular decentralized algorithms across different network topologies.
A Control Theory inspired Exploration Method for a Linear Bandit driven by a Linear Gaussian Dynamical System
The paper introduces a linear bandit environment where the reward is the output of a known Linear Gaussian Dynamical System (LGDS). In this environment, we address the fundamental challenge of balancing exploration -- gathering information about the environment -- and exploitation -- selecting to the action with the highest predicted reward. We propose two algorithms, Kalman filter Upper Confidence Bound (Kalman-UCB) and Information filter Directed Exploration Action-selection (IDEA). Kalman-UCB uses the principle of optimism in the face of uncertainty. IDEA selects actions that maximize the combination of the predicted reward and a term that quantifies how much an action minimizes the error of the Kalman filter state prediction, which depends on the LGDS property called observability. IDEA is motivated by applications such as hyperparameter optimization in machine learning. A major problem encountered in hyperparameter optimization is the large action spaces, which hinder the performance of methods inspired by principle of optimism in the face of uncertainty as they need to explore each action to lower reward prediction uncertainty. To predict if either Kalman-UCB or IDEA will perform better, a metric based on the LGDS properties is provided. This metric is validated with numerical results across a variety of randomly generated environments.
Integrated Security Mechanisms for Weight Protection in Memristive Crossbar Arrays
Memristive crossbar arrays enable in-memory computing by performing parallel analog computations directly within memory, making them well-suited for machine learning, neural networks, and neuromorphic systems. However, despite their advantages, non-volatile memristors are vulnerable to security threats (such as adversarial extraction of stored weights when the hardware is compromised. Protecting these weights is essential since they represent valuable intellectual property resulting from lengthy and costly training processes using large, often proprietary, datasets. As a solution we propose two security mechanisms: Keyed Permutor and Watermark Protection Columns; where both safeguard critical weights and establish verifiable ownership (even in cases of data leakage). Our approach integrates efficiently with existing memristive crossbar architectures without significant design modifications. Simulations across 45nm, 22nm, and 7nm CMOS nodes, using a realistic interconnect model and a large RF dataset, show that both mechanisms offer robust protection with under 10% overhead in area, delay and power. We also present initial experiments employing the widely known MNIST dataset; further highlighting the feasibility of securing memristive in-memory computing systems with minimal performance trade-offs.
comment: 2 pages, 2 figures
Safety-Critical Control via Recurrent Tracking Functions
This paper addresses the challenge of synthesizing safety-critical controllers for high-order nonlinear systems, where constructing valid Control Barrier Functions (CBFs) remains computationally intractable. Leveraging layered control, we design CBFs in reduced-order models (RoMs) while regulating full-order models' (FoMs) dynamics at the same time. Traditional Lyapunov tracking functions are required to decrease monotonically, but systematic synthesis methods for such functions exist only for fully-actuated systems. To overcome this limitation, we introduce Recurrent Tracking Functions (RTFs), which replace the monotonic decay requirement with a weaker finite-time recurrence condition. This relaxation permits transient deviations of tracking errors while ensuring safety. By augmenting CBFs for RoMs with RTFs, we construct recurrent CBFs (RCBFs) whose zero-superlevel set is control $\tau$-recurrent, and guarantee safety for all initial states in such a set when RTFs are satisfied. We establish theoretical safety guarantees and validate the approach through numerical experiments, demonstrating RTFs' effectiveness and the safety of FoMs.
comment: 7 Pages, 2 Figures
Partial Resilient Leader-Follower Consensus in Time-Varying Graphs
This work studies resilient leader-follower consensus with a bounded number of adversaries. Existing approaches typically require robustness conditions of the entire network to guarantee resilient consensus. However, the behavior of such systems when these conditions are not fully met remains unexplored. To address this gap, we introduce the notion of partial leader-follower consensus, in which a subset of non-adversarial followers successfully tracks the leader's reference state despite insufficient robustness. We propose a novel distributed algorithm - the Bootstrap Percolation and Mean Subsequence Reduced (BP-MSR) algorithm - and establish sufficient conditions for individual followers to achieve consensus via the BP-MSR algorithm in arbitrary time-varying graphs. We validate our findings through simulations, demonstrating that our method guarantees partial leader-follower consensus, even when standard resilient consensus algorithms fail.
comment: 8 pages, 3 figures, Submitted to American Control Conference (ACC) 2026
Vulnerability Analysis Evaluating Bilevel Optimal Power Flow Approaches for Multiple Load Cases
This work presents two methodologies to enhance vulnerability assessment in power systems using bilevel attacker-defender network interdiction models. First, we introduce a systematic evaluation procedure for comparing different optimal power flow formulations in the lower-level problem. We demonstrate the procedure for a comparison of the widely used DC approximation and a linearized AC optimal power flow model. Second, we propose a novel scoring methodology to identify and prioritize critical attack vectors across diverse load and generation scenarios. Both methodologies go beyond traditional worst-case analysis. Case studies on a SimBench high-voltage test grid show that the DC approach fails to detect a significant portion of critical vulnerabilities. The scoring methodology further demonstrates the dependency of vulnerabilities on the considered load case and time step, highlighting the importance of assessing multiple scenarios and going beyond worst-case solutions. The proposed methodologies enhance power system vulnerability assessment and can support the effective development of robust defense strategies for future power systems.
Networked Control and Mean Field Problems Under Diagonal Dominance: Decentralized and Social Optimality
In this article, we employ an input-output approach to expand the study of cooperative multi-agent control and optimization problems characterized by mean-field interactions that admit decentralized and selfish solutions. The setting involves $n$ independent agents that interact solely through a shared cost function, which penalizes deviations of each agent from the group's average collective behavior. Building on our earlier results established for homogeneous agents, we extend the framework to nonidentical agents and show that, under a diagonal dominant interaction of the collective dynamics, with bounded local open-loop dynamics, the optimal controller for $H_\infty$ and $H_2$ norm minimization remains decentralized and selfish in the limit as the number of agents $n$ grows to infinity.
Predictive Control Barrier Functions for Discrete-Time Linear Systems with Unmodeled Delays
This paper introduces a predictive control barrier function (PCBF) framework for enforcing state constraints in discrete-time systems with unknown relative degree, which can be caused by input delays or unmodeled input dynamics. Existing discrete-time CBF formulations typically require the construction of auxiliary barrier functions when the relative degree is greater than one, which complicates implementation and may yield conservative safe sets. The proposed PCBF framework addresses this challenge by extending the prediction horizon to construct a CBF for an associated system with relative degree one. As a result, the superlevel set of the PCBF coincides with the safe set, simplifying constraint enforcement and eliminating the need for auxiliary functions. The effectiveness of the proposed method is demonstrated on a discrete-time double integrator with input delay and a bicopter system with position constraints.
comment: 8 pages, 7 figures, submitted to ACC 2026
Grid Frequency Stability Support Potential of Data Center: A Quantitative Assessment of Flexibility
The rapid expansion of data center infrastructure is reshaping power system dynamics by significantly increasing electricity demand while also offering potential for fast and controllable flexibility. To ensure reliable operation under such conditions, the frequency secured unit commitment problem must be solved with enhanced modeling of demand side frequency response. In this work, we propose a data-driven linearization framework based on decision tree based constraint learning to embed nonlinear nadir frequency constraints into mixed-integer linear programming. This approach enables tractable optimization of generation schedules and fast frequency response from data centers. Through case studies on both a benchmark system and a 2030 future scenario with higher DC penetration, we demonstrate that increasing the proportion of flexible DC load consistently improves system cost efficiency and supports renewable integration. However, this benefit exhibits diminishing marginal returns, motivating the introduction of the Marginal Flexibility Value metric to quantify the economic value of additional flexibility. The results highlight that as DCs become a larger share of system load, their active participation in frequency response will be increasingly indispensable for maintaining both economic and secure system operations.
Gain-Scheduled Passive Fault-Tolerant Control Design for Dual-System UAV Transition Flight
Dual-system UAVs with vertical take-off and landing capabilities have become increasingly popular in recent years. As a safety-critical system, it is important that a dual-system UAV can maintain safe flight after faults/failures occur. This paper proposes a gain-scheduled passive fault-tolerant control (PFTC) method for the transition flight of dual-system UAVs. In this novel FTC design method, the model uncertainties arising from the loss of control effectiveness caused by actuator faults/failures, for the first time, are treated as model input uncertainty, allowing us to use multiplicative uncertainty descriptions to represent it. The advantages of the proposed method consist in significantly reducing the number of design points, thereby simplifying the control synthesis process and improving the efficiency of designing the FTC system for dual-system UAV transition flight compared with the existing FTC design methods. As a general method, it can be applied to the design of FTC systems with multiple uncertain parameters and multiple channels. The developed passive FTC system is validated on a nonlinear six-degree-of-freedom simulator. The simulation results demonstrate that the gain-scheduled structured H infinity (GS SHIF) PFTC system provides superior fault tolerance performance compared with the LQR and structured H infinity control systems, thereby showcasing the effectiveness and the advantages of the proposed GS SHIF PFTC approach.
comment: 25 pages
ROSplane 2.0: A Fixed-Wing Autopilot for Research
Unmanned aerial vehicle (UAV) research requires the integration of cutting-edge technology into existing autopilot frameworks. This process can be arduous, requiring extensive resources, time, and detailed knowledge of the existing system. ROSplane is a lean, open-source fixed-wing autonomy stack built by researchers for researchers. It is designed to accelerate research by providing clearly defined interfaces with an easily modifiable framework. Powered by ROS 2, ROSplane allows for rapid integration of low or high-level control, path planning, or estimation algorithms. A focus on lean, easily understood code and extensive documentation lowers the barrier to entry for researchers. Recent developments to ROSplane improve its capacity to accelerate UAV research, including the transition from ROS 1 to ROS 2, enhanced estimation and control algorithms, increased modularity, and an improved aerodynamic modeling pipeline. This aerodynamic modeling pipeline significantly reduces the effort of transitioning from simulation to real-world testing without requiring expensive system identification or computational fluid dynamics tools. ROSplane's architecture reduces the effort required to integrate new research tools and methods, expediting hardware experimentation.
A Model-Based Extended State Observer for Discrete-Time Linear Multivariable Systems
A model-based extended state observer (MB-ESO) and its variant are proposed for discrete-time linear multivariable systems, where multiple disturbances are defined as an extended state vector in the same manner as in the original formulation of ESO. The variant MB-ESO extends the MB-ESO to address cases where the disturbance gain matrix is non-diagonal. Leveraging the connection between the variant MB-ESO and the well-known unknown input observer (UIO), the condition for the existence of a MB-ESO and its variant in multivariable systems is established, for the first time, i.e., no invariant zeros exist between the disturbances and the plant outputs. It is shown that, with the observer eigenvalues all placed at the origin and the subsystems decoupled, the variant MB-ESO produces the identical disturbance estimation as that of UIO. Moreover, the error characteristics of MB-ESO and its variant are analyzed and the transfer functions associated with the disturbance estimation errors are derived. It is demonstrated both mathematically and in simulations that the disturbance estimation error of MB-ESO decreases monotonically with respect to both the observer eigenvalues and time.
ROSflight 2.0: Lean ROS 2-Based Autopilot for Unmanned Aerial Vehicles
ROSflight is a lean, open-source autopilot ecosystem for unmanned aerial vehicles (UAVs). Designed by researchers for researchers, it is built to lower the barrier to entry to UAV research and accelerate the transition from simulation to hardware experiments by maintaining a lean (not full-featured), well-documented, and modular codebase. This publication builds on previous treatments and describes significant additions to the architecture that improve the modularity and usability of ROSflight, including the transition from ROS 1 to ROS 2, supported hardware, low-level actuator mixing, and the simulation environment. We believe that these changes improve the usability of ROSflight and enable ROSflight to accelerate research in areas like advanced-air mobility. Hardware results are provided, showing that ROSflight is able to control a multirotor over a serial connection at 400 Hz while closing all control loops on the companion computer.
comment: To be submitted to the 2026 IEEE International Conference on Robotics and Automation in Vienna, Austria
Optimal Pricing of Electric Vehicle Charging on Coupled Power-Transportation Network based on Generalized Sensitivity Analysis
In the last decade, charging service providers are emerging along with the prevalence of electric vehicles. These providers need to strategically optimize their charging prices to improve the profits considering operation conditions of the coupled power-transportation network. However, the optimal pricing problem generally involves the user equilibrium model, which leads to a mathematical program with equilibrium constraints. As a result, the pricing problem is non-convex and computationally intractable especially for large-scale network. To address this challenge, we propose a generalized sensitivity analysis approach for optimal pricing of electric vehicle charging on coupled power-transportation network. Specifically, we adopt a sensitivity analysis to capture the best response of charging demand to charging price in the gradient form. Consequently, charging service providers can make pricing decisions based on the gradient information instead of the conventional KKT conditions of the user equilibrium model. We then propose a tailored gradient descent algorithm to solve the whole pricing problem. The mathematical proof of validity is given and the time complexity of the proposed algorithm is theoretically polynomial. Numerical experiments on different scales of networks verify the computational efficiency of the proposed algorithm, indicating its potential in evaluating the impact of the optimal pricing on the operational performance of large-scale coupled power-transportation network.
Real-time Operation of Electric Autonomous Mobility-on-Demand System Considering Power System Regulation
Electric autonomous mobility-on-demand (EAMoD) systems are emerging all over the world. However, their potential swarm charging in depots may deteriorate operation of the power system, further in turn affecting EAMoD system's optimal operation. To prevent this latent risk, we develop a real-time coordination framework for the EAMoD system and the power system. First, the temporal-spatial characteristics of EAMoD fleets are fully described based on a Markov decision process model, including serving trips, repositioning, and charging. Second, charger accessibility of EAMoD depot charging is well modeled as real-world configuration, wherein fast and slow charge piles are both included. Third, the power system regulation model provides real-time charging regulation constraints for EAMoD systems to prevent potential overload and undervoltage. To address the poor solution quality attributed to the complex decision space of the EAMoD system, this paper proposes a piecewise linear-based approximate dynamic programming algorithm combined with model predictive control. Numerical experiments in the Manhattan and a 14-node power distribution network validate the effectiveness of the proposed algorithm and underscore the necessity of system coordination.
Structuring Automotive Data for Systems Engineering: A Taxonomy-Based Approach
Vehicle data is essential for advancing data-driven development throughout the automotive lifecycle, including requirements engineering, design, verification, and validation, and post-deployment optimization. Developers currently collect data in a decentralized and fragmented manner across simulations, test benches, and real-world driving, resulting in data silos, inconsistent formats, and limited interoperability. This leads to redundant efforts, inefficient integration, and suboptimal use of data. This fragmentation results in data silos, inconsistent storage structures, and limited interoperability, leading to redundant data collection, inefficient integration, and suboptimal application. To address these challenges, this article presents a structured literature review and develops an inductive taxonomy for automotive data. This taxonomy categorizes data according to its sources and applications, improving data accessibility and utilization. The analysis reveals a growing emphasis on real-world driving and machine learning applications while highlighting a critical gap in data availability for requirements engineering. By providing a systematic framework for structuring automotive data, this research contributes to more efficient data management and improved decision-making in the automotive industry.
A Neuro-Fuzzy System for Interpretable Long-Term Stock Market Forecasting
In the complex landscape of multivariate time series forecasting, achieving both accuracy and interpretability remains a significant challenge. This paper introduces the Fuzzy Transformer (Fuzzformer), a novel recurrent neural network architecture combined with multi-head self-attention and fuzzy inference systems to analyze multivariate stock market data and conduct long-term time series forecasting. The method leverages LSTM networks and temporal attention to condense multivariate data into interpretable features suitable for fuzzy inference systems. The resulting architecture offers comparable forecasting performance to conventional models such as ARIMA and LSTM while providing meaningful information flow within the network. The method was examined on the real world stock market index S\&P500. Initial results show potential for interpretable forecasting and identify current performance tradeoffs, suggesting practical application in understanding and forecasting stock market behavior.
comment: Published in: ERK 2025 -- 34th International Electrotechnical and Computer Science Conference, Portoro\v{z}, Slovenia, Sept. 25--26, 2025. Proceedings published by Dru\v{s}tvo Slovenska sekcija IEEE. ISSN: 2591-0442 (online). 4 pages, 2 figures
Accurate Small-Signal Modeling of Digitally Controlled Buck Converters with ADC-PWM Synchronization
Digital control has become increasingly widespread in modern power electronic converters. When acquiring feedback signals such as the inductor current, synchronizing the analog-to-digital converter (ADC) with the digital pulse-width modulator (DPWM) is commonly employed to accurately track their steady-state average. However, the small-signal implications of such synchronization have not been investigated. This paper presents an exact small-signal model for digitally controlled buck converters operating in forced continuous-conduction mode (FCCM) under constant-frequency current-mode control, explicitly accounting for DPWM-ADC synchronization. Using a sampled-data framework, the proposed model captures all sideband effects introduced by the sampling process, yielding precise predictions of both analog and digital loop gains, even at frequencies beyond the switching and sampling frequencies. Both asymmetrical and symmetrical carrier modulations are considered. Furthermore, the digital loop gain is derived in closed form using the modified z-transform, enabling low-complexity compensator design and stability assessment. Within this framework, the analog loop gain can be directly obtained from the digital loop gain, thereby eliminating the need for computationally intensive infinite series evaluations. The validity of the proposed model is confirmed through both simulation and experimental results.
Non-submodular Visual Attention for Robot Navigation
This paper presents a task-oriented computational framework to enhance Visual-Inertial Navigation (VIN) in robots, addressing challenges such as limited time and energy resources. The framework strategically selects visual features using a Mean Squared Error (MSE)-based, non-submodular objective function and a simplified dynamic anticipation model. To address the NP-hardness of this problem, we introduce four polynomial-time approximation algorithms: a classic greedy method with constant-factor guarantees; a low-rank greedy variant that significantly reduces computational complexity; a randomized greedy sampler that balances efficiency and solution quality; and a linearization-based selector based on a first-order Taylor expansion for near-constant-time execution. We establish rigorous performance bounds by leveraging submodularity ratios, curvature, and element-wise curvature analyses. Extensive experiments on both standardized benchmarks and a custom control-aware platform validate our theoretical results, demonstrating that these methods achieve strong approximation guarantees while enabling real-time deployment.
comment: 22 pages; Accepted to appear in IEEE Transactions on Robotics (T-RO)
Enhancing Urban VANETs Stability: A Single-Hop Clustering Strategy in Metropolitan Environments
Vehicular Ad-hoc Networks (VANETs), a subclass of Mobile Ad-hoc Networks (MANETs), are expected to play a crucial role in the future of intelligent transportation systems (ITSs). A key objective of VANETs is to enable efficient and cost-effective communication among vehicles while supporting a large number of network participants and minimizing infrastructure dependency. However, the highly dynamic nature of vehicular networks poses significant challenges to their deployment. Clustering techniques are employed to address these challenges, with a strong emphasis on stability, as they directly influence the routing process and enhance the quality of service (QoS). This paper explores the feasibility of reducing reliance on roadside units (RSUs) in metropolitan areas while improving cluster stability. We propose an efficient clustering algorithm tailored for urban environments, leveraging existing metropolitan infrastructure to compensate for the absence of RSUs. Our approach designates public transportation buses as primary cluster heads (CHs), minimizing reliance on additional infrastructure, while stand-alone vehicles (SAVs) dynamically select additional CHs. Through comprehensive case studies and comparative analysis with existing algorithms, our results demonstrate the superior performance of the proposed method across different transmission ranges (TRs).
comment: 10 pages, 6 figures, 5 tables, Journal
Product-oriented Product-Process-Resource Asset Network and its Representation in AutomationML for Asset Administration Shell
Current products, especially in the automotive sector, pose complex technical systems having a multi-disciplinary mechatronic nature. Industrial standards supporting system engineering and production typically (i) address the production phase only, but do not cover the complete product life cycle, and (ii) focus on production processes and resources rather than the products themselves. The presented approach is motivated by incorporating impacts of end-of-life phase of the product life cycle into the engineering phase. This paper proposes a modelling approach coming up from the Product-Process-Resource (PPR) modeling paradigm. It combines requirements on (i) respecting the product structure as a basis for the model, and (ii) it incorporates repairing, remanufacturing, or upcycling within cyber-physical production systems. The proposed model called PoPAN should accompany the product during the entire life cycle as a digital shadow encapsulated within the Asset Administration Shell of a product. To facilitate the adoption of the proposed paradigm, the paper also proposes serialization of the model in the AutomationML data format. The model is demonstrated on a use-case for disassembling electric vehicle batteries to support their remanufacturing for stationary battery applications.
comment: This work has been submitted to the IEEE for possible publication. 8 pages, 6 figures
TubeDAgger: Reducing the Number of Expert Interventions with Stochastic Reach-Tubes
Interactive Imitation Learning deals with training a novice policy from expert demonstrations in an online fashion. The established DAgger algorithm trains a robust novice policy by alternating between interacting with the environment and retraining of the network. Many variants thereof exist, that differ in the method of discerning whether to allow the novice to act or return control to the expert. We propose the use of stochastic reachtubes - common in verification of dynamical systems - as a novel method for estimating the necessity of expert intervention. Our approach does not require fine-tuning of decision thresholds per environment and effectively reduces the number of expert interventions, especially when compared with related approaches that make use of a doubt classification model.
Uncertainty-Aware Flexibility of Buildings: From Quantification to Provision
Buildings represent a promising flexibility source to support the integration of renewable energy sources, as they may shift their heating energy consumption over time without impacting users' comfort. However, a building's predicted flexibility potential is based on uncertain ambient weather forecasts and a typically inaccurate building thermal model. Hence, this paper presents an uncertainty-aware flexibility quantifier using a chance-constrained formulation. Because such a quantifier may be conservative, we additionally model real-time feedback in the quantification, in the form of affine feedback policies. Such adaptation can take the form of intra-day trades or rebound around the flexibility provision period. To assess the flexibility quantification formulations, we further assume that flexible buildings participate in secondary frequency control markets. The results show some increase in flexibility and revenues when introducing affine feedback policies. Additionally, it is demonstrated that accounting for uncertainties in the flexibility quantification is necessary, especially when intra-day trades are not available. Even though an uncertainty-ignorant potential may seem financially profitable in secondary frequency control markets, it comes at the cost of significant thermal discomfort for inhabitants. Hence, we suggest a comfort-preserving approach, aiming to truly reflect thermal discomfort on the economic flexibility revenue, to obtain a fairer comparison.
comment: Accepted in IEEE Transactions on Smart Grids
Global convergence of Oja's component flow for general square matrices and its applications
This paper establishes the global convergence properties of the Oja flow, a continuous-time algorithm for principal component extraction, for general square matrices. The Oja flow is a matrix differential equation on the Stiefel manifold designed to extract a dominant subspace. While its analysis has traditionally been restricted to symmetric positive-definite matrices, where it acts as a gradient flow, recent applications have extended its use to general matrices. In this non-symmetric case, the flow extracts the invariant subspace corresponding to the eigenvalues with the largest real parts. However, prior convergence results have been purely local, leaving the global behavior as an open problem. This paper fills this gap by providing a comprehensive global convergence analysis, establishing that the flow converges exponentially for almost all initial conditions. We also propose a modification to the algorithm that enhances its numerical stability. As an application of this theory, we develop novel methods for the model reduction of linear dynamical systems and the synthesis of low-rank stabilizing controllers.
comment: 15 pages, 6 figures
Shared Object Manipulation with a Team of Collaborative Quadrupeds
Utilizing teams of multiple robots is advantageous for handling bulky objects. Many related works focus on multi-manipulator systems, which are limited by workspace constraints. In this paper, we extend a classical hybrid motion-force controller to a team of legged manipulator systems, enabling collaborative loco-manipulation of rigid objects with a force-closed grasp. Our novel approach allows the robots to flexibly coordinate their movements, achieving efficient and stable object co-manipulation and transport, validated through extensive simulations and real-world experiments.
comment: 8 pages, 9 figures, submitted to The 2026 American Control Conference
Formation Control via Rotation Symmetry Constraints
We present a distributed formation control strategy for multi-agent systems based only on rotation symmetry constraints. We propose a potential function that enforces inter-agent \textbf{rotational} symmetries, with its gradient defining the control law driving the agents toward a desired symmetric and planar configuration. We show that only $(n-1)$ edges, the minimal connectivity requirement, are sufficient to implement the control strategy, where $n$ is the number of agents. We further augment the design to address the \textbf{maneuvering problem}, enabling the formation to undergo coordinated translations, rotations, and scalings along a predefined virtual trajectory. Numerical simulations demonstrate the effectiveness and flexibility of the proposed method.
Multi-Agent Stage-wise Conservative Linear Bandits
In many real-world applications such as recommendation systems, multiple learning agents must balance exploration and exploitation while maintaining safety guarantees to avoid catastrophic failures. We study the stochastic linear bandit problem in a multi-agent networked setting where agents must satisfy stage-wise conservative constraints. A network of $N$ agents collaboratively maximizes cumulative reward while ensuring that the expected reward at every round is no less than $(1-\alpha)$ times that of a baseline policy. Each agent observes local rewards with unknown parameters, but the network optimizes for the global parameter (average of local parameters). Agents communicate only with immediate neighbors, and each communication round incurs additional regret. We propose MA-SCLUCB (Multi-Agent Stage-wise Conservative Linear UCB), an episodic algorithm alternating between action selection and consensus-building phases. We prove that MA-SCLUCB achieves regret $\tilde{O}\left(\frac{d}{\sqrt{N}}\sqrt{T}\cdot\frac{\log(NT)}{\sqrt{\log(1/|\lambda_2|)}}\right)$ with high probability, where $d$ is the dimension, $T$ is the horizon, and $|\lambda_2|$ is the network's second largest eigenvalue magnitude. Our analysis shows: (i) collaboration yields $\frac{1}{\sqrt{N}}$ improvement despite local communication, (ii) communication overhead grows only logarithmically for well-connected networks, and (iii) stage-wise safety adds only lower-order regret. Thus, distributed learning with safety guarantees achieves near-optimal performance in reasonably connected networks.
Development and Field Validation of a Fully Customised Vehicle Scanning System on Two Full-Scale Bridges
Ensuring the structural integrity of bridges is essential for maintaining infrastructure safety and promoting long-term sustainability. In this context, Indirect Structural Health Monitoring (ISHM) through drive-by bridge inspection emerges as a promising alternative to traditional inspection methods, offering a cost-effective and scalable solution by using vehicle-mounted sensors to assess the condition of bridges without requiring direct instrumentation. This study introduces the first purpose-built electric inspection vehicle specifically designed for drive-by bridge inspection. The autonomous platform is capable of maintaining a constant low speed and offers customisable operational parameters to maximise the accuracy and repeatability of indirect sensing, capabilities not achieved in previous studies. The vehicle is deployed within an ISHM framework and tested on two full-scale bridges to evaluate its effectiveness in capturing structural dynamic responses. Two unsupervised frameworks are then employed to analyse the collected data to identify features indicative of bridge properties and structural condition. The promising findings from this study demonstrate the practical feasibility of the approach. The study also shows the potential of ISHM as a viable tool for efficient bridge monitoring, contributing to the development of next-generation structural health monitoring systems that can enhance safety, optimise maintenance strategies, and support the longevity of critical infrastructure.
Annealed Ensemble Kalman Inversion for Constrained Nonlinear Model Predictive Control: An ADMM Approach
This work proposes a novel Alternating Direction Method of Multipliers (ADMM)-based Ensemble Kalman Inversion (EKI) algorithm for solving constrained nonlinear model predictive control (NMPC) problems. First, the stage-wise nonlinear inequality constraints in the NMPC problem are embedded via an augmented Lagrangian with nonnegative slack variables. We then show that the unconstrained augmented Lagrangian formulation of the NMPC admits a Bayesian interpretation: under a Gaussian observation model, its minimizers coincide with MAP estimators, enabling solution via EKI. However, since the nonnegativity constraint on the slacks cannot be enforced via Gaussian noise, our proposed algorithm results in a two-block ADMM that alternates between (i) a primal step that minimizes the unconstrained augmented Lagrangian, (ii) a nonnegativity projection for the slacks, and (iii) a dual ascent step. To balance exploration and convergence, an annealing schedule tempers covariances and penalty weights, thereby encouraging global search early and precise constraint satisfaction later. To demonstrate the performance of the proposed method, we compare it with another iterative sampling-based approach based on Model Predictive Path Integral (MPPI) control, called DIAL-MPC.
An Interpolation-based Scheme for Rapid Frequency-Domain System Identification
We present a frequency-domain system identification scheme based on barycentric interpolation and weight optimization. The scheme is related to the Adaptive Antoulas-Anderson (AAA) algorithm for model reduction, but uses an adaptive algorithm for selection of frequency points for interrogating the system response, as would be required in identification versus model reduction. The scheme is particularly suited for systems in which any one sinusoidal response run is long or expensive, and thus there is an incentive to reduce the total number of such runs. Two key features of our algorithm are the use of transient data in sinusoidal runs to both optimize the barycentric weights, and automated next-frequency selection on an adaptive grid. Both are done with error criteria that are proxies for a system's $H^2$ and $H^\infty$ norms respectively. Furthermore, the optimization problem we formulate is convex, and can optionally guarantee stability of the identified system. Computational results on a high-order, lightly damped structural system highlights the efficacy of this scheme.
comment: 7 pages, 5 figures Submitted to IEEE American Control Conference 2026
Wireless Laser Power Transfer for Low-altitude Uncrewed Aerial Vehicle-assisted Internet of Things: Paradigms, Challenges, and Solutions
Low-altitude uncrewed aerial vehicles (UAVs) have become integral enablers for the Internet of Things (IoT) by offering enhanced coverage, improved connectivity and access to remote areas. A critical challenge limiting their operational capacity lies in the energy constraints of both aerial platforms and ground-based sensors. This paper explores WLPT as a transformative solution for sustainable energy provisioning in UAV-assisted IoT networks. We first systematically investigate the fundamental principles of WLPT and analysis the comparative advantages. Then, we introduce three operational paradigms for system integration, identify key challenges, and discuss corresponding potential solutions. In case study, we propose a multi-agent reinforcement learning framework to address the coordination and optimization challenges in WLPT-enabled UAV-assisted IoT data collection. Simulation results demonstrate that our framework significantly improves energy sustainability and data freshness. Finally, we discuss some future directions.
comment: This paper has been submitted to IEEE Internet of Things Magazine
Four-Port Probe Stations and SOLR Calibration Standard Design up to 125 GHz on 28 nm CMOS
This paper presents two innovative four-port probe stations developed by FormFactor Incorporated (FFI) and MPI Corporation (MPI), and a four-port calibration standard design up to 125 GHz for the probe stations. True four-port probing at mmWave and beyond does not yet exist, but is anticipated for future multi-band wireless devices using several antennas and RF chains. The four-port probe stations are housed in the THz measurement facility at NYU and allow simultaneous probing from East, West, North, and South orientations, which presents challenges for calibration. An on-chip Short-Open-Load-Reciprocal (SOLR) calibration (cal) standard is designed leveraging UMC's 28 nm CMOS process. S/O/L standard S-parameters are extracted using a virtual multiline Thru-Reflect-Line (mTRL) cal and used to validate SOLR cal performance via simulations up to 125 GHz. The novel probing solutions from MPI and FFI, along with the SOLR cal, open up considerable opportunities for precise RF characterization across wide frequency ranges.
comment: 3 pages, 5 figures, Asia-Pacific Microwave Conference 2025
Modeling and Mixed-Integer Nonlinear MPC of Positive-Negative Pressure Pneumatic Systems
Positive-negative pressure regulation is critical to soft robotic actuators, enabling large motion ranges and versatile actuation modes. However, it remains challenging due to complex nonlinearities, oscillations, and direction-dependent, piecewise dynamics introduced by affordable pneumatic valves and the bidirectional architecture. We present a model-based control framework that couples a physics-grounded switched nonlinear plant model (inflation/deflation modes) with a mixed-integer nonlinear model predictive controller (MI-NMPC). The controller co-optimizes mode scheduling and PWM inputs to realize accurate reference tracking while enforcing input constraints and penalizing energy consumption and excessive switching. To make discrete mode decisions tractable, we employ a Combinatorial Integral Approximation that relaxes binary mode variables to continuous surrogates within the valve-scheduling layer. With parameters identified from the physical system, simulations with step and sinusoidal references validate the proposed MI-NMPC, showing a consistently favorable trade-off among accuracy, control effort, and switching, and outperforming conventional PID and NMPC with heuristic mode selection.
comment: Has been submitted to conference
MM-LMPC: Multi-Modal Learning Model Predictive Control via Bandit-Based Mode Selection
Learning Model Predictive Control (LMPC) improves performance on iterative tasks by leveraging data from previous executions. At each iteration, LMPC constructs a sampled safe set from past trajectories and uses it as a terminal constraint, with a terminal cost given by the corresponding cost-to-go. While effective, LMPC heavily depends on the initial trajectories: states with high cost-to-go are rarely selected as terminal candidates in later iterations, leaving parts of the state space unexplored and potentially missing better solutions. For example, in a reach-avoid task with two possible routes, LMPC may keep refining the initially shorter path while neglecting the alternative path that could lead to a globally better solution. To overcome this limitation, we propose Multi-Modal LMPC (MM-LMPC), which clusters past trajectories into modes and maintains mode-specific terminal sets and value functions. A bandit-based meta-controller with a Lower Confidence Bound (LCB) policy balances exploration and exploitation across modes, enabling systematic refinement of all modes. This allows MM-LMPC to escape high-cost local optima and discover globally superior solutions. We establish recursive feasibility, closed-loop stability, asymptotic convergence to the best mode, and a logarithmic regret bound. Simulations on obstacle-avoidance tasks validate the performance improvements of the proposed method.
comment: This paper is submitted to 2026 American Control Conference (ACC)
Learning Passive Continuous-Time Dynamics with Multistep Port-Hamiltonian Gaussian Processes
We propose the multistep port-Hamiltonian Gaussian process (MS-PHS GP) to learn physically consistent continuous-time dynamics and a posterior over the Hamiltonian from noisy, irregularly-sampled trajectories. By placing a GP prior on the Hamiltonian surface $H$ and encoding variable-step multistep integrator constraints as finite linear functionals, MS-PHS GP enables closed-form conditioning of both the vector field and the Hamiltonian surface without latent states, while enforcing energy balance and passivity by design. We state a finite-sample vector-field bound that separates the estimation and variable-step discretization terms. Lastly, we demonstrate improved vector-field recovery and well-calibrated Hamiltonian uncertainty on mass-spring, Van der Pol, and Duffing benchmarks.
Combining Large Language Models and Gradient-Free Optimization for Automatic Control Policy Synthesis
Large Language models (LLMs) have shown promise as generators of symbolic control policies, producing interpretable program-like representations through iterative search. However, these models are not capable of separating the functional structure of a policy from the numerical values it is parametrized by, thus making the search process slow and inefficient. We propose a hybrid approach that decouples structural synthesis from parameter optimization by introducing an additional optimization layer for local parameter search. In our method, the numerical parameters of LLM-generated programs are extracted and optimized numerically to maximize task performance. With this integration, an LLM iterates over the functional structure of programs, while a separate optimization loop is used to find a locally optimal set of parameters accompanying candidate programs. We evaluate our method on a set of control tasks, showing that it achieves higher returns and improved sample efficiency compared to purely LLM-guided search. We show that combining symbolic program synthesis with numerical optimization yields interpretable yet high-performing policies, bridging the gap between language-model-guided design and classical control tuning. Our code is available at https://sites.google.com/berkeley.edu/colmo.
comment: 8 pages, 7 figures
Machine Learning Detection of Lithium Plating in Lithium-ion Cells: A Gaussian Process Approach
Lithium plating during fast charging is a critical degradation mechanism that accelerates capacity fade and can trigger catastrophic safety failures. Recent work has identified a distinctive dQ/dV peak above 4.0 V as a reliable signature of plating onset; however, conventional methods for computing dQ/dV rely on finite differencing with filtering, which amplifies sensor noise and introduces bias in peak location. In this paper, we propose a Gaussian Process (GP) framework for lithium plating detection by directly modeling the charge-voltage relationship Q(V) as a stochastic process with calibrated uncertainty. Leveraging the property that derivatives of GPs remain GPs, we infer dQ/dV analytically and probabilistically from the posterior, enabling robust detection without ad hoc smoothing. The framework provides three key benefits: (i) noise-aware inference with hyperparameters learned from data, (ii) closed-form derivatives with credible intervals for uncertainty quantification, and (iii) scalability to online variants suitable for embedded BMS. Experimental validation on Li-ion coin cells across a range of C-rates (0.2C-1C) and temperatures (0-40\deg C) demonstrates that the GP-based method reliably detects plating peaks under low-temperature, high-rate charging, while correctly reporting no peaks in baseline cases. The concurrence of GP-identified differential peaks, reduced charge throughput, and capacity fade measured via reference performance tests confirms the method's accuracy and robustness, establishing a practical pathway for real-time lithium plating detection.
comment: Submitted to American Control Conference 2026 - ACC 2026
Event-Based Control via Sparsity-Promoting Regularization: A Rollout Approach with Performance Guarantees
This paper presents a controller design framework aiming to balance control performance and actuation rate. Control performance is evaluated by an infinite-horizon average cost, and the number of control actions is penalized via sparsity-promoting regularization. Since the formulated optimal control problem has a combinatorial nature, we employ a rollout algorithm to obtain a tractable suboptimal solution. In the proposed scheme, actuation timings are determined through a multistage minimization procedure based on a receding-horizon approach, and the corresponding control inputs are computed online. We establish theoretical performance guarantees with respect to periodic control and prove the stability of the closed-loop system. The effectiveness of the proposed method is demonstrated through a numerical example.
comment: 14 pages
Multi-Agent Guided Policy Search for Non-Cooperative Dynamic Games
Multi-agent reinforcement learning (MARL) optimizes strategic interactions in non-cooperative dynamic games, where agents have misaligned objectives. However, data-driven methods such as multi-agent policy gradients (MA-PG) often suffer from instability and limit-cycle behaviors. Prior stabilization techniques typically rely on entropy-based exploration, which slows learning and increases variance. We propose a model-based approach that incorporates approximate priors into the reward function as regularization. In linear quadratic (LQ) games, we prove that such priors stabilize policy gradients and guarantee local exponential convergence to an approximate Nash equilibrium. We then extend this idea to infinite-horizon nonlinear games by introducing Multi-agent Guided Policy Search (MA-GPS), which constructs short-horizon local LQ approximations from trajectories of current policies to guide training. Experiments on nonlinear vehicle platooning and a six-player strategic basketball formation show that MA-GPS achieves faster convergence and more stable learning than existing MARL methods.
comment: We fix a typo in equation (7), where we accidentally typed an extra symbol \nabla before w(\pi)
Almost Sure Convergence of Networked Policy Gradient over Time-Varying Networks in Markov Potential Games
We propose networked policy gradient play for solving Markov potential games with continuous and/or discrete state-action pairs. During the game, agents use parametrized and differentiable policies that depend on the current state and the policy parameters of other agents. During training, agents update their policy parameters following stochastic gradients. The gradient estimation involves two consecutive episodes, generating unbiased estimators of reward and policy score functions. In addition, it involves keeping estimates of others' parameters using consensus steps given local estimates received through a time-varying communication network. In Markov potential games, there exists a potential value function among agents with gradients corresponding to the gradients of local value functions. Using this structure, we prove almost sure convergence to a stationary point of the potential value function with rate $O(1/\epsilon^2)$. Compared to previous works, our results do not require bounded policy gradients or initial agreement on the values of individual policy parameters. Numerical experiments on a dynamic multi-agent newsvendor problem verify the convergence of local beliefs and gradients. It further shows that networked policy gradient play converges as fast as independent policy gradient updates, while collecting higher rewards.
comment: 15 pages, extended journal version
Vulnerability-Based Optimal Grid Defense Strategies for Enhancing Cyber-Physical Energy System Resilience
An approach is proposed to identify optimal asset protection strategies based on vulnerability assessment outcomes. Traditional bilevel attacker-defender models emphasize worst-case scenarios but offer limited defensive guidance. In contrast, trilevel models introduce high computational complexity and rely on fixed network configurations. The proposed critical-components method leverages vulnerability assessment results to determine protection strategies, effectively outsourcing the upper-level defense decision. This enables adaptability to diverse network topologies, assessment techniques, and cyber-physical energy systems without the overhead of multi-level optimization. Case studies demonstrate the potential for improved system resilience across varying operational conditions.
comment: Accepted at UPEC 2025. Camera-ready Accepted Author Manuscript (AAM). Version of Record will be available on IEEE Xplore (DOI to be added)
On the Application of Model Predictive Control to a Weighted Coverage Path Planning Problem
This paper considers the application of Model Predictive Control (MPC) to a weighted coverage path planning (WCPP) problem. The problem appears in a wide range of practical applications, including search and rescue (SAR) missions. The basic setup is that one (or multiple) agents can move around a given search space and collect rewards from a given spatial distribution. Unlike an artificial potential field, each reward can only be collected once. In contrast to a Traveling Salesman Problem (TSP), the agent moves in a continuous space. Moreover, he is not obliged to cover all locations and/or may return to previously visited locations. The WCPP problem is tackled by a new Model Predictive Control (MPC) formulation with so-called Coverage Constraints (CCs). It is shown that the solution becomes more effective if the solver is initialized with a TSP-based heuristic. With and without this initialization, the proposed MPC approach clearly outperforms a naive MPC formulation, as demonstrated in a small simulation study.
TaxSolver: A methodology to realize optimal income tax reform
Across the globe there are growing calls to streamline and improve ever more complex income tax codes. Executing reform has proven difficult. Even when the desired outcomes are clear, the tools to design fitting reforms are lacking. To remedy this, we developed \texttt{TaxSolver}: a methodology to help policymakers realize optimal tax reform. \texttt{TaxSolver} allows policymakers to focus solely on what they aim to achieve with a reform -- like redistributing wealth, incentivizing labor market participation or reducing complexity -- and the guarantees within which reform is acceptable -- like limited fluctuations in taxpayer incomes or shocks to overall tax revenue. Given these goals and fiscal guarantees, \texttt{TaxSolver} finds the optimal set of tax rules that satisfies all the criteria or shows that the set of demands are not mathematically feasible. We illustrate \texttt{TaxSolver} by reforming various simulated examples of tax codes, including some that reflect the complexity and size of a real-world tax system.
comment: 47 pages, 14 figures, 4 tables
Decentralized Online Regularized Learning Over Random Time-Varying Graphs
We study the decentralized online regularized linear regression algorithm over random time-varying graphs. At each time step, every node runs an online estimation algorithm consisting of an innovation term processing its own new measurement, a consensus term taking a weighted sum of estimations of its own and its neighbors with additive and multiplicative communication noises and a regularization term preventing over-fitting. It is not required that the regression matrices and graphs satisfy special statistical assumptions such as mutual independence, spatio-temporal independence or stationarity. We develop the nonnegative supermartingale inequality of the estimation error, and prove that the estimations of all nodes converge to the unknown true parameter vector almost surely if the algorithm gains, graphs and regression matrices jointly satisfy the sample path spatio-temporal persistence of excitation condition. Especially, this condition holds by choosing appropriate algorithm gains if the graphs are uniformly conditionally jointly connected and conditionally balanced, and the regression models of all nodes are uniformly conditionally spatio-temporally jointly observable, under which the algorithm converges in mean square and almost surely. In addition, we prove that the regret upper bound is $O(T^{1-\tau}\ln T)$, where $\tau\in (0.5,1)$ is a constant depending on the algorithm gains.
Heterogeneous Predictor-based Risk-Aware Planning with Conformal Prediction in Dense, Uncertain Environments
Real-time planning among many uncertain, dynamic obstacles is challenging because predicting every agent with high fidelity is both unnecessary and computationally expensive. We present Heterogeneous Predictor-based Risk-Aware Planning (H-PRAP), a framework that allocates prediction effort to where it matters. H-PRAP introduces the Probability-based Collision Risk Index (P-CRI), a closed-form, horizon-level collision index obtained by calibrating a Gaussian surrogate with conformal prediction. P-CRI drives a router that assigns high-risk obstacles to accurate but expensive predictors and low-risk obstacles to lightweight predictors, while preserving distribution-free coverage across heterogeneous predictors through conformal prediction. The selected predictions and their conformal radii are embedded in a chance-constrained model predictive control (MPC) problem, yielding receding-horizon policies with explicit safety margins. We analyze the safety-efficiency trade-off under prediction compute budget: more portion of low-fidelity predictions reduce residual risk from dropped obstacles, but in the same time induces larger conformal radii and degrades trajectory efficiency and shrinks MPC feasibility. Extensive numerical simulations in dense, uncertain environments validate that H-PRAP attains best balance between trajectory success rate (i.e., no collisions) and the time to reach the goal (i.e., trajectory efficiency) compared to single prediction architectures.
Graphon Particle Systems, Part II: Dynamics of Distributed Stochastic Continuum Optimization
We study the distributed optimization problem over a graphon with a continuum of nodes, which is regarded as the limit of the distributed networked optimization as the number of nodes goes to infinity. Each node has a private local cost function. The global cost function, which all nodes cooperatively minimize, is the integral of the local cost functions on the node set. We propose stochastic gradient descent and gradient tracking algorithms over the graphon. We establish a general lemma for the upper bound estimation related to a class of time-varying differential inequalities with negative linear terms, based upon which, we prove that for both kinds of algorithms, the second moments of the nodes' states are uniformly bounded. Especially, for the stochastic gradient tracking algorithm, we transform the convergence analysis into the asymptotic property of coupled nonlinear differential inequalities with time-varying coefficients and develop a decoupling method. For both kinds of algorithms, we show that by choosing the time-varying algorithm gains properly, all nodes' states achieve $\mathcal{L}^{\infty}$-consensus for a connected graphon. Furthermore, if the local cost functions are strongly convex, then all nodes' states converge to the minimizer of the global cost function and the auxiliary states in the stochastic gradient tracking algorithm converge to the gradient value of the global cost function at the minimizer uniformly in mean square.
Graphon Particle Systems, Part I: Spatio-Temporal Approximation and Law of Large Numbers
We study a class of graphon particle systems with time-varying random coefficients. In a graphon particle system, the interactions among particles are characterized by the coupled mean field terms through an underlying graphon and the randomness of the coefficients comes from exogenous stochastic processes. By constructing two-level approximated sequences converging in 2-Wasserstein distance, we prove the existence and uniqueness of the solution to the system. Besides, by constructing two-level approximated functions converging to the graphon mean field terms, we establish the law of large numbers, which reveals that if the number of particles tends to infinity and the discretization step tends to zero, then the discrete-time interacting particle system over a large-scale network converges to the graphon particle system. As a byproduct, we discover that the graphon particle system can describe the limiting dynamics of the distributed stochastic gradient descent algorithm over the large-scale network and prove that if the gradients of the local cost functions are Lipschitz continuous, then the graphon particle system can be regarded as the spatio-temporal approximation of the discrete-time distributed stochastic gradient descent algorithm as the number of network nodes tends to infinity and the algorithm step size tends to zero.
Transient Stability Analysis of a Hybrid Grid-Forming and Grid-Following RES System Considering Multi-Mode Control Switching
The inherent control switching of renewable energy sources (RESs) during intricate transient processes introduces complexity to the dynamic behavior of modern power systems. This paper reveals the dynamic coupling between grid-forming (GFM)/grid-following (GFL)-based RES and dominant instability modes of the hybrid system. First, six control combinations are systematically investigated by pairing the two GFM-RES modes, normal control (NC) and current saturation (CS), with the three GFL-RES modes: normal control, low voltage ride-through (LVRT), and high voltage ride-through (HVRT). Based on switching system theory, the coupled power flow and dynamic motion models are developed considering multi-mode switching characteristics. It is revealed that the hybrid system exhibits two distinct instability modes when the GFM-RES and GFL-RES exceed their P-f and V-f desynchronization boundaries, respectively. The two-dimensional spatiotemporal damping characteristics of GFL-RES induced by GFM-RES are also uncovered for the first time. A novel criterion is proposed to quantify the impact of GFM-RES on GFL-RES dynamics, capturing both its stabilizing and destabilizing effects under different control combinations. High-fidelity electromagnetic transient simulations validate the correctness of the analysis framework.
Learning linear dynamical systems under convex constraints
We consider the problem of finite-time identification of linear dynamical systems from $T$ samples of a single trajectory. Recent results have predominantly focused on the setup where either no structural assumption is made on the system matrix $A^* \in \mathbb{R}^{n \times n}$, or specific structural assumptions (e.g. sparsity) are made on $A^*$. We assume prior structural information on $A^*$ is available, which can be captured in the form of a convex set $\mathcal{K}$ containing $A^*$. For the solution of the ensuing constrained least squares estimator, we derive non-asymptotic error bounds in the Frobenius norm that depend on the local size of $\mathcal{K}$ at $A^*$. To illustrate the usefulness of these results, we instantiate them for four examples, namely when (i) $A^*$ is sparse and $\mathcal{K}$ is a suitably scaled $\ell_1$ ball; (ii) $\mathcal{K}$ is a subspace; (iii) $\mathcal{K}$ consists of matrices each of which is formed by sampling a bivariate convex function on a uniform $n \times n$ grid (convex regression); (iv) $\mathcal{K}$ consists of matrices each row of which is formed by uniform sampling (with step size $1/T$) of a univariate Lipschitz function. In all these situations, we show that $A^*$ can be reliably estimated for values of $T$ much smaller than what is needed for the unconstrained setting.
comment: 29 pages; corrected minor typos throughout; added additional explanations at certain places; statements of main results unchanged
Diffusion Model-based Parameter Estimation in Dynamic Power Systems
Parameter estimation, which represents a classical inverse problem, is often ill-posed as different parameter combinations can yield identical outputs. This non-uniqueness poses a critical barrier to accurate and unique identification. This work introduces a novel parameter estimation framework to address such limits: the Joint Conditional Diffusion Model-based Inverse Problem Solver (JCDI). By leveraging the stochasticity of diffusion models, JCDI produces possible solutions revealing underlying distributions. Joint conditioning on multiple observations further narrows the posterior distributions of non-identifiable parameters. For the challenging task in dynamic power systems: composite load model parameterization, JCDI achieves a 58.6% reduction in parameter estimation error compared to the single-condition model. It also accurately replicates system's dynamic responses under various electrical faults, with root mean square errors below 4*10^(-3), outperforming existing deep-reinforcement-learning and supervised learning approaches. Given its data-driven nature, JCDI provides a universal framework for parameter estimation while effectively mitigating the non-uniqueness challenge across scientific domains.
Safe Event-triggered Gaussian Process Learning for Barrier-Constrained Control
While control barrier functions (CBFs) are employed in addressing safety, control synthesis methods based on them generally rely on accurate system dynamics. This is a critical limitation, since the dynamics of complex systems are often not fully known. Supervised machine learning techniques hold great promise for alleviating this weakness by inferring models from data. We propose a novel \revision{approach for safe event-triggered learning of Gaussian process models in CBF-based continuous-time control for unknown control-affine systems. By applying a finite excitation at triggering times, our approach ensures a sufficient information gain to maintain the feasibility of the CBF-based safety condition with high probability. Our approach probabilistically guarantees safety based on a suitable GP prior and rules out} Zeno behavior in the triggering scheme. The effectiveness of the proposed approach and theory is demonstrated in simulations.
comment: The first two authors contributed equally to the work
Joint Price and Power MPC for Peak Power Reduction at Workplace EV Charging Stations
Demand charge often constitutes a significant portion of electricity costs for commercial electric vehicle (EV) charging station operators. This paper explores control methods to reduce peak power consumption at workplace EV charging stations in a joint price and power optimization framework. We optimize a menu of price options to incentivize users to select controllable charging service. Using this framework, we propose a model predictive control approach to reduce both demand charge and overall operator costs. Through a Monte Carlo simulation, we find that our algorithm outperforms a state-of-the-art benchmark optimization strategy and can significantly reduce station operator costs.
An Iterative Bayesian Approach for System Identification based on Linear Gaussian Models
We tackle the problem of system identification, where we select inputs, observe the corresponding outputs from the true system, and optimize the parameters of our model to best fit the data. We propose a practical and computationally tractable methodology that is compatible with any system and parametric family of models. Our approach only requires input-output data from the system and first-order information of the model with respect to the parameters. Our approach consists of two modules. First, we formulate the problem of system identification from a Bayesian perspective and use a linear Gaussian model approximation to iteratively optimize the model's parameters. In each iteration, we propose to use the input-output data to tune the covariance of the linear Gaussian model. This online covariance calibration stabilizes fitting and signals model inaccuracy. Secondly, we define a Gaussian-based uncertainty measure for the model parameters, which we can then minimize with respect to the next selected input. We test our method with linear and nonlinear dynamics.
comment: Submitted to the 2026 American Control Conference
Stochastically Structured Reservoir Computers for Financial and Economic System Identification
This paper introduces a methodology for identifying and simulating financial and economic systems using stochastically structured reservoir computers (SSRCs). The framework combines structure-preserving embeddings with graph-informed coupling matrices to model inter-agent dynamics while enhancing interpretability. A constrained optimization scheme guarantees compliance with both stochastic and structural constraints. Two empirical case studies, a nonlinear stochastic dynamic model and regional inflation network dynamics, demonstrate the effectiveness of the approach in capturing complex nonlinear patterns and enabling interpretable predictive analysis under uncertainty.
On the Gaussian Limit of the Output of IIR Filters
We study the asymptotic distribution of the output of a stable Linear Time-Invariant (LTI) system driven by a non-Gaussian stochastic input. Motivated by longstanding heuristics in the stochastic describing function method, we rigorously characterize when the output process becomes approximately Gaussian, even when the input is not. Using the Wasserstein-1 distance as a quantitative measure of non-Gaussianity, we derive upper bounds on the distance between the appropriately scaled output and a standard normal distribution. These bounds are obtained via Stein's method and depend explicitly on the system's impulse response and the dependence structure of the input process. We show that when the dominant pole of the system approaches the edge of stability and the input satisfies one of the following conditions: (i) independence, (ii) positive correlation with a real and positive dominant pole, or (iii) sufficient correlation decay, the output converges to a standard normal distribution at rate $O(1/\sqrt{t})$. We also present counterexamples where convergence fails, thereby motivating the stated assumptions. Our results provide a rigorous foundation for the widespread observation that outputs of low-pass LTI systems tend to be approximately Gaussian.
comment: 8 pages, 1 figure, accepted for publication IEEE Conference on Decision and Control 2025
On the Standard Performance Criteria for Applied Control Design: PID, MPC or Machine Learning Controller?
The traditional control theory and its application to basic and complex systems have reached an advanced level of maturity. This includes aerial, marine, and ground vehicles, as well as robotics, chemical, transportation, and electrical systems widely used in our daily lives. The emerging era of data-driven methods, Large Language Models (LLMs), and AI-based controllers does not indicate a weakness in well-established control theory. Instead, it aims to reduce dependence on models and uncertainties, address increasingly complex systems, and potentially achieve decision-making capabilities comparable to human-level performance. This revolution integrates knowledge from computer science, machine learning, biology, and classical control, producing promising algorithms that are yet to demonstrate widespread real-world applicability. Despite the maturity of control theory and the presence of various performance criteria, there is still a lack of standardised metrics for testing, evaluation, Verification and Validation ($V\&V$) of algorithms. This gap can lead to algorithms that, while optimal in certain aspects, may fall short of practical implementation, sparking debates within the literature. For a controller to succeed in real-world applications, it must satisfy three key categories of performance metrics: tracking quality, control effort (energy consumption), and robustness. This paper rather takes an applied perspective, proposing and consolidating standard performance criteria for testing and analysing control systems, intended for researchers and students. The proposed framework ensures the post-design applicability of a black-box algorithm, aligning with modern data analysis and $V\&V$ perspectives to prevent resource allocation to systems with limited impact or imprecise claims.
Reconstructing Brain Causal Dynamics for Subject and Task Fingerprints using fMRI Time-series Data
Purpose: Recently, there has been a revived interest in system neuroscience causation models, driven by their unique capability to unravel complex relationships in multi-scale brain networks. In this paper, we present a novel method that leverages causal dynamics to achieve effective fMRI-based subject and task fingerprinting. Methods: By applying an implicit-explicit discretization scheme, we develop a two-timescale linear state-space model. Through data-driven identification of its parameters, the model captures causal signatures, including directed interactions among brain regions from a spatial perspective, and disentangled fast and slow dynamic modes of brain activity from a temporal perspective. These causal signatures are then integrated with: (i) a modal decomposition and projection method for model-based subject identification, and (ii) a Graph Neural Network (GNN) framework for learning-based task classification. Furthermore, we introduce the concept of the brain reachability landscape as a novel visualization tool, which quantitatively characterizes the maximum possible activation levels of brain regions under various fMRI tasks. Results: We evaluate the proposed approach using the Human Connectome Project dataset and demonstrate its advantage over non-causality-based methods. The obtained causal signatures are visualized and demonstrate clear biological relevance with established understandings of brain function. Conclusion: We verified the feasibility and effectiveness of utilizing brain causal signatures for subject and task fingerprinting. Additionally, our work paves the way for further studies on causal fingerprints with potential applications in both healthy controls and neurodegenerative diseases.
Continuous Wrist Control on the Hannes Prosthesis: a Vision-based Shared Autonomy Framework ICRA 2025
Most control techniques for prosthetic grasping focus on dexterous fingers control, but overlook the wrist motion. This forces the user to perform compensatory movements with the elbow, shoulder and hip to adapt the wrist for grasping. We propose a computer vision-based system that leverages the collaboration between the user and an automatic system in a shared autonomy framework, to perform continuous control of the wrist degrees of freedom in a prosthetic arm, promoting a more natural approach-to-grasp motion. Our pipeline allows to seamlessly control the prosthetic wrist to follow the target object and finally orient it for grasping according to the user intent. We assess the effectiveness of each system component through quantitative analysis and finally deploy our method on the Hannes prosthetic arm. Code and videos: https://hsp-iit.github.io/hannes-wrist-control.
comment: Accepted to ICRA 2025. Project website: https://hsp-iit.github.io/hannes-wrist-control
Data-Driven Stabilization of Unknown Linear-Threshold Network Dynamics
This paper studies the data-driven control of unknown linear-threshold network dynamics to stabilize the state to a reference value. We consider two types of controllers: (i) a state feedback controller with feed-forward reference input and (ii) an augmented feedback controller with error integration. The first controller features a simpler structure and is easier to design, while the second offers improved performance in the presence of system parameter changes and disturbances. Our design strategy employs state-input datasets to construct data-based representations of the closed-loop dynamics. Since these representations involve linear threshold functions, we rewrite them as switched linear systems, and formulate the design problem as that of finding a common controller for all the resulting modes. This gives rise to a set of linear matrix inequalities (LMIs) whose solutions corresponds to the controller gain matrices. We analyze the computational complexity of solving the LMIs and propose a simplified, sufficient set of conditions that scales linearly with the system state. Simulations on two case studies involving regulation of firing rate dynamics in rodent brains and of arousal level dynamics in humans demonstrate the effectiveness of the controller designs.
comment: 14 pages
Path Integral Bottleneck: An Algorithm-Agnostic Framework of Computation and Control
Executing a control sequence requires computation. While this is a simple observation, developing a framework that relates a controller's required computation to its ability to successfully control a system (e.g. lower control cost) is challenging, especially when the controller appears on alternative compute platforms (e.g. biological neural networks). More specifically, we want a framework where, given an observed closed-loop trajectory, we can quantify the computation effort needed to produce that trajectory. To enable effective comparisons of closed-loop systems across alternative compute platforms, we present the Path Integral Bottleneck (PI-IB), a method to produce an analytical, algorithm-agnostic description of the compute-control relationship. With the PI-IB framework, we can plot tradeoffs between performance and computation effort for any given plant description and control cost function. Simulations of the cart-pole reveal fundamental control-compute tradeoffs, exposing regions where the task performance-per-compute is higher than others.
comment: 8 pages, 8 Figures, Under Review
Kapitza-Inspired Stabilization of Non-Foster Circuits via Time Modulations
With his formal analysis in 1951, the physicist Pyotr Kapitza demonstrated that an inverted pendulum with an externally vibrating base can be stable in its upper position, thus overcoming the force of gravity. Kapitza's work is an example that an originally unstable system can become stable after a minor perturbation of its properties or initial conditions is applied. Inspired by his ideas, we show how non-Foster circuits can be stabilized with the application of external \textit{electrical vibration}, i.e., time modulations. Non-Foster circuits are highly appreciated in the engineering community since their bandwidth characteristics are not limited by passive-circuits bounds. Unfortunately, non-Foster circuits are usually unstable and they must be stabilized prior to operation. Here, we focus on the study of non-Foster $L(t)C$ circuits with time-varying inductors and time-invariant negative capacitors. We find an intrinsic connection between Kapitza's inverted pendulum and non-Foster $L(t)C$ resonators. Moreover, we show how positive time-varying modulations of $L(t)>0$ can overcome and stabilize non-Foster negative capacitances $C<0$. These findings open up an alternative manner of stabilizing electric circuits with the use of time modulations, and lay the groundwork for application of, what we coin \textit{Vibrational Electromagnetics}, in more complex media.
comment: 9 pages (7 pages main text, 2 pages supplementary materials), 4 figures; a minor issue in Fig. 3(a) is corrected. The accepted version is uploaded now
Stochastic Approximation in a Markovian Framework Revisited: Lipschitz Continuity of the Poisson Equation
In this paper we revisit a fundamental technical issue within the theory of stochastic approximation (SA) in a Markovian framework, first proposed in the book by Djereveckii and Fradkov (1981), and further developed in much detail in the book by Benveniste, M{\'e}tivier, and Priouret (1990). This theory is instrumental in many application areas such as the statistical analysis of Hidden Markov Models arising in telecommunication, quantized linear stochastic systems, and more recently in active learning and reinforcement learning. The problem at hand is the verification of the existence, uniqueness and Lipschitz-continuity of the solution of a parameter-dependent Poisson equation, in an appropriate weighted sup-norm, associated with a collection of Markov chains on general state spaces. Verification of the above facts is vital in the analysis of SA processes presented in (Benveniste et al., 1990) via the ODE (ordinary differential equations) method, requiring substantial technical effort. The motivation and focus of the paper is to address this technical issue, by presenting a simple set of conditions, under which the above properties of the Poisson equation at hand can be conveniently established. The starting point of our work is an intricate result of Hairer and Mattingly (2011) proving that by tilting standard conditions of mainstream stability theory for Markov chains, the transition kernels prove to be contractions in the space of differences of probability measures in a suitable metric. To demonstrate the applicability of our results, the proposed conditions are verified for a class of queuing system with open-loop control.
Multiagent Systems
DeMuon: A Decentralized Muon for Matrix Optimization over Graphs
In this paper, we propose DeMuon, a method for decentralized matrix optimization over a given communication topology. DeMuon incorporates matrix orthogonalization via Newton-Schulz iterations-a technique inherited from its centralized predecessor, Muon-and employs gradient tracking to mitigate heterogeneity among local functions. Under heavy-tailed noise conditions and additional mild assumptions, we establish the iteration complexity of DeMuon for reaching an approximate stochastic stationary point. This complexity result matches the best-known complexity bounds of centralized algorithms in terms of dependence on the target tolerance. To the best of our knowledge, DeMuon is the first direct extension of Muon to decentralized optimization over graphs with provable complexity guarantees. We conduct preliminary numerical experiments on decentralized transformer pretraining over graphs with varying degrees of connectivity. Our numerical results demonstrate a clear margin of improvement of DeMuon over other popular decentralized algorithms across different network topologies.
Partial Resilient Leader-Follower Consensus in Time-Varying Graphs
This work studies resilient leader-follower consensus with a bounded number of adversaries. Existing approaches typically require robustness conditions of the entire network to guarantee resilient consensus. However, the behavior of such systems when these conditions are not fully met remains unexplored. To address this gap, we introduce the notion of partial leader-follower consensus, in which a subset of non-adversarial followers successfully tracks the leader's reference state despite insufficient robustness. We propose a novel distributed algorithm - the Bootstrap Percolation and Mean Subsequence Reduced (BP-MSR) algorithm - and establish sufficient conditions for individual followers to achieve consensus via the BP-MSR algorithm in arbitrary time-varying graphs. We validate our findings through simulations, demonstrating that our method guarantees partial leader-follower consensus, even when standard resilient consensus algorithms fail.
comment: 8 pages, 3 figures, Submitted to American Control Conference (ACC) 2026
Exploring Network-Knowledge Graph Duality: A Case Study in Agentic Supply Chain Risk Analysis
Large Language Models (LLMs) struggle with the complex, multi-modal, and network-native data underlying financial risk. Standard Retrieval-Augmented Generation (RAG) oversimplifies relationships, while specialist models are costly and static. We address this gap with an LLM-centric agent framework for supply chain risk analysis. Our core contribution is to exploit the inherent duality between networks and knowledge graphs (KG). We treat the supply chain network as a KG, allowing us to use structural network science principles for retrieval. A graph traverser, guided by network centrality scores, efficiently extracts the most economically salient risk paths. An agentic architecture orchestrates this graph retrieval alongside data from numerical factor tables and news streams. Crucially, it employs novel ``context shells'' -- descriptive templates that embed raw figures in natural language -- to make quantitative data fully intelligible to the LLM. This lightweight approach enables the model to generate concise, explainable, and context-rich risk narratives in real-time without costly fine-tuning or a dedicated graph database.
comment: 7 pages, 3 figures
SimCity: Multi-Agent Urban Development Simulation with Rich Interactions
Large Language Models (LLMs) open new possibilities for constructing realistic and interpretable macroeconomic simulations. We present SimCity, a multi-agent framework that leverages LLMs to model an interpretable macroeconomic system with heterogeneous agents and rich interactions. Unlike classical equilibrium models that limit heterogeneity for tractability, or traditional agent-based models (ABMs) that rely on hand-crafted decision rules, SimCity enables flexible, adaptive behavior with transparent natural-language reasoning. Within SimCity, four core agent types (households, firms, a central bank, and a government) deliberate and participate in a frictional labor market, a heterogeneous goods market, and a financial market. Furthermore, a Vision-Language Model (VLM) determines the geographic placement of new firms and renders a mapped virtual city, allowing us to study both macroeconomic regularities and urban expansion dynamics within a unified environment. To evaluate the framework, we compile a checklist of canonical macroeconomic phenomena, including price elasticity of demand, Engel's Law, Okun's Law, the Phillips Curve, and the Beveridge Curve, and show that SimCity naturally reproduces these empirical patterns while remaining robust across simulation runs.
comment: 32 pages, 8 figures
Stochastic Self-Organization in Multi-Agent Systems
Multi-agent systems (MAS) based on Large Language Models (LLMs) have the potential to solve tasks that are beyond the reach of any single LLM. However, this potential can only be realized when the collaboration mechanism between agents is optimized. Specifically, optimizing the communication structure between agents is critical for fruitful collaboration. Most existing approaches rely on fixed topologies, pretrained graph generators, optimization over edges, or employ external LLM judges, thereby adding to the complexity. In this work, we introduce a response-conditioned framework that adapts communication on-the-fly. Agents independently generate responses to the user query and assess peer contributions using an approximation of the Shapley value. A directed acyclic graph (DAG) is then constructed to regulate the propagation of the responses among agents, which ensures stable and efficient message transmission from high-contributing agents to others. This graph is dynamically updated based on the agent responses from the previous collaboration round. Since the proposed framework enables the self-organization of agents without additional supervision or training, we refer to it as SelfOrg. The SelfOrg framework goes beyond task- and query-level optimization and takes into account the stochastic nature of agent responses. Experiments with both strong and weak LLM backends demonstrate robust performance, with significant gains in the weak regime where prior methods collapse. We also theoretically show that multiple agents increase the chance of correctness and that the correct responses naturally dominate the information flow.
Shared Object Manipulation with a Team of Collaborative Quadrupeds
Utilizing teams of multiple robots is advantageous for handling bulky objects. Many related works focus on multi-manipulator systems, which are limited by workspace constraints. In this paper, we extend a classical hybrid motion-force controller to a team of legged manipulator systems, enabling collaborative loco-manipulation of rigid objects with a force-closed grasp. Our novel approach allows the robots to flexibly coordinate their movements, achieving efficient and stable object co-manipulation and transport, validated through extensive simulations and real-world experiments.
comment: 8 pages, 9 figures, submitted to The 2026 American Control Conference
The Social Laboratory: A Psychometric Framework for Multi-Agent LLM Evaluation NeurIPS 2025
As Large Language Models (LLMs) transition from static tools to autonomous agents, traditional evaluation benchmarks that measure performance on downstream tasks are becoming insufficient. These methods fail to capture the emergent social and cognitive dynamics that arise when agents communicate, persuade, and collaborate in interactive environments. To address this gap, we introduce a novel evaluation framework that uses multi-agent debate as a controlled "social laboratory" to discover and quantify these behaviors. In our framework, LLM-based agents, instantiated with distinct personas and incentives, deliberate on a wide range of challenging topics under the supervision of an LLM moderator. Our analysis, enabled by a new suite of psychometric and semantic metrics, reveals several key findings. Across hundreds of debates, we uncover a powerful and robust emergent tendency for agents to seek consensus, consistently reaching high semantic agreement ({\mu} > 0.88) even without explicit instruction and across sensitive topics. We show that assigned personas induce stable, measurable psychometric profiles, particularly in cognitive effort, and that the moderators persona can significantly alter debate outcomes by structuring the environment, a key finding for external AI alignment. This work provides a blueprint for a new class of dynamic, psychometrically grounded evaluation protocols designed for the agentic setting, offering a crucial methodology for understanding and shaping the social behaviors of the next generation of AI agents. We have released the code and results at https://github.com/znreza/multi-agent-LLM-eval-for-debate.
comment: 39th Conference on Neural Information Processing Systems (NeurIPS 2025) Workshop on Evaluating the Evolving LLM Lifecycle: Benchmarks, Emergent Abilities, and Scaling
Capital Games and Growth Equilibria
We examine formal games that we call "capital games" in which player payoffs are known, but their payoffs are not guaranteed to be von Neumann-Morgenstern utilities. In capital games, the dynamics of player payoffs determine their utility functions. Different players can have different payoff dynamics. We make no assumptions about where these dynamics come from, but implicitly assume that they come from the players' actions and interactions over time. We define an equilibrium concept called "growth equilibrium" and show a correspondence between the growth equilibria of capital games and the Nash equilibria of standard games.
Cloud Investigation Automation Framework (CIAF): An AI-Driven Approach to Cloud Forensics
Large Language Models (LLMs) have gained prominence in domains including cloud security and forensics. Yet cloud forensic investigations still rely on manual analysis, making them time-consuming and error-prone. LLMs can mimic human reasoning, offering a pathway to automating cloud log analysis. To address this, we introduce the Cloud Investigation Automation Framework (CIAF), an ontology-driven framework that systematically investigates cloud forensic logs while improving efficiency and accuracy. CIAF standardizes user inputs through semantic validation, eliminating ambiguity and ensuring consistency in log interpretation. This not only enhances data quality but also provides investigators with reliable, standardized information for decision-making. To evaluate security and performance, we analyzed Microsoft Azure logs containing ransomware-related events. By simulating attacks and assessing CIAF's impact, results showed significant improvement in ransomware detection, achieving precision, recall, and F1 scores of 93 percent. CIAF's modular, adaptable design extends beyond ransomware, making it a robust solution for diverse cyberattacks. By laying the foundation for standardized forensic methodologies and informing future AI-driven automation, this work underscores the role of deterministic prompt engineering and ontology-based validation in enhancing cloud forensic investigations. These advancements improve cloud security while paving the way for efficient, automated forensic workflows.
A Call to Action for a Secure-by-Design Generative AI Paradigm
Large language models have gained widespread prominence, yet their vulnerability to prompt injection and other adversarial attacks remains a critical concern. This paper argues for a security-by-design AI paradigm that proactively mitigates LLM vulnerabilities while enhancing performance. To achieve this, we introduce PromptShield, an ontology-driven framework that ensures deterministic and secure prompt interactions. It standardizes user inputs through semantic validation, eliminating ambiguity and mitigating adversarial manipulation. To assess PromptShield's security and performance capabilities, we conducted an experiment on an agent-based system to analyze cloud logs within Amazon Web Services (AWS), containing 493 distinct events related to malicious activities and anomalies. By simulating prompt injection attacks and assessing the impact of deploying PromptShield, our results demonstrate a significant improvement in model security and performance, achieving precision, recall, and F1 scores of approximately 94%. Notably, the ontology-based framework not only mitigates adversarial threats but also enhances the overall performance and reliability of the system. Furthermore, PromptShield's modular and adaptable design ensures its applicability beyond cloud security, making it a robust solution for safeguarding generative AI applications across various domains. By laying the groundwork for AI safety standards and informing future policy development, this work stimulates a crucial dialogue on the pivotal role of deterministic prompt engineering and ontology-based validation in ensuring the safe and responsible deployment of LLMs in high-stakes environments.
Conflict-Based Search as a Protocol: A Multi-Agent Motion Planning Protocol for Heterogeneous Agents, Solvers, and Independent Tasks
Imagine the future construction site, hospital, office, or even sophisticated household with dozens of robots bought from different manufacturers. How can we enable these different systems to effectively move in a shared environment, given that each robot may have its own independent motion planning system? This work shows how we can get efficient collision-free movements between algorithmically heterogeneous agents by using Conflict-Based Search (Sharon et al. 2015) as a protocol. At its core, the CBS Protocol requires one specific single-agent motion planning API; finding a collision-free path that satisfies certain space-time constraints. Given such an API, CBS uses a central planner to find collision-free paths - independent of how the API is implemented. We show how this protocol enables multi-agent motion planning for a heterogeneous team of agents completing independent tasks with a variety of single-agent planners including: Heuristic Search (e.g., A*), Sampling Based Search (e.g., RRT), Optimization (e.g., Direct Collocation), Diffusion, and Reinforcement Learning.
comment: Project webpage: https://rishi-v.github.io/CBS-Protocol/
Physics-Informed Neural Controlled Differential Equations for Scalable Long Horizon Multi-Agent Motion Forecasting
Long-horizon motion forecasting for multiple autonomous robots is challenging due to non-linear agent interactions, compounding prediction errors, and continuous-time evolution of dynamics. Learned dynamics of such a system can be useful in various applications such as travel time prediction, prediction-guided planning and generative simulation. In this work, we aim to develop an efficient trajectory forecasting model conditioned on multi-agent goals. Motivated by the recent success of physics-guided deep learning for partially known dynamical systems, we develop a model based on neural Controlled Differential Equations (CDEs) for long-horizon motion forecasting. Unlike discrete-time methods such as RNNs and transformers, neural CDEs operate in continuous time, allowing us to combine physics-informed constraints and biases to jointly model multi-robot dynamics. Our approach, named PINCoDE (Physics-Informed Neural Controlled Differential Equations), learns differential equation parameters that can be used to predict the trajectories of a multi-agent system starting from an initial condition. PINCoDE is conditioned on future goals and enforces physics constraints for robot motion over extended periods of time. We adopt a strategy that scales our model from 10 robots to 100 robots without the need for additional model parameters, while producing predictions with an average ADE below 0.5 m for a 1-minute horizon. Furthermore, progressive training with curriculum learning for our PINCoDE model results in a 2.7X reduction of forecasted pose error over 4 minute horizons compared to analytical models.
On the Application of Model Predictive Control to a Weighted Coverage Path Planning Problem
This paper considers the application of Model Predictive Control (MPC) to a weighted coverage path planning (WCPP) problem. The problem appears in a wide range of practical applications, including search and rescue (SAR) missions. The basic setup is that one (or multiple) agents can move around a given search space and collect rewards from a given spatial distribution. Unlike an artificial potential field, each reward can only be collected once. In contrast to a Traveling Salesman Problem (TSP), the agent moves in a continuous space. Moreover, he is not obliged to cover all locations and/or may return to previously visited locations. The WCPP problem is tackled by a new Model Predictive Control (MPC) formulation with so-called Coverage Constraints (CCs). It is shown that the solution becomes more effective if the solver is initialized with a TSP-based heuristic. With and without this initialization, the proposed MPC approach clearly outperforms a naive MPC formulation, as demonstrated in a small simulation study.
Code Like Humans: A Multi-Agent Solution for Medical Coding EMNLP
In medical coding, experts map unstructured clinical notes to alphanumeric codes for diagnoses and procedures. We introduce Code Like Humans: a new agentic framework for medical coding with large language models. It implements official coding guidelines for human experts, and it is the first solution that can support the full ICD-10 coding system (+70K labels). It achieves the best performance to date on rare diagnosis codes (fine-tuned discriminative classifiers retain an advantage for high-frequency codes, to which they are limited). Towards future work, we also contribute an analysis of system performance and identify its `blind spots' (codes that are systematically undercoded).
comment: EMNLP Findings 2025
Curriculum Imitation Learning of Distributed Multi-Robot Policies
Learning control policies for multi-robot systems (MRS) remains a major challenge due to long-term coordination and the difficulty of obtaining realistic training data. In this work, we address both limitations within an imitation learning framework. First, we shift the typical role of Curriculum Learning in MRS, from scalability with the number of robots, to focus on improving long-term coordination. We propose a curriculum strategy that gradually increases the length of expert trajectories during training, stabilizing learning and enhancing the accuracy of long-term behaviors. Second, we introduce a method to approximate the egocentric perception of each robot using only third-person global state demonstrations. Our approach transforms idealized trajectories into locally available observations by filtering neighbors, converting reference frames, and simulating onboard sensor variability. Both contributions are integrated into a physics-informed technique to produce scalable, distributed policies from observations. We conduct experiments across two tasks with varying team sizes and noise levels. Results show that our curriculum improves long-term accuracy, while our perceptual estimation method yields policies that are robust to realistic uncertainty. Together, these strategies enable the learning of robust, distributed controllers from global demonstrations, even in the absence of expert actions or onboard measurements.
comment: Accepted and presented at the Eight Iberian Robotics Conference, 2025
Robotics
MLA: A Multisensory Language-Action Model for Multimodal Understanding and Forecasting in Robotic Manipulation
Vision-language-action models (VLAs) have shown generalization capabilities in robotic manipulation tasks by inheriting from vision-language models (VLMs) and learning action generation. Most VLA models focus on interpreting vision and language to generate actions, whereas robots must perceive and interact within the spatial-physical world. This gap highlights the need for a comprehensive understanding of robotic-specific multisensory information, which is crucial for achieving complex and contact-rich control. To this end, we introduce a multisensory language-action (MLA) model that collaboratively perceives heterogeneous sensory modalities and predicts future multisensory objectives to facilitate physical world modeling. Specifically, to enhance perceptual representations, we propose an encoder-free multimodal alignment scheme that innovatively repurposes the large language model itself as a perception module, directly interpreting multimodal cues by aligning 2D images, 3D point clouds, and tactile tokens through positional correspondence. To further enhance MLA's understanding of physical dynamics, we design a future multisensory generation post-training strategy that enables MLA to reason about semantic, geometric, and interaction information, providing more robust conditions for action generation. For evaluation, the MLA model outperforms the previous state-of-the-art 2D and 3D VLA methods by 12% and 24% in complex, contact-rich real-world tasks, respectively, while also demonstrating improved generalization to unseen configurations. Project website: https://sites.google.com/view/open-mla
Benchmarking Egocentric Visual-Inertial SLAM at City Scale ICCV 2025
Precise 6-DoF simultaneous localization and mapping (SLAM) from onboard sensors is critical for wearable devices capturing egocentric data, which exhibits specific challenges, such as a wider diversity of motions and viewpoints, prevalent dynamic visual content, or long sessions affected by time-varying sensor calibration. While recent progress on SLAM has been swift, academic research is still driven by benchmarks that do not reflect these challenges or do not offer sufficiently accurate ground truth poses. In this paper, we introduce a new dataset and benchmark for visual-inertial SLAM with egocentric, multi-modal data. We record hours and kilometers of trajectories through a city center with glasses-like devices equipped with various sensors. We leverage surveying tools to obtain control points as indirect pose annotations that are metric, centimeter-accurate, and available at city scale. This makes it possible to evaluate extreme trajectories that involve walking at night or traveling in a vehicle. We show that state-of-the-art systems developed by academia are not robust to these challenges and we identify components that are responsible for this. In addition, we design tracks with different levels of difficulty to ease in-depth analysis and evaluation of less mature approaches. The dataset and benchmark are available at https://www.lamaria.ethz.ch.
comment: ICCV 2025
OmniRetarget: Interaction-Preserving Data Generation for Humanoid Whole-Body Loco-Manipulation and Scene Interaction
A dominant paradigm for teaching humanoid robots complex skills is to retarget human motions as kinematic references to train reinforcement learning (RL) policies. However, existing retargeting pipelines often struggle with the significant embodiment gap between humans and robots, producing physically implausible artifacts like foot-skating and penetration. More importantly, common retargeting methods neglect the rich human-object and human-environment interactions essential for expressive locomotion and loco-manipulation. To address this, we introduce OmniRetarget, an interaction-preserving data generation engine based on an interaction mesh that explicitly models and preserves the crucial spatial and contact relationships between an agent, the terrain, and manipulated objects. By minimizing the Laplacian deformation between the human and robot meshes while enforcing kinematic constraints, OmniRetarget generates kinematically feasible trajectories. Moreover, preserving task-relevant interactions enables efficient data augmentation, from a single demonstration to different robot embodiments, terrains, and object configurations. We comprehensively evaluate OmniRetarget by retargeting motions from OMOMO, LAFAN1, and our in-house MoCap datasets, generating over 8-hour trajectories that achieve better kinematic constraint satisfaction and contact preservation than widely used baselines. Such high-quality data enables proprioceptive RL policies to successfully execute long-horizon (up to 30 seconds) parkour and loco-manipulation skills on a Unitree G1 humanoid, trained with only 5 reward terms and simple domain randomization shared by all tasks, without any learning curriculum.
comment: Project website: https://omniretarget.github.io
TimeRewarder: Learning Dense Reward from Passive Videos via Frame-wise Temporal Distance
Designing dense rewards is crucial for reinforcement learning (RL), yet in robotics it often demands extensive manual effort and lacks scalability. One promising solution is to view task progress as a dense reward signal, as it quantifies the degree to which actions advance the system toward task completion over time. We present TimeRewarder, a simple yet effective reward learning method that derives progress estimation signals from passive videos, including robot demonstrations and human videos, by modeling temporal distances between frame pairs. We then demonstrate how TimeRewarder can supply step-wise proxy rewards to guide reinforcement learning. In our comprehensive experiments on ten challenging Meta-World tasks, we show that TimeRewarder dramatically improves RL for sparse-reward tasks, achieving nearly perfect success in 9/10 tasks with only 200,000 interactions per task with the environment. This approach outperformed previous methods and even the manually designed environment dense reward on both the final success rate and sample efficiency. Moreover, we show that TimeRewarder pretraining can exploit real-world human videos, highlighting its potential as a scalable approach path to rich reward signals from diverse video sources.
Graphite: A GPU-Accelerated Mixed-Precision Graph Optimization Framework
We present Graphite, a GPU-accelerated nonlinear graph optimization framework. It provides a CUDA C++ interface to enable the sharing of code between a realtime application, such as a SLAM system, and its optimization tasks. The framework supports techniques to reduce memory usage, including in-place optimization, support for multiple floating point types and mixed-precision modes, and dynamically computed Jacobians. We evaluate Graphite on well-known bundle adjustment problems and find that it achieves similar performance to MegBA, a solver specialized for bundle adjustment, while maintaining generality and using less memory. We also apply Graphite to global visual-inertial bundle adjustment on maps generated from stereo-inertial SLAM datasets, and observe speed ups of up to 59x compared to a CPU baseline. Our results indicate that our solver enables faster large-scale optimization on both desktop and resource-constrained devices.
The Trajectory Bundle Method: Unifying Sequential-Convex Programming and Sampling-Based Trajectory Optimization
We present a unified framework for solving trajectory optimization problems in a derivative-free manner through the use of sequential convex programming. Traditionally, nonconvex optimization problems are solved by forming and solving a sequence of convex optimization problems, where the cost and constraint functions are approximated locally through Taylor series expansions. This presents a challenge for functions where differentiation is expensive or unavailable. In this work, we present a derivative-free approach to form these convex approximations by computing samples of the dynamics, cost, and constraint functions and letting the solver interpolate between them. Our framework includes sample-based trajectory optimization techniques like model-predictive path integral (MPPI) control as a special case and generalizes them to enable features like multiple shooting and general equality and inequality constraints that are traditionally associated with derivative-based sequential convex programming methods. The resulting framework is simple, flexible, and capable of solving a wide variety of practical motion planning and control problems.
Radio-based Multi-Robot Odometry and Relative Localization
Radio-based methods such as Ultra-Wideband (UWB) and RAdio Detection And Ranging (radar), which have traditionally seen limited adoption in robotics, are experiencing a boost in popularity thanks to their robustness to harsh environmental conditions and cluttered environments. This work proposes a multi-robot UGV-UAV localization system that leverages the two technologies with inexpensive and readily-available sensors, such as Inertial Measurement Units (IMUs) and wheel encoders, to estimate the relative position of an aerial robot with respect to a ground robot. The first stage of the system pipeline includes a nonlinear optimization framework to trilaterate the location of the aerial platform based on UWB range data, and a radar pre-processing module with loosely coupled ego-motion estimation which has been adapted for a multi-robot scenario. Then, the pre-processed radar data as well as the relative transformation are fed to a pose-graph optimization framework with odometry and inter-robot constraints. The system, implemented for the Robotic Operating System (ROS 2) with the Ceres optimizer, has been validated in Software-in-the-Loop (SITL) simulations and in a real-world dataset. The proposed relative localization module outperforms state-of-the-art closed-form methods which are less robust to noise. Our SITL environment includes a custom Gazebo plugin for generating realistic UWB measurements modeled after real data. Conveniently, the proposed factor graph formulation makes the system readily extensible to full Simultaneous Localization And Mapping (SLAM). Finally, all the code and experimental data is publicly available to support reproducibility and to serve as a common open dataset for benchmarking.
OceanGym: A Benchmark Environment for Underwater Embodied Agents
We introduce OceanGym, the first comprehensive benchmark for ocean underwater embodied agents, designed to advance AI in one of the most demanding real-world environments. Unlike terrestrial or aerial domains, underwater settings present extreme perceptual and decision-making challenges, including low visibility, dynamic ocean currents, making effective agent deployment exceptionally difficult. OceanGym encompasses eight realistic task domains and a unified agent framework driven by Multi-modal Large Language Models (MLLMs), which integrates perception, memory, and sequential decision-making. Agents are required to comprehend optical and sonar data, autonomously explore complex environments, and accomplish long-horizon objectives under these harsh conditions. Extensive experiments reveal substantial gaps between state-of-the-art MLLM-driven agents and human experts, highlighting the persistent difficulty of perception, planning, and adaptability in ocean underwater environments. By providing a high-fidelity, rigorously designed platform, OceanGym establishes a testbed for developing robust embodied AI and transferring these capabilities to real-world autonomous ocean underwater vehicles, marking a decisive step toward intelligent agents capable of operating in one of Earth's last unexplored frontiers. The code and data are available at https://github.com/OceanGPT/OceanGym.
comment: Work in progress
Memory-Efficient 2D/3D Shape Assembly of Robot Swarms
Mean-shift-based approaches have recently emerged as the most effective methods for robot swarm shape assembly tasks. These methods rely on image-based representations of target shapes to compute local density gradients and perform mean-shift exploration, which constitute their core mechanism. However, such image representations incur substantial memory overhead, which can become prohibitive for high-resolution or 3D shapes. To overcome this limitation, we propose a memory-efficient tree map representation that hierarchically encodes user-specified shapes and is applicable to both 2D and 3D scenarios. Building on this representation, we design a behavior-based distributed controller that enables assignment-free shape assembly. Comparative 2D and 3D simulations against a state-of-the-art mean-shift algorithm demonstrate one to two orders of magnitude lower memory usage and two to three times faster shape entry while maintaining comparable uniformity. Finally, we validate the framework through physical experiments with 6 to 7 UAVs, confirming its real-world practicality.
Learning from Hallucinating Critical Points for Navigation in Dynamic Environments
Generating large and diverse obstacle datasets to learn motion planning in environments with dynamic obstacles is challenging due to the vast space of possible obstacle trajectories. Inspired by hallucination-based data synthesis approaches, we propose Learning from Hallucinating Critical Points (LfH-CP), a self-supervised framework for creating rich dynamic obstacle datasets based on existing optimal motion plans without requiring expensive expert demonstrations or trial-and-error exploration. LfH-CP factorizes hallucination into two stages: first identifying when and where obstacles must appear in order to result in an optimal motion plan, i.e., the critical points, and then procedurally generating diverse trajectories that pass through these points while avoiding collisions. This factorization avoids generative failures such as mode collapse and ensures coverage of diverse dynamic behaviors. We further introduce a diversity metric to quantify dataset richness and show that LfH-CP produces substantially more varied training data than existing baselines. Experiments in simulation demonstrate that planners trained on LfH-CP datasets achieves higher success rates compared to a prior hallucination method.
Analytic Conditions for Differentiable Collision Detection in Trajectory Optimization IROS
Optimization-based methods are widely used for computing fast, diverse solutions for complex tasks such as collision-free movement or planning in the presence of contacts. However, most of these methods require enforcing non-penetration constraints between objects, resulting in a non-trivial and computationally expensive problem. This makes the use of optimization-based methods for planning and control challenging. In this paper, we present a method to efficiently enforce non-penetration of sets while performing optimization over their configuration, which is directly applicable to problems like collision-aware trajectory optimization. We introduce novel differentiable conditions with analytic expressions to achieve this. To enforce non-collision between non-smooth bodies using these conditions, we introduce a method to approximate polytopes as smooth semi-algebraic sets. We present several numerical experiments to demonstrate the performance of the proposed method and compare the performance with other baseline methods recently proposed in the literature.
comment: 8 pages, 8 figures. Accepted to the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2025
Unwinding Rotations Reduces VR Sickness in Nonsimulated Immersive Telepresence
Immersive telepresence, when a user views the video stream of a $360^\circ$ camera in a remote environment using a Head Mounted Display (HMD), has great potential to improve the sense of being in a remote environment. In most cases of immersive robotic telepresence, the camera is mounted on a mobile robot which increases the portion of the environment that the remote user can explore. However, robot motions can induce unpleasant symptoms associated with Virtual Reality (VR) sickness, degrading the overall user experience. Previous research has shown that unwinding the rotations of the robot, that is, decoupling the rotations that the camera undergoes due to robot motions from what is seen by the user, can increase user comfort and reduce VR sickness. However, that work considered a virtual environment and a simulated robot. In this work, to test whether the same hypotheses hold when the video stream from a real camera is used, we carried out a user study $(n=36)$ in which the unwinding rotations method was compared against coupled rotations in a task completed through a panoramic camera mounted on a robotic arm. Furthermore, within an inspection task which involved translations and rotations in three dimensions, we tested whether unwinding the robot rotations impacted the performance of users. The results show that the users found the unwinding rotations method to be more comfortable and preferable, and that a reduced level of VR sickness can be achieved without a significant impact on task performance.
comment: 24th IEEE International Symposium on Mixed and Augmented Reality (ISMAR)
Real-time Velocity Profile Optimization for Time-Optimal Maneuvering with Generic Acceleration Constraints
The computation of time-optimal velocity profiles along prescribed paths, subject to generic acceleration constraints, is a crucial problem in robot trajectory planning, with particular relevance to autonomous racing. However, the existing methods either support arbitrary acceleration constraints at high computational cost or use conservative box constraints for computational efficiency. We propose FBGA, a new \underline{F}orward-\underline{B}ackward algorithm with \underline{G}eneric \underline{A}cceleration constraints, which achieves both high accuracy and low computation time. FBGA operates forward and backward passes to maximize the velocity profile in short, discretized path segments, while satisfying user-defined performance limits. Tested on five racetracks and two vehicle classes, FBGA handles complex, non-convex acceleration constraints with custom formulations. Its maneuvers and lap times closely match optimal control baselines (within $0.11\%$-$0.36\%$), while being up to three orders of magnitude faster. FBGA maintains high accuracy even with coarse discretization, making it well-suited for online multi-query trajectory planning. Our open-source \texttt{C++} implementation is available at: https://anonymous.4open.science/r/FB_public_RAL.
SDA-PLANNER: State-Dependency Aware Adaptive Planner for Embodied Task Planning
Embodied task planning requires agents to produce executable actions in a close-loop manner within the environment. With progressively improving capabilities of LLMs in task decomposition, planning, and generalization, current embodied task planning methods adopt LLM-based architecture.However, existing LLM-based planners remain limited in three aspects, i.e., fixed planning paradigms, lack of action sequence constraints, and error-agnostic. In this work, we propose SDA-PLANNER, enabling an adaptive planning paradigm, state-dependency aware and error-aware mechanisms for comprehensive embodied task planning. Specifically, SDA-PLANNER introduces a State-Dependency Graph to explicitly model action preconditions and effects, guiding the dynamic revision. To handle execution error, it employs an error-adaptive replanning strategy consisting of Error Backtrack and Diagnosis and Adaptive Action SubTree Generation, which locally reconstructs the affected portion of the plan based on the current environment state. Experiments demonstrate that SDA-PLANNER consistently outperforms baselines in success rate and goal completion, particularly under diverse error conditions.
Kinodynamic Motion Planning for Mobile Robot Navigation across Inconsistent World Models
Mobile ground robots lacking prior knowledge of an environment must rely on sensor data to develop a model of their surroundings. In these scenarios, consistent identification of obstacles and terrain features can be difficult due to noise and algorithmic shortcomings, which can make it difficult for motion planning systems to generate safe motions. One particular difficulty to overcome is when regions of the cost map switch between being marked as obstacles and free space through successive planning cycles. One potential solution to this, which we refer to as Valid in Every Hypothesis (VEH), is for the planning system to plan motions that are guaranteed to be safe through a history of world models. Another approach is to track a history of world models, and adjust node costs according to the potential penalty of needing to reroute around previously hazardous areas. This work discusses three major iterations on this idea. The first iteration, called PEH, invokes a sub-search for every node expansion that crosses through a divergence point in the world models. The second and third iterations, called GEH and GEGRH respectively, defer the sub-search until after an edge expands into the goal region. GEGRH uses an additional step to revise the graph based on divergent nodes in each world. Initial results showed that, although PEH and GEH find more optimistic solutions than VEH, they are unable to generate solutions in less than one-second, which exceeds our requirements for field deployment. Analysis of results from a field experiment in an unstructured, off-road environment on a Clearpath Robotics Warthog UGV indicate that GEGRH finds lower cost trajectories and has faster average planning times than VEH. Compared to single-hypothesis (SH) search, where only the latest world model is considered, GEGRH generates more conservative plans with a small increase in average planning time.
comment: Presented at the Robotics: Science and Systems (RSS) 2025 Workshop on Resilient Off-road Autonomous Robotics (ROAR)
LLM-MCoX: Large Language Model-based Multi-robot Coordinated Exploration and Search
Autonomous exploration and object search in unknown indoor environments remain challenging for multi-robot systems (MRS). Traditional approaches often rely on greedy frontier assignment strategies with limited inter-robot coordination. In this work, we introduce LLM-MCoX (LLM-based Multi-robot Coordinated Exploration and Search), a novel framework that leverages Large Language Models (LLMs) for intelligent coordination of both homogeneous and heterogeneous robot teams tasked with efficient exploration and target object search. Our approach combines real-time LiDAR scan processing for frontier cluster extraction and doorway detection with multimodal LLM reasoning (e.g., GPT-4o) to generate coordinated waypoint assignments based on shared environment maps and robot states. LLM-MCoX demonstrates superior performance compared to existing methods, including greedy and Voronoi-based planners, achieving 22.7% faster exploration times and 50% improved search efficiency in large environments with 6 robots. Notably, LLM-MCoX enables natural language-based object search capabilities, allowing human operators to provide high-level semantic guidance that traditional algorithms cannot interpret.
Anomaly detection for generic failure monitoring in robotic assembly, screwing and manipulation
Out-of-distribution states in robot manipulation often lead to unpredictable robot behavior or task failure, limiting success rates and increasing risk of damage. Anomaly detection (AD) can identify deviations from expected patterns in data, which can be used to trigger failsafe behaviors and recovery strategies. Prior work has applied data-driven AD to time series data in specific robotic tasks, but its transferability across control strategies and task types has not been shown. Leveraging time series data, such as force/torque signals, allows to directly capture robot-environment interactions, crucial for manipulation and online failure detection. Their broad availability, high sampling rates, and low dimensionality enable high temporal resolution and efficient processing. As robotic tasks can have widely signal characteristics and requirements, AD methods which can be applied in the same way to a wide range of tasks is needed, ideally with good data efficiency. We examine three industrial robotic tasks, each presenting several anomalies. Test scenarios in robotic cabling, screwing, and sanding are built, and multimodal time series data is gathered. Several autoencoder-based methods are compared, evaluating generalization across tasks and control methods (diffusion policy, position, and impedance control). This allows us to validate the integration of AD in complex tasks involving tighter tolerances and variation from both the robot and its environment. Additionally, we evaluate data efficiency, detection latency, and task characteristics which support robust detection. The results indicate reliable detection with AUROC exceeding 0.93 in failures in the cabling and screwing task, such as incorrect or misaligned parts and obstructed targets. In the polishing task, only severe failures were reliably detected, while more subtle failure types remained undetected.
ISyHand: A Dexterous Multi-finger Robot Hand with an Articulated Palm
The rapid increase in the development of humanoid robots and customized manufacturing solutions has brought dexterous manipulation to the forefront of modern robotics. Over the past decade, several expensive dexterous hands have come to market, but advances in hardware design, particularly in servo motors and 3D printing, have recently facilitated an explosion of cheaper open-source hands. Most hands are anthropomorphic to allow use of standard human tools, and attempts to increase dexterity often sacrifice anthropomorphism. We introduce the open-source ISyHand (pronounced easy-hand), a highly dexterous, low-cost, easy-to-manufacture, on-joint servo-driven robot hand. Our hand uses off-the-shelf Dynamixel motors, fasteners, and 3D-printed parts, can be assembled within four hours, and has a total material cost of about 1,300 USD. The ISyHands's unique articulated-palm design increases overall dexterity with only a modest sacrifice in anthropomorphism. To demonstrate the utility of the articulated palm, we use reinforcement learning in simulation to train the hand to perform a classical in-hand manipulation task: cube reorientation. Our novel, systematic experiments show that the simulated ISyHand outperforms the two most comparable hands in early training phases, that all three perform similarly well after policy convergence, and that the ISyHand significantly outperforms a fixed-palm version of its own design. Additionally, we deploy a policy trained on cube reorientation on the real hand, demonstrating its ability to perform real-world dexterous manipulation.
comment: Accepted at IEEE Humanoids 2025
Terrain-Awared LiDAR-Inertial Odometry for Legged-Wheel Robots Based on Radial Basis Function Approximation
An accurate odometry is essential for legged-wheel robots operating in unstructured terrains such as bumpy roads and staircases. Existing methods often suffer from pose drift due to their ignorance of terrain geometry. We propose a terrain-awared LiDAR-Inertial odometry (LIO) framework that approximates the terrain using Radial Basis Functions (RBF) whose centers are adaptively selected and weights are recursively updated. The resulting smooth terrain manifold enables ``soft constraints" that regularize the odometry optimization and mitigates the $z$-axis pose drift under abrupt elevation changes during robot's maneuver. To ensure the LIO's real-time performance, we further evaluate the RBF-related terms and calculate the inverse of the sparse kernel matrix with GPU parallelization. Experiments on unstructured terrains demonstrate that our method achieves higher localization accuracy than the state-of-the-art baselines, especially in the scenarios that have continuous height changes or sparse features when abrupt height changes occur.
Side Scan Sonar-based SLAM for Autonomous Algae Farm Monitoring
The transition of seaweed farming to an alternative food source on an industrial scale relies on automating its processes through smart farming, equivalent to land agriculture. Key to this process are autonomous underwater vehicles (AUVs) via their capacity to automate crop and structural inspections. However, the current bottleneck for their deployment is ensuring safe navigation within farms, which requires an accurate, online estimate of the AUV pose and map of the infrastructure. To enable this, we propose an efficient side scan sonar-based (SSS) simultaneous localization and mapping (SLAM) framework that exploits the geometry of kelp farms via modeling structural ropes in the back-end as sequences of individual landmarks from each SSS ping detection, instead of combining detections into elongated representations. Our method outperforms state of the art solutions in hardware in the loop (HIL) experiments on a real AUV survey in a kelp farm. The framework and dataset can be found at https://github.com/julRusVal/sss_farm_slam.
Autonomous Multi-Robot Infrastructure for AI-Enabled Healthcare Delivery and Diagnostics
This research presents a multi-robot system for inpatient care, designed using swarm intelligence principles and incorporating wearable health sensors, RF-based communication, and AI-driven decision support. Within a simulated hospital environment, the system adopts a leader-follower swarm configuration to perform patient monitoring, medicine delivery, and emergency assistance. Due to ethical constraints, live patient trials were not conducted; instead, validation was carried out through controlled self-testing with wearable sensors. The Leader Robot acquires key physiological parameters, including temperature, SpO2, heart rate, and fall detection, and coordinates other robots when required. The Assistant Robot patrols corridors for medicine delivery, while a robotic arm provides direct drug administration. The swarm-inspired leader-follower strategy enhanced communication reliability and ensured continuous monitoring, including automated email alerts to healthcare staff. The system hardware was implemented using Arduino, Raspberry Pi, NRF24L01 RF modules, and a HuskyLens AI camera. Experimental evaluation showed an overall sensor accuracy above 94%, a 92% task-level success rate, and a 96% communication reliability rate, demonstrating system robustness. Furthermore, the AI-enabled decision support was able to provide early warnings of abnormal health conditions, highlighting the potential of the system as a cost-effective solution for hospital automation and patient safety.
comment: 11 pages, 5 figures, MSc dissertation submission draft, prepared for conference/journal consideration
Evolutionary Continuous Adaptive RL-Powered Co-Design for Humanoid Chin-Up Performance
Humanoid robots have seen significant advancements in both design and control, with a growing emphasis on integrating these aspects to enhance overall performance. Traditionally, robot design has followed a sequential process, where control algorithms are developed after the hardware is finalized. However, this can be myopic and prevent robots to fully exploit their hardware capabilities. Recent approaches advocate for co-design, optimizing both design and control in parallel to maximize robotic capabilities. This paper presents the Evolutionary Continuous Adaptive RL-based Co-Design (EA-CoRL) framework, which combines reinforcement learning (RL) with evolutionary strategies to enable continuous adaptation of the control policy to the hardware. EA-CoRL comprises two key components: Design Evolution, which explores the hardware choices using an evolutionary algorithm to identify efficient configurations, and Policy Continuous Adaptation, which fine-tunes a task-specific control policy across evolving designs to maximize performance rewards. We evaluate EA-CoRL by co-designing the actuators (gear ratios) and control policy of the RH5 humanoid for a highly dynamic chin-up task, previously unfeasible due to actuator limitations. Comparative results against state-of-the-art RL-based co-design methods show that EA-CoRL achieves higher fitness score and broader design space exploration, highlighting the critical role of continuous policy adaptation in robot co-design.
Conflict-Based Search and Prioritized Planning for Multi-Agent Path Finding Among Movable Obstacles
This paper investigates Multi-Agent Path Finding Among Movable Obstacles (M-PAMO), which seeks collision-free paths for multiple agents from their start to goal locations among static and movable obstacles. M-PAMO arises in logistics and warehouses where mobile robots are among unexpected movable objects. Although Multi-Agent Path Finding (MAPF) and single-agent Path planning Among Movable Obstacles (PAMO) were both studied, M-PAMO remains under-explored. Movable obstacles lead to new fundamental challenges as the state space, which includes both agents and movable obstacles, grows exponentially with respect to the number of agents and movable obstacles. In particular, movable obstacles often closely couple agents together spatially and temporally. This paper makes a first attempt to adapt and fuse the popular Conflict-Based Search (CBS) and Prioritized Planning (PP) for MAPF, and a recent single-agent PAMO planner called PAMO*, together to address M-PAMO. We compare their performance with up to 20 agents and hundreds of movable obstacles, and show the pros and cons of these approaches.
Towards Human Engagement with Realistic AI Combat Pilots
We present a system that enables real-time interaction between human users and agents trained to control fighter jets in simulated 3D air combat scenarios. The agents are trained in a dedicated environment using Multi-Agent Reinforcement Learning. A communication link is developed to allow seamless deployment of trained agents into VR-Forces, a widely used defense simulation tool for realistic tactical scenarios. This integration allows mixed simulations where human-controlled entities engage with intelligent agents exhibiting distinct combat behaviors. Our interaction model creates new opportunities for human-agent teaming, immersive training, and the exploration of innovative tactics in defense contexts.
comment: 13th International Conference on Human-Agent Interaction (HAI) 2025
On the Conic Complementarity of Planar Contacts
We present a unifying theoretical result that connects two foundational principles in robotics: the Signorini law for point contacts, which underpins many simulation methods for preventing object interpenetration, and the center of pressure (also known as the zero-moment point), a key concept used in, for instance, optimization-based locomotion control. Our contribution is the planar Signorini condition, a conic complementarity formulation that models general planar contacts between rigid bodies. We prove that this formulation is equivalent to enforcing the punctual Signorini law across an entire contact surface, thereby bridging the gap between discrete and continuous contact models. A geometric interpretation reveals that the framework naturally captures three physical regimes -sticking, separating, and tilting-within a unified complementarity structure. This leads to a principled extension of the classical center of pressure, which we refer to as the extended center of pressure. By establishing this connection, our work provides a mathematically consistent and computationally tractable foundation for handling planar contacts, with implications for both the accurate simulation of contact dynamics and the design of advanced control and optimization algorithms in locomotion and manipulation.
Emotionally Expressive Robots: Implications for Children's Behavior toward Robot
The growing development of robots with artificial emotional expressiveness raises important questions about their persuasive potential in children's behavior. While research highlights the pragmatic value of emotional expressiveness in human social communication, the extent to which robotic expressiveness can or should influence empathic responses in children is grounds for debate. In a pilot study with 22 children (aged 7-11) we begin to explore the ways in which different levels of embodied expressiveness (body only, face only, body and face) of two basic emotions (happiness and sadness) displayed by an anthropomorphic robot (QTRobot) might modify children's behavior in a child-robot cooperative turn-taking game. We observed that children aligned their behavior to the robot's inferred emotional state. However, higher levels of expressiveness did not result in increased alignment. The preliminary results reported here provide a starting point for reflecting on robotic expressiveness and its role in shaping children's social-emotional behavior toward robots as social peers in the near future.
S$^3$E: Self-Supervised State Estimation for Radar-Inertial System
Millimeter-wave radar for state estimation is gaining significant attention for its affordability and reliability in harsh conditions. Existing localization solutions typically rely on post-processed radar point clouds as landmark points. Nonetheless, the inherent sparsity of radar point clouds, ghost points from multi-path effects, and limited angle resolution in single-chirp radar severely degrade state estimation performance. To address these issues, we propose S$^3$E, a \textbf{S}elf-\textbf{S}upervised \textbf{S}tate \textbf{E}stimator that employs more richly informative radar signal spectra to bypass sparse points and fuses complementary inertial information to achieve accurate localization. S$^3$E fully explores the association between \textit{exteroceptive} radar and \textit{proprioceptive} inertial sensor to achieve complementary benefits. To deal with limited angle resolution, we introduce a novel cross-fusion technique that enhances spatial structure information by exploiting subtle rotational shift correlations across heterogeneous data. The experimental results demonstrate our method achieves robust and accurate performance without relying on localization ground truth supervision. To the best of our knowledge, this is the first attempt to achieve state estimation by fusing radar spectra and inertial data in a complementary self-supervised manner.
MUVLA: Learning to Explore Object Navigation via Map Understanding
In this paper, we present MUVLA, a Map Understanding Vision-Language-Action model tailored for object navigation. It leverages semantic map abstractions to unify and structure historical information, encoding spatial context in a compact and consistent form. MUVLA takes the current and history observations, as well as the semantic map, as inputs and predicts the action sequence based on the description of goal object. Furthermore, it amplifies supervision through reward-guided return modeling based on dense short-horizon progress signals, enabling the model to develop a detailed understanding of action value for reward maximization. MUVLA employs a three-stage training pipeline: learning map-level spatial understanding, imitating behaviors from mixed-quality demonstrations, and reward amplification. This strategy allows MUVLA to unify diverse demonstrations into a robust spatial representation and generate more rational exploration strategies. Experiments on HM3D and Gibson benchmarks demonstrate that MUVLA achieves great generalization and learns effective exploration behaviors even from low-quality or partially successful trajectories.
Towards Intuitive Human-Robot Interaction through Embodied Gesture-Driven Control with Woven Tactile Skins
This paper presents a novel human-robot interaction (HRI) framework that enables intuitive gesture-driven control through a capacitance-based woven tactile skin. Unlike conventional interfaces that rely on panels or handheld devices, the woven tactile skin integrates seamlessly with curved robot surfaces, enabling embodied interaction and narrowing the gap between human intent and robot response. Its woven design combines fabric-like flexibility with structural stability and dense multi-channel sensing through the interlaced conductive threads. Building on this capability, we define a gesture-action mapping of 14 single- and multi-touch gestures that cover representative robot commands, including task-space motion and auxiliary functions. A lightweight convolution-transformer model designed for gesture recognition in real time achieves an accuracy of near-100%, outperforming prior baseline approaches. Experiments on robot arm tasks, including pick-and-place and pouring, demonstrate that our system reduces task completion time by up to 57% compared with keyboard panels and teach pendants. Overall, our proposed framework demonstrates a practical pathway toward more natural and efficient embodied HRI.
State Estimation for Compliant and Morphologically Adaptive Robots ICRA 2026
Locomotion robots with active or passive compliance can show robustness to uncertain scenarios, which can be promising for agricultural, research and environmental industries. However, state estimation for these robots is challenging due to the lack of rigid-body assumptions and kinematic changes from morphing. We propose a method to estimate typical rigid-body states alongside compliance-related states, such as soft robot shape in different morphologies and locomotion modes. Our neural network-based state estimator uses a history of states and a mechanism to directly influence unreliable sensors. We test our framework on the GOAT platform, a robot capable of passive compliance and active morphing for extreme outdoor terrain. The network is trained on motion capture data in a novel compliance-centric frame that accounts for morphing-related states. Our method predicts shape-related measurements within 4.2% of the robot's size, velocities within 6.3% and 2.4% of the top linear and angular speeds, respectively, and orientation within 1.5 degrees. We also demonstrate a 300% increase in travel range during a motor malfunction when using our estimator for closed-loop autonomous outdoor operation.
comment: 8 pages, 10 figures, 1 table, submitted to ICRA 2026
Preemptive Spatiotemporal Trajectory Adjustment for Heterogeneous Vehicles in Highway Merging Zones
Aiming at the problem of driver's perception lag and low utilization efficiency of space-time resources in expressway ramp confluence area, based on the preemptive spatiotemporal trajectory Adjustment system, from the perspective of coordinating spatiotemporal resources, the reasonable value of safe space-time distance in trajectory pre-preparation is quantitatively analyzed. The minimum safety gap required for ramp vehicles to merge into the mainline is analyzed by introducing double positioning error and spatiotemporal trajectory tracking error. A merging control strategy for autonomous driving heterogeneous vehicles is proposed, which integrates vehicle type, driving intention, and safety spatiotemporal distance. The specific confluence strategies of ramp target vehicles and mainline cooperative vehicles under different vehicle types are systematically expounded. A variety of traffic flow and speed scenarios are used for full combination simulation. By comparing the time-position-speed diagram, the vehicle operation characteristics and the dynamic difference of confluence are qualitatively analyzed, and the average speed and average delay are used as the evaluation indices to quantitatively evaluate the performance advantages of the preemptive cooperative confluence control strategy. The results show that the maximum average delay improvement rates of mainline and ramp vehicles are 90.24 % and 74.24 %, respectively. The proposed strategy can effectively avoid potential vehicle conflicts and emergency braking behaviors, improve driving safety in the confluence area, and show significant advantages in driving stability and overall traffic efficiency optimization.
Reinforced Embodied Planning with Verifiable Reward for Real-World Robotic Manipulation
Enabling robots to execute long-horizon manipulation tasks from free-form language instructions remains a fundamental challenge in embodied AI. While vision-language models (VLMs) have shown promise as high-level planners, their deployment in the real world is hindered by two gaps: (i) the scarcity of large-scale, sequential manipulation data that couples natural language with multi-step action plans, and (ii) the absence of dense, interpretable rewards for fine-tuning VLMs on planning objectives. To address these issues, we propose REVER, a framework that empowers VLMs to generate and validate long-horizon manipulation plans from natural language instructions in real-world scenarios. Under REVER we train and release RoboFarseer, a VLM incentivized to emit chain-of-thought that perform temporal and spatial reasoning, ensuring physically plausible and logically coherent plans. To obtain training data, we leverage the Universal Manipulation Interface framework to capture hardware-agnostic demonstrations of atomic skills. An automated annotation engine converts each demonstration into vision-instruction-plan triplet. We introduce a verifiable reward that scores the generated plan by its ordered bipartite matching overlap with the ground-truth skill sequence. At run time, the fine-tuned VLM functions both as a planner and as a monitor, verifying step-wise completion. RoboFarseer matches or exceeds the performance of proprietary models that are orders of magnitude larger, while on open-ended planning it surpasses the best baseline by more than 40%. In real-world, long-horizon tasks, the complete system boosts overall success by roughly 60% compared with the same low-level controller without the planner. We will open-source both the dataset and the trained model upon publication.
Act to See, See to Act: Diffusion-Driven Perception-Action Interplay for Adaptive Policies NeurIPS 2025
Existing imitation learning methods decouple perception and action, which overlooks the causal reciprocity between sensory representations and action execution that humans naturally leverage for adaptive behaviors. To bridge this gap, we introduce Action--Guided Diffusion Policy (DP--AG), a unified representation learning that explicitly models a dynamic interplay between perception and action through probabilistic latent dynamics. DP--AG encodes latent observations into a Gaussian posterior via variational inference and evolves them using an action-guided SDE, where the Vector-Jacobian Product (VJP) of the diffusion policy's noise predictions serves as a structured stochastic force driving latent updates. To promote bidirectional learning between perception and action, we introduce a cycle--consistent contrastive loss that organizes the gradient flow of the noise predictor into a coherent perception--action loop, enforcing mutually consistent transitions in both latent updates and action refinements. Theoretically, we derive a variational lower bound for the action-guided SDE, and prove that the contrastive objective enhances continuity in both latent and action trajectories. Empirically, DP--AG significantly outperforms state--of--the--art methods across simulation benchmarks and real-world UR5 manipulation tasks. As a result, our DP--AG offers a promising step toward bridging biological adaptability and artificial policy learning.
comment: 42 pages, 17 figures, 39th Conference on Neural Information Processing Systems (NeurIPS 2025)
SAC Flow: Sample-Efficient Reinforcement Learning of Flow-Based Policies via Velocity-Reparameterized Sequential Modeling
Training expressive flow-based policies with off-policy reinforcement learning is notoriously unstable due to gradient pathologies in the multi-step action sampling process. We trace this instability to a fundamental connection: the flow rollout is algebraically equivalent to a residual recurrent computation, making it susceptible to the same vanishing and exploding gradients as RNNs. To address this, we reparameterize the velocity network using principles from modern sequential models, introducing two stable architectures: Flow-G, which incorporates a gated velocity, and Flow-T, which utilizes a decoded velocity. We then develop a practical SAC-based algorithm, enabled by a noise-augmented rollout, that facilitates direct end-to-end training of these policies. Our approach supports both from-scratch and offline-to-online learning and achieves state-of-the-art performance on continuous control and robotic manipulation benchmarks, eliminating the need for common workarounds like policy distillation or surrogate objectives.
Best of Sim and Real: Decoupled Visuomotor Manipulation via Learning Control in Simulation and Perception in Real
Sim-to-real transfer remains a fundamental challenge in robot manipulation due to the entanglement of perception and control in end-to-end learning. We present a decoupled framework that learns each component where it is most reliable: control policies are trained in simulation with privileged state to master spatial layouts and manipulation dynamics, while perception is adapted only at deployment to bridge real observations to the frozen control policy. Our key insight is that control strategies and action patterns are universal across environments and can be learned in simulation through systematic randomization, while perception is inherently domain-specific and must be learned where visual observations are authentic. Unlike existing end-to-end approaches that require extensive real-world data, our method achieves strong performance with only 10-20 real demonstrations by reducing the complex sim-to-real problem to a structured perception alignment task. We validate our approach on tabletop manipulation tasks, demonstrating superior data efficiency and out-of-distribution generalization compared to end-to-end baselines. The learned policies successfully handle object positions and scales beyond the training distribution, confirming that decoupling perception from control fundamentally improves sim-to-real transfer.
comment: 10 pages, 6 figures
TacRefineNet: Tactile-Only Grasp Refinement Between Arbitrary In-Hand Object Poses
Despite progress in both traditional dexterous grasping pipelines and recent Vision-Language-Action (VLA) approaches, the grasp execution stage remains prone to pose inaccuracies, especially in long-horizon tasks, which undermines overall performance. To address this "last-mile" challenge, we propose TacRefineNet, a tactile-only framework that achieves fine in-hand pose refinement of known objects in arbitrary target poses using multi-finger fingertip sensing. Our method iteratively adjusts the end-effector pose based on tactile feedback, aligning the object to the desired configuration. We design a multi-branch policy network that fuses tactile inputs from multiple fingers along with proprioception to predict precise control updates. To train this policy, we combine large-scale simulated data from a physics-based tactile model in MuJoCo with real-world data collected from a physical system. Comparative experiments show that pretraining on simulated data and fine-tuning with a small amount of real data significantly improves performance over simulation-only training. Extensive real-world experiments validate the effectiveness of the method, achieving millimeter-level grasp accuracy using only tactile input. To our knowledge, this is the first method to enable arbitrary in-hand pose refinement via multi-finger tactile sensing alone. Project website is available at https://sites.google.com/view/tacrefinenet
comment: 9 pages, 9 figures
Boundary-to-Region Supervision for Offline Safe Reinforcement Learning NeurIPS 2025
Offline safe reinforcement learning aims to learn policies that satisfy predefined safety constraints from static datasets. Existing sequence-model-based methods condition action generation on symmetric input tokens for return-to-go and cost-to-go, neglecting their intrinsic asymmetry: return-to-go (RTG) serves as a flexible performance target, while cost-to-go (CTG) should represent a rigid safety boundary. This symmetric conditioning leads to unreliable constraint satisfaction, especially when encountering out-of-distribution cost trajectories. To address this, we propose Boundary-to-Region (B2R), a framework that enables asymmetric conditioning through cost signal realignment . B2R redefines CTG as a boundary constraint under a fixed safety budget, unifying the cost distribution of all feasible trajectories while preserving reward structures. Combined with rotary positional embeddings , it enhances exploration within the safe region. Experimental results show that B2R satisfies safety constraints in 35 out of 38 safety-critical tasks while achieving superior reward performance over baseline methods. This work highlights the limitations of symmetric token conditioning and establishes a new theoretical and practical approach for applying sequence models to safe RL. Our code is available at https://github.com/HuikangSu/B2R.
comment: NeurIPS 2025
VLA Model Post-Training via Action-Chunked PPO and Self Behavior Cloning
Reinforcement learning (RL) is a promising avenue for post-training vision-language-action (VLA) models, but practical deployment is hindered by sparse rewards and unstable training. This work mitigates these challenges by introducing an action chunk based on proximal policy optimization (PPO) with behavior cloning using self-collected demonstrations. Aggregating consecutive actions into chunks improves the temporal consistency of the policy and the density of informative feedback. In addition, an auxiliary behavior cloning loss is applied with a dynamically updated demonstration buffer that continually collects high-quality task trials during training. The relative weight between the action-chunked PPO objective and the self behavior clone auxiliary loss is adapted online to stabilize the post-training process. Experiments on the MetaWorld benchmark indicate improved performance over supervised fine-tuning, achieving a high success rate (0.93) and few steps to success (42.17). These results demonstrate the viability of RL for VLA post-training and help lay the groundwork for downstream VLA applications.
OmniNav: A Unified Framework for Prospective Exploration and Visual-Language Navigation
Embodied navigation presents a core challenge for intelligent robots, requiring the comprehension of visual environments, natural language instructions, and autonomous exploration. Existing models often fall short in offering a unified solution across diverse navigation paradigms, resulting in low success rates and limited generalization. We introduce OmniNav, a unified framework addressing instruct-goal, object-goal, point-goal navigation, and frontier-based exploration within a single architecture. Our approach features a lightweight, low-latency policy that accurately predicts continuous-space waypoints (coordinates and orientations). This policy surpasses action-chunk methods in precision and supports real-world deployment at control frequencies up to 5 Hz. Architecturally, OmniNav employs a fast-slow system design: a fast module generates waypoints using short-horizon visual context and subtasks, while a slow module performs deliberative planning with long-horizon observations and candidate frontiers to select subsequent subgoals and subtasks. This collaboration enhances path efficiency and maintains trajectory coherence, particularly in exploration and memory-intensive scenarios. Crucially, we identify that the primary bottleneck isn't merely navigation policy learning, but a robust understanding of general instructions and objects. To boost generalization, OmniNav integrates large-scale, general-purpose training datasets, including those for image captioning and visual recognition, into a joint multi-task regimen. This significantly improves success rates and robustness. Extensive experiments confirm OmniNav's state-of-the-art performance across various navigation benchmarks, with real-world deployment further validating its efficacy. OmniNav provides practical insights for embodied navigation, charting a scalable path towards versatile, highly generalizable robotic intelligence.
Hierarchical Diffusion Motion Planning with Task-Conditioned Uncertainty-Aware Priors
We propose a novel hierarchical diffusion planner that embeds task and motion structure directly in the noise model. Unlike standard diffusion-based planners that use zero-mean, isotropic Gaussian noise, we employ a family of task-conditioned structured Gaussians whose means and covariances are derived from Gaussian Process Motion Planning (GPMP): sparse, task-centric key states or their associated timings (or both) are treated as noisy observations to produce a prior instance. We first generalize the standard diffusion process to biased, non-isotropic corruption with closed-form forward and posterior expressions. Building on this, our hierarchy separates prior instantiation from trajectory denoising: the upper level instantiates a task-conditioned structured Gaussian (mean and covariance), and the lower level denoises the full trajectory under that fixed prior. Experiments on Maze2D goal-reaching and KUKA block stacking show improved success rates, smoother trajectories, and stronger task alignment compared to isotropic baselines. Ablation studies indicate that explicitly structuring the corruption process offers benefits beyond simply conditioning the neural network. Overall, our method concentrates probability mass of prior near feasible, smooth, and semantically meaningful trajectories while maintaining tractability. Our project page is available at https://hta-diffusion.github.io.
dVLA: Diffusion Vision-Language-Action Model with Multimodal Chain-of-Thought
Vision-Language-Action (VLA) models are emerging as a next-generation paradigm for robotics. We introduce dVLA, a diffusion-based VLA that leverages a multimodal chain-of-thought to unify visual perception, language reasoning, and robotic control in a single system. dVLA jointly optimizes perception, language understanding, and action under a single diffusion objective, enabling stronger cross-modal reasoning and better generalization to novel instructions and objects. For practical deployment, we mitigate inference latency by incorporating two acceleration strategies, a prefix attention mask and KV caching, yielding up to around times speedup at test-time inference. We evaluate dVLA in both simulation and the real world: on the LIBERO benchmark, it achieves state-of-the-art performance with a 96.4% average success rate, consistently surpassing both discrete and continuous action policies; on a real Franka robot, it succeeds across a diverse task suite, including a challenging bin-picking task that requires multi-step planning, demonstrating robust real-world performance. Together, these results underscore the promise of unified diffusion frameworks for practical, high-performance VLA robotics.
comment: technique report
Field Calibration of Hyperspectral Cameras for Terrain Inference
Intra-class terrain differences such as water content directly influence a vehicle's ability to traverse terrain, yet RGB vision systems may fail to distinguish these properties. Evaluating a terrain's spectral content beyond red-green-blue wavelengths to the near infrared spectrum provides useful information for intra-class identification. However, accurate analysis of this spectral information is highly dependent on ambient illumination. We demonstrate a system architecture to collect and register multi-wavelength, hyperspectral images from a mobile robot and describe an approach to reflectance calibrate cameras under varying illumination conditions. To showcase the practical applications of our system, HYPER DRIVE, we demonstrate the ability to calculate vegetative health indices and soil moisture content from a mobile robot platform.
comment: Accepted to IEEE Robotics & Automation Letters
DiSA-IQL: Offline Reinforcement Learning for Robust Soft Robot Control under Distribution Shifts
Soft snake robots offer remarkable flexibility and adaptability in complex environments, yet their control remains challenging due to highly nonlinear dynamics. Existing model-based and bio-inspired controllers rely on simplified assumptions that limit performance. Deep reinforcement learning (DRL) has recently emerged as a promising alternative, but online training is often impractical because of costly and potentially damaging real-world interactions. Offline RL provides a safer option by leveraging pre-collected datasets, but it suffers from distribution shift, which degrades generalization to unseen scenarios. To overcome this challenge, we propose DiSA-IQL (Distribution-Shift-Aware Implicit Q-Learning), an extension of IQL that incorporates robustness modulation by penalizing unreliable state-action pairs to mitigate distribution shift. We evaluate DiSA-IQL on goal-reaching tasks across two settings: in-distribution and out-of-distribution evaluation. Simulation results show that DiSA-IQL consistently outperforms baseline models, including Behavior Cloning (BC), Conservative Q-Learning (CQL), and vanilla IQL, achieving higher success rates, smoother trajectories, and improved robustness. The codes are open-sourced to support reproducibility and to facilitate further research in offline RL for soft robot control.
Learning Human Reaching Optimality Principles from Minimal Observation Inverse Reinforcement Learning
This paper investigates the application of Minimal Observation Inverse Reinforcement Learning (MO-IRL) to model and predict human arm-reaching movements with time-varying cost weights. Using a planar two-link biomechanical model and high-resolution motion-capture data from subjects performing a pointing task, we segment each trajectory into multiple phases and learn phase-specific combinations of seven candidate cost functions. MO-IRL iteratively refines cost weights by scaling observed and generated trajectories in the maximum entropy IRL formulation, greatly reducing the number of required demonstrations and convergence time compared to classical IRL approaches. Training on ten trials per posture yields average joint-angle Root Mean Squared Errors (RMSE) of 6.4 deg and 5.6 deg for six- and eight-segment weight divisions, respectively, versus 10.4 deg using a single static weight. Cross-validation on remaining trials and, for the first time, inter-subject validation on an unseen subject's 20 trials, demonstrates comparable predictive accuracy, around 8 deg RMSE, indicating robust generalization. Learned weights emphasize joint acceleration minimization during movement onset and termination, aligning with smoothness principles observed in biological motion. These results suggest that MO-IRL can efficiently uncover dynamic, subject-independent cost structures underlying human motor control, with potential applications for humanoid robots.
comment: 8 pages, 4 figures
BC-MPPI: A Probabilistic Constraint Layer for Safe Model-Predictive Path-Integral Control
Model Predictive Path Integral (MPPI) control has recently emerged as a fast, gradient-free alternative to model-predictive control in highly non-linear robotic tasks, yet it offers no hard guarantees on constraint satisfaction. We introduce Bayesian-Constraints MPPI (BC-MPPI), a lightweight safety layer that attaches a probabilistic surrogate to every state and input constraint. At each re-planning step the surrogate returns the probability that a candidate trajectory is feasible; this joint probability scales the weight given to a candidate, automatically down-weighting rollouts likely to collide or exceed limits and pushing the sampling distribution toward the safe subset; no hand-tuned penalty costs or explicit sample rejection required. We train the surrogate from 1000 offline simulations and deploy the controller on a quadrotor in MuJoCo with both static and moving obstacles. Across K in [100,1500] rollouts BC-MPPI preserves safety margins while satisfying the prescribed probability of violation. Because the surrogate is a stand-alone, version-controlled artefact and the runtime safety score is a single scalar, the approach integrates naturally with verification-and-validation pipelines for certifiable autonomous systems.
A Hierarchical Agentic Framework for Autonomous Drone-Based Visual Inspection
Autonomous inspection systems are essential for ensuring the performance and longevity of industrial assets. Recently, agentic frameworks have demonstrated significant potential for automating inspection workflows but have been limited to digital tasks. Their application to physical assets in real-world environments, however, remains underexplored. In this work, our contributions are two-fold: first, we propose a hierarchical agentic framework for autonomous drone control, and second, a reasoning methodology for individual function executions which we refer to as ReActEval. Our framework focuses on visual inspection tasks in indoor industrial settings, such as interpreting industrial readouts or inspecting equipment. It employs a multi-agent system comprising a head agent and multiple worker agents, each controlling a single drone. The head agent performs high-level planning and evaluates outcomes, while worker agents implement ReActEval to reason over and execute low-level actions. Operating entirely in natural language, ReActEval follows a plan, reason, act, evaluate cycle, enabling drones to handle tasks ranging from simple navigation (e.g., flying forward 10 meters and land) to complex high-level tasks (e.g., locating and reading a pressure gauge). The evaluation phase serves as a feedback and/or replanning stage, ensuring actions align with user objectives while preventing undesirable outcomes. We evaluate the framework in a simulated environment with two worker agents, assessing performance qualitatively and quantitatively based on task completion across varying complexity levels and workflow efficiency. By leveraging natural language processing for agent communication, our approach offers a novel, flexible, and user-accessible alternative to traditional drone-based solutions, enabling autonomous problem-solving for industrial inspection without extensive user intervention.
TGPO: Temporal Grounded Policy Optimization for Signal Temporal Logic Tasks
Learning control policies for complex, long-horizon tasks is a central challenge in robotics and autonomous systems. Signal Temporal Logic (STL) offers a powerful and expressive language for specifying such tasks, but its non-Markovian nature and inherent sparse reward make it difficult to be solved via standard Reinforcement Learning (RL) algorithms. Prior RL approaches focus only on limited STL fragments or use STL robustness scores as sparse terminal rewards. In this paper, we propose TGPO, Temporal Grounded Policy Optimization, to solve general STL tasks. TGPO decomposes STL into timed subgoals and invariant constraints and provides a hierarchical framework to tackle the problem. The high-level component of TGPO proposes concrete time allocations for these subgoals, and the low-level time-conditioned policy learns to achieve the sequenced subgoals using a dense, stage-wise reward signal. During inference, we sample various time allocations and select the most promising assignment for the policy network to rollout the solution trajectory. To foster efficient policy learning for complex STL with multiple subgoals, we leverage the learned critic to guide the high-level temporal search via Metropolis-Hastings sampling, focusing exploration on temporally feasible solutions. We conduct experiments on five environments, ranging from low-dimensional navigation to manipulation, drone, and quadrupedal locomotion. Under a wide range of STL tasks, TGPO significantly outperforms state-of-the-art baselines (especially for high-dimensional and long-horizon cases), with an average of 31.6% improvement in task success rate compared to the best baseline. The code will be available at https://github.com/mengyuest/TGPO
Robust Attitude Control of Nonlinear Multi-Rotor Dynamics with LFT Models and $\mathcal{H}_\infty$ Performance
Attitude stabilization of unmanned aerial vehicles in uncertain environments presents significant challenges due to nonlinear dynamics, parameter variations, and sensor limitations. This paper presents a comparative study of $\mathcal{H}_\infty$ and classical PID controllers for multi-rotor attitude regulation in the presence of wind disturbances and gyroscope noise. The flight dynamics are modeled using a linear parameter-varying (LPV) framework, where nonlinearities and parameter variations are systematically represented as structured uncertainties within a linear fractional transformation formulation. A robust controller based on $\mathcal{H}_\infty$ formulation is designed using only gyroscope measurements to ensure guaranteed performance bounds. Nonlinear simulation results demonstrate the effectiveness of the robust controllers compared to classical PID control, showing significant improvement in attitude regulation under severe wind disturbances.
comment: 6 pages, 6 figures, 3 tables, submitted to ACC 2026
A Novel Robust Control Method Combining DNN-Based NMPC Approximation and PI Control: Application to Exoskeleton Squat Movements
Nonlinear Model Predictive Control (NMPC) is a precise controller, but its heavy computational load often prevents application in robotic systems. Some studies have attempted to approximate NMPC using deep neural networks (NMPC-DNN). However, in the presence of unexpected disturbances or when operating conditions differ from training data, this approach lacks robustness, leading to large tracking errors. To address this issue, for the first time, the NMPC-DNN output is combined with a PI controller (Hybrid NMPC-DNN-PI). The proposed controller is validated by applying it to an exoskeleton robot during squat movement, which has a complex dynamic model and has received limited attention regarding robust nonlinear control design. A human-robot dynamic model with three active joints (ankle, knee, hip) is developed, and more than 5.3 million training samples are used to train the DNN. The results show that, under unseen conditions for the DNN, the tracking error in Hybrid NMPC-DNN-PI is significantly lower compared to NMPC-DNN. Moreover, human joint torques are greatly reduced with the use of the exoskeleton, with RMS values for the studied case reduced by 30.9%, 41.8%, and 29.7% at the ankle, knee, and hip, respectively. In addition, the computational cost of Hybrid NMPC-DNN-PI is 99.93% lower than that of NMPC.
A Systematic Study of Large Language Models for Task and Motion Planning With PDDLStream
Using large language models (LLMs) to solve complex robotics problems requires understanding their planning capabilities. Yet while we know that LLMs can plan on some problems, the extent to which these planning capabilities cover the space of robotics tasks is unclear. One promising direction is to integrate the semantic knowledge of LLMs with the formal reasoning of task and motion planning (TAMP). However, the myriad of choices for how to integrate LLMs within TAMP complicates the design of such systems. We develop 16 algorithms that use Gemini 2.5 Flash to substitute key TAMP components. Our zero-shot experiments across 4,950 problems and three domains reveal that the Gemini-based planners exhibit lower success rates and higher planning times than their engineered counterparts. We show that providing geometric details increases the number of task-planning errors compared to pure PDDL descriptions, and that (faster) non-reasoning LLM variants outperform (slower) reasoning variants in most cases, since the TAMP system can direct the LLM to correct its mistakes.
Drones that Think on their Feet: Sudden Landing Decisions with Embodied AI
Autonomous drones must often respond to sudden events, such as alarms, faults, or unexpected changes in their environment, that require immediate and adaptive decision-making. Traditional approaches rely on safety engineers hand-coding large sets of recovery rules, but this strategy cannot anticipate the vast range of real-world contingencies and quickly becomes incomplete. Recent advances in embodied AI, powered by large visual language models, provide commonsense reasoning to assess context and generate appropriate actions in real time. We demonstrate this capability in a simulated urban benchmark in the Unreal Engine, where drones dynamically interpret their surroundings and decide on sudden maneuvers for safe landings. Our results show that embodied AI makes possible a new class of adaptive recovery and decision-making pipelines that were previously infeasible to design by hand, advancing resilience and safety in autonomous aerial systems.
RoboPilot: Generalizable Dynamic Robotic Manipulation with Dual-thinking Modes
Despite rapid progress in autonomous robotics, executing complex or long-horizon tasks remains a fundamental challenge. Most current approaches follow an open-loop paradigm with limited reasoning and no feedback, resulting in poor robustness to environmental changes and severe error accumulation. We present RoboPilot, a dual-thinking closed-loop framework for robotic manipulation that supports adaptive reasoning for complex tasks in real-world dynamic environments. RoboPilot leverages primitive actions for structured task planning and flexible action generation, while introducing feedback to enable replanning from dynamic changes and execution errors. Chain-of-Thought reasoning further enhances high-level task planning and guides low-level action generation. The system dynamically switches between fast and slow thinking to balance efficiency and accuracy. To systematically evaluate the robustness of RoboPilot in diverse robot manipulation scenarios, we introduce RoboPilot-Bench, a benchmark spanning 21 tasks across 10 categories, including infeasible-task recognition and failure recovery. Experiments show that RoboPilot outperforms state-of-the-art baselines by 25.9\% in task success rate, and the real-world deployment on an industrial robot further demonstrates its robustness in real-world settings.
The Formation of Trust in Autonomous Vehicles after Interacting with Robotaxis on Public Roads
This study investigates how pedestrian trust, receptivity, and behavior evolve during interactions with Level-4 autonomous vehicles (AVs) at uncontrolled urban intersections in a naturalistic setting. While public acceptance is critical for AV adoption, most prior studies relied on simplified simulations or field tests. We conducted a real-world experiment in a commercial Robotaxi operation zone, where 33 participants repeatedly crossed an uncontrolled intersection with frequent Level-4 Robotaxi traffic. Participants completed the Pedestrian Behavior Questionnaire (PBQ), Pedestrian Receptivity Questionnaire for Fully AVs (PRQF), pre- and post-experiment Trust in AVs Scale, and Personal Innovativeness Scale (PIS). Results showed that trust in AVs significantly increased post-experiment, with the increase positively associated with the Interaction component of PRQF. Additionally, both the Positive and Error subscales of the PBQ significantly influenced trust change. This study reveals how trust forms in real-world pedestrian-AV encounters, offering insights beyond lab-based research by accounting for population heterogeneity.
comment: Proceedings of the 69th HFES International Annual Meeting
Sequence Pathfinder for Multi-Agent Pickup and Delivery in the Warehouse
Multi-Agent Pickup and Delivery (MAPD) is a challenging extension of Multi-Agent Path Finding (MAPF), where agents are required to sequentially complete tasks with fixed-location pickup and delivery demands. Although learning-based methods have made progress in MAPD, they often perform poorly in warehouse-like environments with narrow pathways and long corridors when relying only on local observations for distributed decision-making. Communication learning can alleviate the lack of global information but introduce high computational complexity due to point-to-point communication. To address this challenge, we formulate MAPF as a sequence modeling problem and prove that path-finding policies under sequence modeling possess order-invariant optimality, ensuring its effectiveness in MAPD. Building on this, we propose the Sequential Pathfinder (SePar), which leverages the Transformer paradigm to achieve implicit information exchange, reducing decision-making complexity from exponential to linear while maintaining efficiency and global awareness. Experiments demonstrate that SePar consistently outperforms existing learning-based methods across various MAPF tasks and their variants, and generalizes well to unseen environments. Furthermore, we highlight the necessity of integrating imitation learning in complex maps like warehouses.
comment: Preprint Under Review
Visual-auditory Extrinsic Contact Estimation
Robust manipulation often hinges on a robot's ability to perceive extrinsic contacts-contacts between a grasped object and its surrounding environment. However, these contacts are difficult to observe through vision alone due to occlusions, limited resolution, and ambiguous near-contact states. In this paper, we propose a visual-auditory method for extrinsic contact estimation that integrates global scene information from vision with local contact cues obtained through active audio sensing. Our approach equips a robotic gripper with contact microphones and conduction speakers, enabling the system to emit and receive acoustic signals through the grasped object to detect external contacts. We train our perception pipeline entirely in simulation and zero-shot transfer to the real world. To bridge the sim-to-real gap, we introduce a real-to-sim audio hallucination technique, injecting real-world audio samples into simulated scenes with ground-truth contact labels. The resulting multimodal model accurately estimates both the location and size of extrinsic contacts across a range of cluttered and occluded scenarios. We further demonstrate that explicit contact prediction significantly improves policy learning for downstream contact-rich manipulation tasks.
comment: 8 pages, 7 figures
AuDeRe: Automated Strategy Decision and Realization in Robot Planning and Control via LLMs
Recent advancements in large language models (LLMs) have shown significant promise in various domains, especially robotics. However, most prior LLM-based work in robotic applications either directly predicts waypoints or applies LLMs within fixed tool integration frameworks, offering limited flexibility in exploring and configuring solutions best suited to different tasks. In this work, we propose a framework that leverages LLMs to select appropriate planning and control strategies based on task descriptions, environmental constraints, and system dynamics. These strategies are then executed by calling the available comprehensive planning and control APIs. Our approach employs iterative LLM-based reasoning with performance feedback to refine the algorithm selection. We validate our approach through extensive experiments across tasks of varying complexity, from simple tracking to complex planning scenarios involving spatiotemporal constraints. The results demonstrate that using LLMs to determine planning and control strategies from natural language descriptions significantly enhances robotic autonomy while reducing the need for extensive manual tuning and expert knowledge. Furthermore, our framework maintains generalizability across different tasks and notably outperforms baseline methods that rely on LLMs for direct trajectory, control sequence, or code generation.
comment: 8 pages, 14 figures, submitted to the 2026 American Control Conference
Robot Conga: A Leader-Follower Walking Approach to Sequential Path Following in Multi-Agent Systems
Coordinated path following in multi-agent systems is a key challenge in robotics, with applications in automated logistics, surveillance, and collaborative exploration. Traditional formation control techniques often rely on time-parameterized trajectories and path integrals, which can result in synchronization issues and rigid behavior. In this work, we address the problem of sequential path following, where agents maintain fixed spatial separation along a common trajectory, guided by a leader under centralized control. We introduce Robot Conga, a leader-follower control strategy that updates each agent's desired state based on the leader's spatial displacement rather than time, assuming access to a global position reference, an assumption valid in indoor environments equipped with motion capture, vision-based tracking, or UWB localization systems. The algorithm was validated in simulation using both TurtleBot3 and quadruped (Laikago) robots. Results demonstrate accurate trajectory tracking, stable inter-agent spacing, and fast convergence, with all agents aligning within 250 time steps (approx. 0.25 seconds) in the quadruped case, and almost instantaneously in the TurtleBot3 implementation.
comment: 6 Pages, 8 Figures. Both authors have contributed equally
Apple: Toward General Active Perception via Reinforcement Learning
Active perception is a fundamental skill that enables us humans to deal with uncertainty in our inherently partially observable environment. For senses such as touch, where the information is sparse and local, active perception becomes crucial. In recent years, active perception has emerged as an important research domain in robotics. However, current methods are often bound to specific tasks or make strong assumptions, which limit their generality. To address this gap, this work introduces APPLE (Active Perception Policy Learning) - a novel framework that leverages reinforcement learning (RL) to address a range of different active perception problems. APPLE jointly trains a transformer-based perception module and decision-making policy with a unified optimization objective, learning how to actively gather information. By design, APPLE is not limited to a specific task and can, in principle, be applied to a wide range of active perception problems. We evaluate two variants of APPLE across different tasks, including tactile exploration problems from the Tactile MNIST benchmark. Experiments demonstrate the efficacy of APPLE, achieving high accuracies on both regression and classification tasks. These findings underscore the potential of APPLE as a versatile and general framework for advancing active perception in robotics.
comment: 16 pages; 13 figures Under Review
Find the Fruit: Zero-Shot Sim2Real RL for Occlusion-Aware Plant Manipulation
Autonomous harvesting in the open presents a complex manipulation problem. In most scenarios, an autonomous system has to deal with significant occlusion and require interaction in the presence of large structural uncertainties (every plant is different). Perceptual and modeling uncertainty make design of reliable manipulation controllers for harvesting challenging, resulting in poor performance during deployment. We present a sim2real reinforcement learning (RL) framework for occlusion-aware plant manipulation, where a policy is learned entirely in simulation to reposition stems and leaves to reveal target fruit(s). In our proposed approach, we decouple high-level kinematic planning from low-level compliant control which simplifies the sim2real transfer. This decomposition allows the learned policy to generalize across multiple plants with different stiffness and morphology. In experiments with multiple real-world plant setups, our system achieves up to 86.7% success in exposing target fruits, demonstrating robustness to occlusion variation and structural uncertainty.
comment: 9 Pages, 3 Figures, 1 Table
DreamControl: Human-Inspired Whole-Body Humanoid Control for Scene Interaction via Guided Diffusion
We introduce DreamControl, a novel methodology for learning autonomous whole-body humanoid skills. DreamControl leverages the strengths of diffusion models and Reinforcement Learning (RL): our core innovation is the use of a diffusion prior trained on human motion data, which subsequently guides an RL policy in simulation to complete specific tasks of interest (e.g., opening a drawer or picking up an object). We demonstrate that this human motion-informed prior allows RL to discover solutions unattainable by direct RL, and that diffusion models inherently promote natural looking motions, aiding in sim-to-real transfer. We validate DreamControl's effectiveness on a Unitree G1 robot across a diverse set of challenging tasks involving simultaneous lower and upper body control and object interaction. Project website at https://genrobo.github.io/DreamControl/
comment: https://genrobo.github.io/DreamControl/ (under submission)
Multi Layered Autonomy and AI Ecologies in Robotic Art Installations
This paper presents Symbiosis of Agents, is a large-scale installation by Baoyang Chen (baoyangchen.com), that embeds AI-driven robots in an immersive, mirror-lined arena, probing the tension between machine agency and artistic authorship. Drawing on early cybernetics, rule-based conceptual art, and seminal robotic works, it orchestrates fluid exchanges among robotic arms, quadruped machines, their environment, and the public. A three tier faith system pilots the ecology: micro-level adaptive tactics, meso-level narrative drives, and a macro-level prime directive. This hierarchy lets behaviors evolve organically in response to environmental cues and even a viewer's breath, turning spectators into co-authors of the unfolding drama. Framed by a speculative terraforming scenario that recalls the historical exploitation of marginalized labor, the piece asks who bears responsibility in AI-mediated futures. Choreographed motion, AI-generated scripts, reactive lighting, and drifting fog cast the robots as collaborators rather than tools, forging a living, emergent artwork. Exhibited internationally, Symbiosis of Agents shows how cybernetic feedback, robotic experimentation, and conceptual rule-making can converge to redefine agency, authorship, and ethics in contemporary art.
Ocean Diviner: A Diffusion-Augmented Reinforcement Learning Framework for AUV Robust Control in Underwater Tasks
Autonomous Underwater Vehicles (AUVs) are essential for marine exploration, yet their control remains highly challenging due to nonlinear dynamics and uncertain environmental disturbances. This paper presents a diffusion-augmented Reinforcement Learning (RL) framework for robust AUV control, aiming to improve AUV's adaptability in dynamic underwater environments. The proposed framework integrates two core innovations: (1) A diffusion-based action generation framework that produces physically feasible and high-quality actions, enhanced by a high-dimensional state encoding mechanism combining current observations with historical states and actions through a novel diffusion U-Net architecture, significantly improving long-horizon planning capacity for robust control. (2) A sample-efficient hybrid learning architecture that synergizes diffusion-guided exploration with RL policy optimization, where the diffusion model generates diverse candidate actions and the RL critic selects the optimal action, achieving higher exploration efficiency and policy stability in dynamic underwater environments. Extensive simulation experiments validate the framework's superior robustness and flexibility, outperforming conventional control methods in challenging marine conditions, offering enhanced adaptability and reliability for AUV operations in underwater tasks. Finally, we will release the code publicly soon to support future research in this area.
comment: Jingzehua Xu, Guanwen Xie and Weiyi Liu contributed equally to this work
LiDAR-BIND-T: Improved and Temporally Consistent Sensor Modality Translation and Fusion for Robotic Applications
This paper extends LiDAR-BIND, a modular multi-modal fusion framework that binds heterogeneous sensors (radar, sonar) to a LiDAR-defined latent space, with mechanisms that explicitly enforce temporal consistency. We introduce three contributions: (i) temporal embedding similarity that aligns consecutive latent representations, (ii) a motion-aligned transformation loss that matches displacement between predictions and ground truth LiDAR, and (iii) windowed temporal fusion using a specialised temporal module. We further update the model architecture to better preserve spatial structure. Evaluations on radar/sonar-to-LiDAR translation demonstrate improved temporal and spatial coherence, yielding lower absolute trajectory error and better occupancy map accuracy in Cartographer-based SLAM (Simultaneous Localisation and Mapping). We propose different metrics based on the Fr\'echet Video Motion Distance (FVMD) and a correlation-peak distance metric providing practical temporal quality indicators to evaluate SLAM performance. The proposed temporal LiDAR-BIND, or LiDAR-BIND-T, maintains modular modality fusion while substantially enhancing temporal stability, resulting in improved robustness and performance for downstream SLAM.
AgriCruiser: An Open Source Agriculture Robot for Over-the-row Navigation
We present the AgriCruiser, an open-source over-the-row agricultural robot developed for low-cost deployment and rapid adaptation across diverse crops and row layouts. The chassis provides an adjustable track width of 1.42 m to 1.57 m, along with a ground clearance of 0.94 m. The AgriCruiser achieves compact pivot turns with radii of 0.71 m to 0.79 m, enabling efficient headland maneuvers. The platform is designed for the integration of the other subsystems, and in this study, a precision spraying system was implemented to assess its effectiveness in weed management. In twelve flax plots, a single robotic spray pass reduced total weed populations (pigweed and Venice mallow) by 24- to 42-fold compared to manual weeding in four flax plots, while also causing less crop damage. Mobility experiments conducted on concrete, asphalt, gravel, grass, and both wet and dry soil confirmed reliable traversal consistent with torque sizing. The complete chassis can be constructed from commodity T-slot extrusion with minimal machining, resulting in a bill of materials costing approximately $5,000 - $6,000, which enables replication and customization. The mentioned results demonstrate that low-cost, reconfigurable over-the-row robots can achieve effective weed management with reduced crop damage and labor requirements, while providing a versatile foundation for phenotyping, sensing, and other agriculture applications. Design files and implementation details are released to accelerate research and adoption of modular agricultural robotics.
comment: GitHub: https://github.com/structuresComp/agri-cruiser
Multi-Robot Task Planning for Multi-Object Retrieval Tasks with Distributed On-Site Knowledge via Large Language Models
It is crucial to efficiently execute instructions such as "Find an apple and a banana" or "Get ready for a field trip," which require searching for multiple objects or understanding context-dependent commands. This study addresses the challenging problem of determining which robot should be assigned to which part of a task when each robot possesses different situational on-site knowledge-specifically, spatial concepts learned from the area designated to it by the user. We propose a task planning framework that leverages large language models (LLMs) and spatial concepts to decompose natural language instructions into subtasks and allocate them to multiple robots. We designed a novel few-shot prompting strategy that enables LLMs to infer required objects from ambiguous commands and decompose them into appropriate subtasks. In our experiments, the proposed method achieved 47/50 successful assignments, outperforming random (28/50) and commonsense-based assignment (26/50). Furthermore, we conducted qualitative evaluations using two actual mobile manipulators. The results demonstrated that our framework could handle instructions, including those involving ad hoc categories such as "Get ready for a field trip," by successfully performing task decomposition, assignment, sequential planning, and execution.
comment: Submitted to AROB-ISBC 2026 (Journal Track option)
SoMi-ToM: Evaluating Multi-Perspective Theory of Mind in Embodied Social Interactions
Humans continuously infer the states, goals, and behaviors of others by perceiving their surroundings in dynamic, real-world social interactions. However, most Theory of Mind (ToM) benchmarks only evaluate static, text-based scenarios, which have a significant gap compared to real interactions. We propose the SoMi-ToM benchmark, designed to evaluate multi-perspective ToM in embodied multi-agent complex social interactions. This benchmark is based on rich multimodal interaction data generated by the interaction environment SoMi, covering diverse crafting goals and social relationships. Our framework supports multi-level evaluation: (1) first-person evaluation provides multimodal (visual, dialogue, action, etc.) input from a first-person perspective during a task for real-time state inference, (2) third-person evaluation provides complete third-person perspective video and text records after a task for goal and behavior inference. This evaluation method allows for a more comprehensive examination of a model's ToM capabilities from both the subjective immediate experience and the objective global observation. We constructed a challenging dataset containing 35 third-person perspective videos, 363 first-person perspective images, and 1225 expert-annotated multiple-choice questions (three options). On this dataset, we systematically evaluated the performance of human subjects and several state-of-the-art large vision-language models (LVLMs). The results show that LVLMs perform significantly worse than humans on SoMi-ToM: the average accuracy gap between humans and models is 40.1% in first-person evaluation and 26.4% in third-person evaluation. This indicates that future LVLMs need to further improve their ToM capabilities in embodied, complex social interactions.
comment: 24 pages, 6 figures
ADPro: a Test-time Adaptive Diffusion Policy via Manifold-constrained Denoising and Task-aware Initialization for Robotic Manipulation
Diffusion policies have recently emerged as a powerful class of visuomotor controllers for robot manipulation, offering stable training and expressive multi-modal action modeling. However, existing approaches typically treat action generation as an unconstrained denoising process, ignoring valuable a priori knowledge about geometry and control structure. In this work, we propose the Adaptive Diffusion Policy (ADP), a test-time adaptation method that introduces two key inductive biases into the diffusion. First, we embed a geometric manifold constraint that aligns denoising updates with task-relevant subspaces, leveraging the fact that the relative pose between the end-effector and target scene provides a natural gradient direction, and guiding denoising along the geodesic path of the manipulation manifold. Then, to reduce unnecessary exploration and accelerate convergence, we propose an analytically guided initialization: rather than sampling from an uninformative prior, we compute a rough registration between the gripper and target scenes to propose a structured initial noisy action. ADP is compatible with pre-trained diffusion policies and requires no retraining, enabling test-time adaptation that tailors the policy to specific tasks, thereby enhancing generalization across novel tasks and environments. Experiments on RLBench, CALVIN, and real-world datasets show that ADPro, an implementation of ADP, improves success rates, generalization, and sampling efficiency, achieving up to 25% faster execution and 9% points over strong diffusion baselines.
LargeAD: Large-Scale Cross-Sensor Data Pretraining for Autonomous Driving
Recent advancements in vision foundation models (VFMs) have revolutionized visual perception in 2D, yet their potential for 3D scene understanding, particularly in autonomous driving applications, remains underexplored. In this paper, we introduce LargeAD, a versatile and scalable framework designed for large-scale 3D pretraining across diverse real-world driving datasets. Our framework leverages VFMs to extract semantically rich superpixels from 2D images, which are aligned with LiDAR point clouds to generate high-quality contrastive samples. This alignment facilitates cross-modal representation learning, enhancing the semantic consistency between 2D and 3D data. We introduce several key innovations: (i) VFM-driven superpixel generation for detailed semantic representation, (ii) a VFM-assisted contrastive learning strategy to align multimodal features, (iii) superpoint temporal consistency to maintain stable representations across time, and (iv) multi-source data pretraining to generalize across various LiDAR configurations. Our approach achieves substantial gains over state-of-the-art methods in linear probing and fine-tuning for LiDAR-based segmentation and object detection. Extensive experiments on 11 large-scale multi-sensor datasets highlight our superior performance, demonstrating adaptability, efficiency, and robustness in real-world autonomous driving scenarios.
comment: IEEE TPAMI 2025; 17 pages, 9 figures, 11 tables; Project Page at https://ldkong.com/LargeAD
OrthoLoC: UAV 6-DoF Localization and Calibration Using Orthographic Geodata NeurIPS 2025
Accurate visual localization from aerial views is a fundamental problem with applications in mapping, large-area inspection, and search-and-rescue operations. In many scenarios, these systems require high-precision localization while operating with limited resources (e.g., no internet connection or GNSS/GPS support), making large image databases or heavy 3D models impractical. Surprisingly, little attention has been given to leveraging orthographic geodata as an alternative paradigm, which is lightweight and increasingly available through free releases by governmental authorities (e.g., the European Union). To fill this gap, we propose OrthoLoC, the first large-scale dataset comprising 16,425 UAV images from Germany and the United States with multiple modalities. The dataset addresses domain shifts between UAV imagery and geospatial data. Its paired structure enables fair benchmarking of existing solutions by decoupling image retrieval from feature matching, allowing isolated evaluation of localization and calibration performance. Through comprehensive evaluation, we examine the impact of domain shifts, data resolutions, and covisibility on localization accuracy. Finally, we introduce a refinement technique called AdHoP, which can be integrated with any feature matcher, improving matching by up to 95% and reducing translation error by up to 63%. The dataset and code are available at: https://deepscenario.github.io/OrthoLoC.
comment: Accepted at NeurIPS 2025
Simulated Annealing for Multi-Robot Ergodic Information Acquisition Using Graph-Based Discretization
One of the goals of active information acquisition using multi-robot teams is to keep the relative uncertainty in each region at the same level to maintain identical acquisition quality (e.g., consistent target detection) in all the regions. To achieve this goal, ergodic coverage can be used to assign the number of samples according to the quality of observation, i.e., sampling noise levels. However, the noise levels are unknown to the robots. Although this noise can be estimated from samples, the estimates are unreliable at first and can generate fluctuating values. The main contribution of this paper is to use simulated annealing to generate the target sampling distribution, starting from uniform and gradually shifting to an estimated optimal distribution, by varying the coldness parameter of a Boltzmann distribution with the estimated sampling entropy as energy. Simulation results show a substantial improvement of both transient and asymptotic entropy compared to both uniform and direct-ergodic searches. Finally, a demonstration is performed with a TurtleBot swarm system to validate the physical applicability of the algorithm.
Towards autonomous photogrammetric forest inventory using a lightweight under-canopy robotic drone
Drones are increasingly used in forestry to capture high-resolution remote sensing data, supporting enhanced monitoring, assessment, and decision-making processes. While operations above the forest canopy are already highly automated, flying inside forests remains challenging, primarily relying on manual piloting. In dense forests, relying on the Global Navigation Satellite System (GNSS) for localization is not feasible. In addition, the drone must autonomously adjust its flight path to avoid collisions. Recently, advancements in robotics have enabled autonomous drone flights in GNSS-denied obstacle-rich areas. In this article, a step towards autonomous forest data collection is taken by building a prototype of a robotic under-canopy drone utilizing state-of-the-art open source methods and validating its performance for data collection inside forests. Specifically, the study focused on camera-based autonomous flight under the forest canopy and photogrammetric post-processing of the data collected with the low-cost onboard stereo camera. The autonomous flight capability of the prototype was evaluated through multiple test flights in boreal forests. The tree parameter estimation capability was studied by performing diameter at breast height (DBH) estimation. The prototype successfully carried out flights in selected challenging forest environments, and the experiments showed promising performance in forest 3D modeling with a miniaturized stereoscopic photogrammetric system. The DBH estimation achieved a root mean square error (RMSE) of 3.33 - 3.97 cm (10.69 - 12.98 %) across all trees. For trees with a DBH less than 30 cm, the RMSE was 1.16 - 2.56 cm (5.74 - 12.47 %). The results provide valuable insights into autonomous under-canopy forest mapping and highlight the critical next steps for advancing lightweight robotic drone systems for mapping complex forest environments.
comment: 35 pages, 11 Figures
Track Any Motions under Any Disturbances
A foundational humanoid motion tracker is expected to be able to track diverse, highly dynamic, and contact-rich motions. More importantly, it needs to operate stably in real-world scenarios against various dynamics disturbances, including terrains, external forces, and physical property changes for general practical use. To achieve this goal, we propose Any2Track (Track Any motions under Any disturbances), a two-stage RL framework to track various motions under multiple disturbances in the real world. Any2Track reformulates dynamics adaptability as an additional capability on top of basic action execution and consists of two key components: AnyTracker and AnyAdapter. AnyTracker is a general motion tracker with a series of careful designs to track various motions within a single policy. AnyAdapter is a history-informed adaptation module that endows the tracker with online dynamics adaptability to overcome the sim2real gap and multiple real-world disturbances. We deploy Any2Track on Unitree G1 hardware and achieve a successful sim2real transfer in a zero-shot manner. Any2Track performs exceptionally well in tracking various motions under multiple real-world disturbances.
Long-Horizon Visual Imitation Learning via Plan and Code Reflection
Learning from long-horizon demonstrations with complex action sequences presents significant challenges for visual imitation learning, particularly in understanding temporal relationships of actions and spatial relationships between objects. In this paper, we propose a new agent framework that incorporates two dedicated reflection modules to enhance both plan and code generation. The plan generation module produces an initial action sequence, which is then verified by the plan reflection module to ensure temporal coherence and spatial alignment with the demonstration video. The code generation module translates the plan into executable code, while the code reflection module verifies and refines the generated code to ensure correctness and consistency with the generated plan. These two reflection modules jointly enable the agent to detect and correct errors in both the plan generation and code generation, improving performance in tasks with intricate temporal and spatial dependencies. To support systematic evaluation, we introduce LongVILBench, a benchmark comprising 300 human demonstrations with action sequences of up to 18 steps. LongVILBench emphasizes temporal and spatial complexity across multiple task types. Experimental results demonstrate that existing methods perform poorly on this benchmark, whereas our new framework establishes a strong baseline for long-horizon visual imitation learning.
comment: 9 pages, 4 figures
WorldGym: World Model as An Environment for Policy Evaluation
Evaluating robot control policies is difficult: real-world testing is costly, and handcrafted simulators require manual effort to improve in realism and generality. We propose a world-model-based policy evaluation environment (WorldGym), an autoregressive, action-conditioned video generation model which serves as a proxy to real world environments. Policies are evaluated via Monte Carlo rollouts in the world model, with a vision-language model providing rewards. We evaluate a set of VLA-based real-robot policies in the world model using only initial frames from real robots, and show that policy success rates within the world model highly correlate with real-world success rates. Moreoever, we show that WorldGym is able to preserve relative policy rankings across different policy versions, sizes, and training checkpoints. Due to requiring only a single start frame as input, the world model further enables efficient evaluation of robot policies' generalization ability on novel tasks and environments. We find that modern VLA-based robot policies still struggle to distinguish object shapes and can become distracted by adversarial facades of objects. While generating highly realistic object interaction remains challenging, WorldGym faithfully emulates robot motions and offers a practical starting point for safe and reproducible policy evaluation before deployment.
comment: https://world-model-eval.github.io
How Safe Will I Be Given What I Saw? Calibrated Prediction of Safety Chances for Image-Controlled Autonomy
Autonomous robots that rely on deep neural network controllers pose critical challenges for safety prediction, especially under partial observability and distribution shift. Traditional model-based verification techniques are limited in scalability and require access to low-dimensional state models, while model-free methods often lack reliability guarantees. This paper addresses these limitations by introducing a framework for calibrated safety prediction in end-to-end vision-controlled systems, where neither the state-transition model nor the observation model is accessible. Building on the foundation of world models, we leverage variational autoencoders and recurrent predictors to forecast future latent trajectories from raw image sequences and estimate the probability of satisfying safety properties. We distinguish between monolithic and composite prediction pipelines and introduce a calibration mechanism to quantify prediction confidence. In long-horizon predictions from high-dimensional observations, the forecasted inputs to the safety evaluator can deviate significantly from the training distribution due to compounding prediction errors and changing environmental conditions, leading to miscalibrated risk estimates. To address this, we incorporate unsupervised domain adaptation to ensure robustness of safety evaluation under distribution shift in predictions without requiring manual labels. Our formulation provides theoretical calibration guarantees and supports practical evaluation across long prediction horizons. Experimental results on three benchmarks show that our UDA-equipped evaluators maintain high accuracy and substantially lower false positive rates under distribution shift. Similarly, world model-based composite predictors outperform their monolithic counterparts on long-horizon tasks, and our conformal calibration provides reliable statistical bounds.
comment: arXiv admin note: text overlap with arXiv:2308.12252
Optimization-based Task and Motion Planning under Signal Temporal Logic Specifications using Logic Network Flow ICRA
This paper proposes an optimization-based task and motion planning framework, named "Logic Network Flow", to integrate signal temporal logic (STL) specifications into efficient mixed-binary linear programmings. In this framework, temporal predicates are encoded as polyhedron constraints on each edge of the network flow, instead of as constraints between the nodes as in the traditional Logic Tree formulation. Synthesized with Dynamic Network Flows, Logic Network Flows render a tighter convex relaxation compared to Logic Trees derived from these STL specifications. Our formulation is evaluated on several multi-robot motion planning case studies. Empirical results demonstrate that our formulation outperforms Logic Tree formulation in terms of computation time for several planning problems. As the problem size scales up, our method still discovers better lower and upper bounds by exploring fewer number of nodes during the branch-and-bound process, although this comes at the cost of increased computational load for each node when exploring branches.
comment: Accepted to IEEE International Conference on Robotics and Automation (ICRA) 2025
Vision-driven River Following of UAV via Safe Reinforcement Learning using Semantic Dynamics Model
Vision-driven autonomous river following by Unmanned Aerial Vehicles is critical for applications such as rescue, surveillance, and environmental monitoring, particularly in dense riverine environments where GPS signals are unreliable. These safety-critical navigation tasks must satisfy hard safety constraints while optimizing performance. Moreover, the reward in river following is inherently history-dependent (non-Markovian) by which river segment has already been visited, making it challenging for standard safe Reinforcement Learning (SafeRL). To address these gaps, we propose three contributions. First, we introduce Marginal Gain Advantage Estimation, which refines the reward advantage function by using a sliding window baseline computed from historical episodic returns, aligning the advantage estimate with non-Markovian dynamics. Second, we develop a Semantic Dynamics Model based on patchified water semantic masks offering more interpretable and data-efficient short-term prediction of future observations compared to latent vision dynamics models. Third, we present the Constrained Actor Dynamics Estimator architecture, which integrates the actor, cost estimator, and SDM for cost advantage estimation to form a model-based SafeRL framework. Simulation results demonstrate that MGAE achieves faster convergence and superior performance over traditional critic-based methods like Generalized Advantage Estimation. SDM provides more accurate short-term state predictions that enable the cost estimator to better predict potential violations. Overall, CADE effectively integrates safety regulation into model-based RL, with the Lagrangian approach providing a "soft" balance between reward and safety during training, while the safety layer enhances inference by imposing a "hard" action overlay.
comment: Submitted to Robotics and Autonomous Systems (RAS) journal
Active Shadowing (ASD): Manipulating Perception of Robotic Behaviors via Implicit Virtual Communication
Explicit communication is often valued for its directness in presenting information but requires attention during exchange, resulting in cognitive interruptions. On the other hand, implicit communication contributes to tacit and smooth interaction, making it more suitable for teaming, but requires inference for interpretation. This paper studies a novel type of implicit visual communication (IVC) using shadows via visual projection with augmented reality, referred to as active shadowing (ASD). Prior IVC methods, such as legible motion, are often used to influence the perception of robot behavior to make it more understandable. They often require changing the physical robot behavior, resulting in suboptimality. In our work, we investigate how ASD can be used to achieve similar effects without losing optimality. Our evaluations with user studies demonstrates that ASD can effectively creates ''illusions'' that maintain optimal physical behavior without compromising its understandability. We also show that ASD can be more informative than other explicit communication methods, and examine the conditions under which ASD becomes less effective.
Multiagent Systems
An Agent-Based Simulation of Ageing Societies: Accessibility and Care Dynamics in Remote Areas
This paper presents an agent-based simulation of accessibility and care dynamics in ageing societies, applied to the Italian inner area of Premeno (VB). The model integrates census and municipal data, drone-derived elevation models, GIS road networks, and survey-based caregiving information to generate synthetic populations of older adults and their caregivers. Agents are organized into dyads with socio-economic and mobility attributes, enabling the simulation of both micro-scale accessibility and meso-scale caregiving outcomes. Two scenarios are compared: a baseline and an alternative involving the relocation of healthcare services. Key indicators include caregiver effort, overwhelmed caregivers, walkability, and unmet hours of care. Findings show that while relocation improves walkability locally, it increases unmet care hours due to detours and reduced proximity. Household income emerges as the primary driver of caregiver burden, with accessibility shaped by interactions between financial and mobility resources. Results highlight the need for interventions tailored to context-specific constraints in remote ageing communities.
comment: Comments: 9 pages, 2 figures, 6 tables; appendix included. Code and processed inputs available on request from the corresponding author
LLM-MCoX: Large Language Model-based Multi-robot Coordinated Exploration and Search
Autonomous exploration and object search in unknown indoor environments remain challenging for multi-robot systems (MRS). Traditional approaches often rely on greedy frontier assignment strategies with limited inter-robot coordination. In this work, we introduce LLM-MCoX (LLM-based Multi-robot Coordinated Exploration and Search), a novel framework that leverages Large Language Models (LLMs) for intelligent coordination of both homogeneous and heterogeneous robot teams tasked with efficient exploration and target object search. Our approach combines real-time LiDAR scan processing for frontier cluster extraction and doorway detection with multimodal LLM reasoning (e.g., GPT-4o) to generate coordinated waypoint assignments based on shared environment maps and robot states. LLM-MCoX demonstrates superior performance compared to existing methods, including greedy and Voronoi-based planners, achieving 22.7% faster exploration times and 50% improved search efficiency in large environments with 6 robots. Notably, LLM-MCoX enables natural language-based object search capabilities, allowing human operators to provide high-level semantic guidance that traditional algorithms cannot interpret.
Towards Human Engagement with Realistic AI Combat Pilots
We present a system that enables real-time interaction between human users and agents trained to control fighter jets in simulated 3D air combat scenarios. The agents are trained in a dedicated environment using Multi-Agent Reinforcement Learning. A communication link is developed to allow seamless deployment of trained agents into VR-Forces, a widely used defense simulation tool for realistic tactical scenarios. This integration allows mixed simulations where human-controlled entities engage with intelligent agents exhibiting distinct combat behaviors. Our interaction model creates new opportunities for human-agent teaming, immersive training, and the exploration of innovative tactics in defense contexts.
comment: 13th International Conference on Human-Agent Interaction (HAI) 2025
OpenID Connect for Agents (OIDC-A) 1.0: A Standard Extension for LLM-Based Agent Identity and Authorization
OpenID Connect for Agents (OIDC-A) 1.0 is an extension to OpenID Connect Core 1.0 that provides a comprehensive framework for representing, authenticating, and authorizing LLM-based agents within the OAuth 2.0 ecosystem. As autonomous AI agents become increasingly prevalent in digital systems, there is a critical need for standardized protocols to establish agent identity, verify agent attestation, represent delegation chains, and enable fine-grained authorization based on agent attributes. This specification defines standard claims, endpoints, and protocols that address these requirements while maintaining compatibility with existing OAuth 2.0 and OpenID Connect infrastructure. The proposed framework introduces mechanisms for agent identity representation, delegation chain validation, attestation verification, and capability-based authorization, providing a foundation for secure and trustworthy agent-to-service interactions in modern distributed systems.
comment: 10 pages, 5 tables, 2 code listings. Specification proposal available at https://github.com/subramanya1997/oidc-a/
Quadratic Programming Approach for Nash Equilibrium Computation in Multiplayer Imperfect-Information Games
There has been significant recent progress in algorithms for approximation of Nash equilibrium in large two-player zero-sum imperfect-information games and exact computation of Nash equilibrium in multiplayer strategic-form games. While counterfactual regret minimization and fictitious play are scalable to large games and have convergence guarantees in two-player zero-sum games, they do not guarantee convergence to Nash equilibrium in multiplayer games. We present an approach for exact computation of Nash equilibrium in multiplayer imperfect-information games that solves a quadratically-constrained program based on a nonlinear complementarity problem formulation from the sequence-form game representation. This approach capitalizes on recent advances for solving nonconvex quadratic programs. Our algorithm is able to quickly solve three-player Kuhn poker after removal of dominated actions. Of the available algorithms in the Gambit software suite, only the logit quantal response approach is successfully able to solve the game; however, the approach takes longer than our algorithm and also involves a degree of approximation. Our formulation also leads to a new approach for computing Nash equilibrium in multiplayer strategic-form games which we demonstrate to outperform a previous quadratically-constrained program formulation.
In-Context Curiosity: Distilling Exploration for Decision-Pretrained Transformers on Bandit Tasks
As large language models (LLMs) continue to grow in capability, there is increasing interest in incorporating them into decision-making tasks. A common pipeline for this is Decision-Pretrained Transformers (DPTs). However, existing training methods for DPTs often struggle to generalize beyond their pretraining data distribution. To explore mitigation of this limitation, we propose in-context curiosity -- a lightweight, exploration-inspired regularizer for offline pretraining -- and introduce the Prediction-Powered Transformer (PPT) framework. PPT augments DPT with an auxiliary reward predictor, using prediction error as an intrinsic curiosity signal to encourage broader exploration during training. In proof-of-concept experiments on Gaussian multi-armed bandits, PPT shows improved robustness: it moderates the performance degradation observed in DPT when test environments exhibit higher variance in reward, particularly when pretraining data has limited diversity. While the quality of offline data remain fundamental, our preliminary results suggest that curiosity-driven pretraining offers a promising direction for enhancing out-of-distribution generalization in in-context RL agents.
LLM-based Multi-Agent Blackboard System for Information Discovery in Data Science
The rapid advancement of Large Language Models (LLMs) has opened new opportunities in data science, yet their practical deployment is often constrained by the challenge of discovering relevant data within large heterogeneous data lakes. Existing methods struggle with this: single-agent systems are quickly overwhelmed by large, heterogeneous files in the large data lakes, while multi-agent systems designed based on a master-slave paradigm depend on a rigid central controller for task allocation that requires precise knowledge of each sub-agent's capabilities. To address these limitations, we propose a novel multi-agent communication paradigm inspired by the blackboard architecture for traditional AI models. In this framework, a central agent posts requests to a shared blackboard, and autonomous subordinate agents -- either responsible for a partition of the data lake or general information retrieval -- volunteer to respond based on their capabilities. This design improves scalability and flexibility by eliminating the need for a central coordinator to have prior knowledge of all sub-agents' expertise. We evaluate our method on three benchmarks that require explicit data discovery: KramaBench and modified versions of DS-Bench and DA-Code to incorporate data discovery. Experimental results demonstrate that the blackboard architecture substantially outperforms baselines, including RAG and the master-slave multi-agent paradigm, achieving between 13% to 57% relative improvement in end-to-end task success and up to a 9% relative gain in F1 score for data discovery over the best-performing baselines across both proprietary and open-source LLMs. Our findings establish the blackboard paradigm as a scalable and generalizable communication framework for multi-agent systems.
Reasoning-Aware Prompt Orchestration: A Foundation Model for Multi-Agent Language Model Coordination
The emergence of large language models has enabled sophisticated multi-agent systems, yet coordinating their reasoning capabilities through prompt engineering remains challenging. We present a theoretically-grounded framework for dynamic prompt orchestration that enhances reasoning across multiple specialized agents. This framework addresses three core challenges: logical consistency preservation during agent transitions, reasoning-aware prompt adaptation, and scalable coordination of distributed inference. Our approach formalizes agent states using prompt templates, reasoning context vectors, and capability matrices. We prove system convergence to stable coordination patterns when step sizes satisfy $\alpha < \frac{1}{2L}$ where $L$ is the Lipschitz constant of the state transition function. We implement this through a distributed architecture that dynamically routes reasoning tasks while maintaining semantic coherence. Experimental results on 1,000 synthetic multi-agent conversations demonstrate a 42% reduction in reasoning latency, a 23% improvement in logical consistency measured by ROUGE-L score, and an 89% success rate for task completion without context loss across agent transitions. Ablation studies identify the consensus mechanism as the primary performance driver, while revealing limitations: performance degrades beyond 10 agent transitions, and the system requires 76.5GB memory for 1,000 concurrent agents. These findings establish a new paradigm for scalable reasoning in multi-agent systems, providing theoretical foundations for understanding reasoning emergence across coordinated language models.
Robust Federated Inference
Federated inference, in the form of one-shot federated learning, edge ensembles, or federated ensembles, has emerged as an attractive solution to combine predictions from multiple models. This paradigm enables each model to remain local and proprietary while a central server queries them and aggregates predictions. Yet, the robustness of federated inference has been largely neglected, leaving them vulnerable to even simple attacks. To address this critical gap, we formalize the problem of robust federated inference and provide the first robustness analysis of this class of methods. Our analysis of averaging-based aggregators shows that the error of the aggregator is small either when the dissimilarity between honest responses is small or the margin between the two most probable classes is large. Moving beyond linear averaging, we show that problem of robust federated inference with non-linear aggregators can be cast as an adversarial machine learning problem. We then introduce an advanced technique using the DeepSet aggregation model, proposing a novel composition of adversarial training and test-time robust aggregation to robustify non-linear aggregators. Our composition yields significant improvements, surpassing existing robust aggregation methods by 4.7 - 22.2% in accuracy points across diverse benchmarks.
MAGIC-MASK: Multi-Agent Guided Inter-Agent Collaboration with Mask-Based Explainability for Reinforcement Learning
Understanding the decision-making process of Deep Reinforcement Learning agents remains a key challenge for deploying these systems in safety-critical and multi-agent environments. While prior explainability methods like StateMask, have advanced the identification of critical states, they remain limited by computational cost, exploration coverage, and lack of adaptation to multi-agent settings. To overcome these limitations, we propose a mathematically grounded framework, MAGIC-MASK (Multi-Agent Guided Inter-agent Collaboration with Mask-Based Explainability for Reinforcement Learning), that extends perturbation-based explanation to Multi-Agent Reinforcement Learning. Our method integrates Proximal Policy Optimization, adaptive epsilon-greedy exploration, and lightweight inter-agent collaboration to share masked state information and peer experience. This collaboration enables each agent to perform saliency-guided masking and share reward-based insights with peers, reducing the time required for critical state discovery, improving explanation fidelity, and leading to faster and more robust learning. The core novelty of our approach lies in generalizing explainability from single-agent to multi-agent systems through a unified mathematical formalism built on trajectory perturbation, reward fidelity analysis, and Kullback-Leibler divergence regularization. This framework yields localized, interpretable explanations grounded in probabilistic modeling and multi-agent Markov decision processes. We validate our framework on both single-agent and multi-agent benchmarks, including a multi-agent highway driving environment and Google Research Football, demonstrating that MAGIC-MASK consistently outperforms state-of-the-art baselines in fidelity, learning efficiency, and policy robustness while offering interpretable and transferable explanations.
comment: 16 pages, 3 figures
A Hierarchical Agentic Framework for Autonomous Drone-Based Visual Inspection
Autonomous inspection systems are essential for ensuring the performance and longevity of industrial assets. Recently, agentic frameworks have demonstrated significant potential for automating inspection workflows but have been limited to digital tasks. Their application to physical assets in real-world environments, however, remains underexplored. In this work, our contributions are two-fold: first, we propose a hierarchical agentic framework for autonomous drone control, and second, a reasoning methodology for individual function executions which we refer to as ReActEval. Our framework focuses on visual inspection tasks in indoor industrial settings, such as interpreting industrial readouts or inspecting equipment. It employs a multi-agent system comprising a head agent and multiple worker agents, each controlling a single drone. The head agent performs high-level planning and evaluates outcomes, while worker agents implement ReActEval to reason over and execute low-level actions. Operating entirely in natural language, ReActEval follows a plan, reason, act, evaluate cycle, enabling drones to handle tasks ranging from simple navigation (e.g., flying forward 10 meters and land) to complex high-level tasks (e.g., locating and reading a pressure gauge). The evaluation phase serves as a feedback and/or replanning stage, ensuring actions align with user objectives while preventing undesirable outcomes. We evaluate the framework in a simulated environment with two worker agents, assessing performance qualitatively and quantitatively based on task completion across varying complexity levels and workflow efficiency. By leveraging natural language processing for agent communication, our approach offers a novel, flexible, and user-accessible alternative to traditional drone-based solutions, enabling autonomous problem-solving for industrial inspection without extensive user intervention.
MADS: Multi-Agent Dialogue Simulation for Diverse Persuasion Data Generation
We propose MADS (Multi-Agent Dialogue Simulation), a scalable framework for generating persuasive multi-turn dialogues via agent self-play. MADS employs three coordinated agents: User Agents simulating diverse persona-driven behaviors, a Dialog Agent executing task-oriented persuasion strategies and an Optimization Agent evaluating and refining dialogue outcomes. We further validate its effectiveness through users' Chain-of-Attitude (CoA) modeling and dedicated LLMs' persuasion assessment. This approach enables low-cost generation of training data without human annotation, addressing key industry challenges such as lack of user data, cold-start evaluation difficulties, and prompt inefficiency. Applied to a real-world marketing scenario, MADS significantly improved the persuasion capacity of small LLMs, increasing the organic traffic conversion rate by 22.4\% (from 1.83\% to 2.24\%) , demonstrating clear business value.
comment: work in progress
MADS: Multi-Agent Dialogue Simulation for Diverse Persuasion Data Generation
We propose MADS (Multi-Agent Dialogue Simulation), a scalable framework for generating persuasive multi-turn dialogues via agent self-play. MADS employs three coordinated agents: User Agents simulating diverse persona-driven behaviors, a Dialog Agent executing task-oriented persuasion strategies and an Optimization Agent evaluating and refining dialogue outcomes. We further validate its effectiveness through users' Chain-of-Attitude (CoA) modeling and dedicated LLMs' persuasion assessment. This approach enables low-cost generation of training data without human annotation, addressing key industry challenges such as lack of user data, cold-start evaluation difficulties, and prompt inefficiency. Applied to a real-world marketing scenario, MADS significantly improved the persuasion capacity of small LLMs, increasing the organic traffic conversion rate by 22.4% (from 1.83% to 2.24%) , demonstrating clear business value.
comment: work in progress
Asymmetric Information Enhanced Mapping Framework for Multirobot Exploration based on Deep Reinforcement Learning
Despite the great development of multirobot technologies, efficiently and collaboratively exploring an unknown environment is still a big challenge. In this paper, we propose AIM-Mapping, a Asymmetric InforMation Enhanced Mapping framework. The framework fully utilizes the privilege information in the training process to help construct the environment representation as well as the supervised signal in an asymmetric actor-critic training framework. Specifically, privilege information is used to evaluate the exploration performance through an asymmetric feature representation module and a mutual information evaluation module. The decision-making network uses the trained feature encoder to extract structure information from the environment and combines it with a topological map constructed based on geometric distance. Utilizing this kind of topological map representation, we employ topological graph matching to assign corresponding boundary points to each robot as long-term goal points. We conduct experiments in real-world-like scenarios using the Gibson simulation environments. It validates that the proposed method, when compared to existing methods, achieves great performance improvement.
Sequence Pathfinder for Multi-Agent Pickup and Delivery in the Warehouse
Multi-Agent Pickup and Delivery (MAPD) is a challenging extension of Multi-Agent Path Finding (MAPF), where agents are required to sequentially complete tasks with fixed-location pickup and delivery demands. Although learning-based methods have made progress in MAPD, they often perform poorly in warehouse-like environments with narrow pathways and long corridors when relying only on local observations for distributed decision-making. Communication learning can alleviate the lack of global information but introduce high computational complexity due to point-to-point communication. To address this challenge, we formulate MAPF as a sequence modeling problem and prove that path-finding policies under sequence modeling possess order-invariant optimality, ensuring its effectiveness in MAPD. Building on this, we propose the Sequential Pathfinder (SePar), which leverages the Transformer paradigm to achieve implicit information exchange, reducing decision-making complexity from exponential to linear while maintaining efficiency and global awareness. Experiments demonstrate that SePar consistently outperforms existing learning-based methods across various MAPF tasks and their variants, and generalizes well to unseen environments. Furthermore, we highlight the necessity of integrating imitation learning in complex maps like warehouses.
comment: Preprint Under Review
Towards Agentic OS: An LLM Agent Framework for Linux Schedulers
Operating system schedulers suffer from a fundamental semantic gap, where kernel policies fail to understand application-specific needs, leading to suboptimal performance. We introduce SchedCP, the first framework that enables fully autonomous Large Language Model (LLM) agents to safely and efficiently optimize Linux schedulers without human involvement. Our core insight is that the challenge is not merely to apply a better LLM, but to architect a decoupled control plane that separates the AI's role of semantic reasoning ("what to optimize") from the system's role of execution ("how to observe and act"), thereby separating the optimization problem into two stages: goal-inference and policy-synthesis. Implemented as Model Context Protocol(MCP) server, SchedCP provides a stable interface with three key services: a Workload Analysis Engine, an evolving Scheduler Policy Repository, and an Execution Verifier that validates all AI-generated code and configure before deployment with static and dynamic analysis. We demonstrate this architecture's power with sched-agent, a multi-agent system that autonomously analyzes workloads, synthesizes custom eBPF scheduling policies, and deploys them via the sched\_ext infrastructure. Our evaluation shows that SchedCP achieves up to an 1.79x performance improvement, and a 13x cost reduction compared to naive agentic approaches, all while maintaining high success rate. By bridging the semantic gap, SchedCP democratizes expert-level system optimization and represents a step towards creating truly self-optimizing, application-aware operating systems. The code is open-sourced in https://github.com/eunomia-bpf/schedcp
Multi-Robot Task Planning for Multi-Object Retrieval Tasks with Distributed On-Site Knowledge via Large Language Models
It is crucial to efficiently execute instructions such as "Find an apple and a banana" or "Get ready for a field trip," which require searching for multiple objects or understanding context-dependent commands. This study addresses the challenging problem of determining which robot should be assigned to which part of a task when each robot possesses different situational on-site knowledge-specifically, spatial concepts learned from the area designated to it by the user. We propose a task planning framework that leverages large language models (LLMs) and spatial concepts to decompose natural language instructions into subtasks and allocate them to multiple robots. We designed a novel few-shot prompting strategy that enables LLMs to infer required objects from ambiguous commands and decompose them into appropriate subtasks. In our experiments, the proposed method achieved 47/50 successful assignments, outperforming random (28/50) and commonsense-based assignment (26/50). Furthermore, we conducted qualitative evaluations using two actual mobile manipulators. The results demonstrated that our framework could handle instructions, including those involving ad hoc categories such as "Get ready for a field trip," by successfully performing task decomposition, assignment, sequential planning, and execution.
comment: Submitted to AROB-ISBC 2026 (Journal Track option)
Voting or Consensus? Decision-Making in Multi-Agent Debate ACL2025
Much of the success of multi-agent debates depends on carefully choosing the right parameters. The decision-making protocol stands out as it can highly impact final model answers, depending on how decisions are reached. Systematic comparison of decision protocols is difficult because many studies alter multiple discussion parameters beyond the protocol. So far, it has been largely unknown how decision-making influences different tasks. This work systematically evaluates the impact of seven decision protocols (e.g., majority voting, unanimity consensus). We change only one variable at a time - the decision protocol - to analyze how different methods affect the collaboration between agents and measure differences in knowledge and reasoning tasks. Our results show that voting protocols improve performance by 13.2% in reasoning tasks and consensus protocols by 2.8% in knowledge tasks compared to other decision protocols. Increasing the number of agents improves performance, while more discussion rounds before voting reduce it. To improve decision-making by increasing answer diversity, we propose two new methods, All-Agents Drafting (AAD) and Collective Improvement (CI). Our methods improve task performance by up to 3.3% with AAD and up to 7.4% with CI. This work demonstrates the importance of decision-making in multi-agent debates beyond scaling.
comment: Accepted at ACL2025 (Findings)
Dynamic Pricing in High-Speed Railways Using Multi-Agent Reinforcement Learning
This paper addresses a critical challenge in the high-speed passenger railway industry: designing effective dynamic pricing strategies in the context of competing and cooperating operators. To address this, a multi-agent reinforcement learning (MARL) framework based on a non-zero-sum Markov game is proposed, incorporating random utility models to capture passenger decision making. Unlike prior studies in areas such as energy, airlines, and mobile networks, dynamic pricing for railway systems using deep reinforcement learning has received limited attention. A key contribution of this paper is a parametrisable and versatile reinforcement learning simulator designed to model a variety of railway network configurations and demand patterns while enabling realistic, microscopic modelling of user behaviour, called RailPricing-RL. This environment supports the proposed MARL framework, which models heterogeneous agents competing to maximise individual profits while fostering cooperative behaviour to synchronise connecting services. Experimental results validate the framework, demonstrating how user preferences affect MARL performance and how pricing policies influence passenger choices, utility, and overall system dynamics. This study provides a foundation for advancing dynamic pricing strategies in railway systems, aligning profitability with system-wide efficiency, and supporting future research on optimising pricing policies.
comment: 39 pages, 5 figures
The challenge of hidden gifts in multi-agent reinforcement learning
Sometimes we benefit from actions that others have taken even when we are unaware that they took those actions. For example, if your neighbor chooses not to take a parking spot in front of your house when you are not there, you can benefit, even without being aware that they took this action. These ``hidden gifts'' represent an interesting challenge for multi-agent reinforcement learning (MARL), since assigning credit when the beneficial actions of others are hidden is non-trivial. Here, we study the impact of hidden gifts with a very simple MARL task. In this task, agents in a grid-world environment have individual doors to unlock in order to obtain individual rewards. As well, if all the agents unlock their door the group receives a larger collective reward. However, there is only one key for all of the doors, such that the collective reward can only be obtained when the agents drop the key for others after they use it. Notably, there is nothing to indicate to an agent that the other agents have dropped the key, thus this act for others is a ``hidden gift''. We show that several different state-of-the-art MARL algorithms, including MARL specific architectures, fail to learn how to obtain the collective reward in this simple task. Interestingly, we find that decentralized actor-critic policy gradient agents can succeed when we provide them with information about their own action history, but MARL agents still cannot solve the task with action history. Finally, we derive a correction term for policy gradient agents, inspired by learning aware approaches, which reduces the variance in learning and helps them to converge to collective success more reliably. These results show that credit assignment in multi-agent settings can be particularly challenging in the presence of ``hidden gifts'', and demonstrate that self learning-awareness in decentralized agents can benefit these settings.
comment: Added LOLA baselines to appendix, new corollary proof on correction term not conflicting with individual objectives, related works on multi-objective RL and coordination MARL, expanded the contraposition appendix experiment, moved key drop rate experiments to appendix and aligned first success plots with key-drop plots
Systems and Control (CS)
OmniRetarget: Interaction-Preserving Data Generation for Humanoid Whole-Body Loco-Manipulation and Scene Interaction
A dominant paradigm for teaching humanoid robots complex skills is to retarget human motions as kinematic references to train reinforcement learning (RL) policies. However, existing retargeting pipelines often struggle with the significant embodiment gap between humans and robots, producing physically implausible artifacts like foot-skating and penetration. More importantly, common retargeting methods neglect the rich human-object and human-environment interactions essential for expressive locomotion and loco-manipulation. To address this, we introduce OmniRetarget, an interaction-preserving data generation engine based on an interaction mesh that explicitly models and preserves the crucial spatial and contact relationships between an agent, the terrain, and manipulated objects. By minimizing the Laplacian deformation between the human and robot meshes while enforcing kinematic constraints, OmniRetarget generates kinematically feasible trajectories. Moreover, preserving task-relevant interactions enables efficient data augmentation, from a single demonstration to different robot embodiments, terrains, and object configurations. We comprehensively evaluate OmniRetarget by retargeting motions from OMOMO, LAFAN1, and our in-house MoCap datasets, generating over 8-hour trajectories that achieve better kinematic constraint satisfaction and contact preservation than widely used baselines. Such high-quality data enables proprioceptive RL policies to successfully execute long-horizon (up to 30 seconds) parkour and loco-manipulation skills on a Unitree G1 humanoid, trained with only 5 reward terms and simple domain randomization shared by all tasks, without any learning curriculum.
comment: Project website: https://omniretarget.github.io
Robust Safety-Critical Control of Integrator Chains with Mismatched Perturbations via Linear Time-Varying Feedback
In this paper, we propose a novel safety-critical control framework for a chain of integrators subject to both matched and mismatched perturbations. The core of our approach is a linear, time-varying state-feedback design that simultaneously enforces stability and safety constraints. By integrating backstepping techniques with a quadratic programming (QP) formulation, we develop a systematic procedure to guarantee safety under time-varying gains. We provide rigorous theoretical guarantees for the double integrator case, both in the presence and absence of perturbations, and outline general proofs for extending the methodology to higher-order chains of integrators. This proposed framework thus bridges robustness and safety-critical performance, while overcoming the limitations of existing prescribed-time approaches.
Neural Network-based Co-design of Output-Feedback Control Barrier Function and Observer
Control Barrier Functions (CBFs) provide a powerful framework for ensuring safety in dynamical systems. However, their application typically relies on full state information, which is often violated in real-world scenarios due to the availability of partial state information. In this work, we propose a neural network-based framework for the co-design of a safety controller, observer, and CBF for partially observed continuous-time systems. By formulating barrier conditions over an augmented state space, our approach ensures safety without requiring bounded estimation errors or handcrafted barrier functions. All components are jointly trained by formulating appropriate loss functions, and we introduce a validity condition to provide formal safety guarantees beyond the training data. Finally, we demonstrate the effectiveness of the proposed approach through several case studies.
Stability Analysis of Thermohaline Convection With a Time-Varying Shear Flow Using the Lyapunov Method
This work identifies instabilities and computes the growth rate of a linear time-varying system using the Lyapunov method. The linear system describes cold fresh water on top of hot salty water with a periodically time-varying background shear flow. We employ a time-dependent weighting matrix to construct a Lyapunov function candidate, and the resulting linear matrix inequalities formulation is discretized in time using the forward Euler method. As the number of temporal discretization points increases, the growth rate predicted from the Lyapunov method or the Floquet theory will converge to the same value as that obtained from numerical simulations. We also use the Lyapunov method to analyze the instantaneous principal direction of instabilities and compare the computational resources required by the Lyapunov method, numerical simulations, and the Floquet theory.
comment: 6 pages, 5 figures
Explainable and Resilient ML-Based Physical-Layer Attack Detectors
Detection of emerging attacks on network infrastructure is a critical aspect of security management. To meet the growing scale and complexity of modern threats, machine learning (ML) techniques offer valuable tools for automating the detection of malicious activities. However, as these techniques become more complex, their internal operations grow increasingly opaque. In this context, we address the need for explainable physical-layer attack detection methods. First, we analyze the inner workings of various classifiers trained to alert about physical layer intrusions, examining how the influence of different monitored parameters varies depending on the type of attack being detected. This analysis not only improves the interpretability of the models but also suggests ways to enhance their design for increased speed. In the second part, we evaluate the detectors' resilience to malicious parameter noising. The results highlight a key trade-off between model speed and resilience. This work serves as a design guideline for developing fast and robust detectors trained on available network monitoring data.
Anticipatory Structure in the Propagation of Signal
We here report the development of a structure that shows the proteresis phenomenon in more general setting and set out its philosophical implications. In this case, the questions relate to how we are to interpret what will happen in the future, and the procollection (the counterpart of recollection) of not-yet-experienced phenomena that, when expressed, will be whatever has built up in fully determinate form already, ahead of the event. If such a process really exists, as our evidence confirms, not just as phenomenon but as a fact, then a gap exists between the actualized form of the future and its concrete expression when the event does happen. Such a fact, as hard to imagine as it is, may be intelligible, even interpretable and susceptible to mathematical and/or logical modeling. We build upon neglected theories and formulae that present time in a way that makes our results interpretable. A proteretic device is here described which shifts the input signal (event) to the future; and it is an anticipatory structure. The proteretic characteristic of neurons should also be capable of demonstration; and its neuronal behavior is possibly the reason for the fast perception/thought processes in spite of slow behaving neurons. That capacity may also account for why it is possible for animals (including humans) to interact with the environment by slightly seeing (in the sense of perceiving and/or sensing) the future. Exploiting this new proteretic technology, faster computers and more efficient cellphones, among other things, will be designed and built.
comment: 19 pages and 8 figures
Stabilization of nonlinear systems with unknown delays via delay-adaptive neural operator approximate predictors
This work establishes the first rigorous stability guarantees for approximate predictors in delay-adaptive control of nonlinear systems, addressing a key challenge in practical implementations where exact predictors are unavailable. We analyze two scenarios: (i) when the actuated input is directly measurable, and (ii) when it is estimated online. For the measurable input case, we prove semi-global practical asymptotic stability with an explicit bound proportional to the approximation error $\epsilon$. For the unmeasured input case, we demonstrate local practical asymptotic stability, with the region of attraction explicitly dependent on both the initial delay estimate and the predictor approximation error. To bridge theory and practice, we show that neural operators-a flexible class of neural network-based approximators-can achieve arbitrarily small approximation errors, thus satisfying the conditions of our stability theorems. Numerical experiments on two nonlinear benchmark systems-a biological protein activator/repressor model and a micro-organism growth Chemostat model-validate our theoretical results. In particular, our numerical simulations confirm stability under approximate predictors, highlight the strong generalization capabilities of neural operators, and demonstrate a substantial computational speedup of up to 15x compared to a baseline fixed-point method.
comment: 20 pages, 2 figures
Real-time Velocity Profile Optimization for Time-Optimal Maneuvering with Generic Acceleration Constraints
The computation of time-optimal velocity profiles along prescribed paths, subject to generic acceleration constraints, is a crucial problem in robot trajectory planning, with particular relevance to autonomous racing. However, the existing methods either support arbitrary acceleration constraints at high computational cost or use conservative box constraints for computational efficiency. We propose FBGA, a new \underline{F}orward-\underline{B}ackward algorithm with \underline{G}eneric \underline{A}cceleration constraints, which achieves both high accuracy and low computation time. FBGA operates forward and backward passes to maximize the velocity profile in short, discretized path segments, while satisfying user-defined performance limits. Tested on five racetracks and two vehicle classes, FBGA handles complex, non-convex acceleration constraints with custom formulations. Its maneuvers and lap times closely match optimal control baselines (within $0.11\%$-$0.36\%$), while being up to three orders of magnitude faster. FBGA maintains high accuracy even with coarse discretization, making it well-suited for online multi-query trajectory planning. Our open-source \texttt{C++} implementation is available at: https://anonymous.4open.science/r/FB_public_RAL.
Assessment of East-West (E-W) and South-North (S-N) facing Vertical Bifacial Photovoltaic Modules for Agrivoltaics and Dual-Land Use Applications in India
Deploying vertical bifacial PV modules can play a significant role in agrivoltaics, fencing walls, noise barriers, building integrated photovoltaics (BIPV), solar PV for electric vehicles, and many other applications. This research work presents the performance comparison of vertical bifacial photovoltaic (VBPV) modules facing East-West (E-W) and South-North (S-N) directions. Also, the VBPV modules are compared with vertical and tilted south-facing monofacial PV modules. Six PV modules (monofacial and bifacial) were installed at the rooftop of IIT Bhilai academic building, Raipur (21.16{\deg} N, 81.65{\deg} E), India, and studied for a year from May 2022 to April 2023. The results show that the E-W facing VBPV module gives two production peaks, one in the morning and another in the evening, as compared to the single notable rise at midday observed for a monofacial module. From a series of experiments, 19 days of data were collected over the one-year period from May 2022 to April 2023, with specific inclusion of important days like solstices and equinoxes. In addition, the energy generation results are compared with PVsyst simulations, while also addressing the limitations of the PVsyst simulation of vertical PV modules. E-W bifacial generation is higher than S-N bifacial and south-facing monofacial modules from February to April. The VBPV modules in E-W and S-N orientations present a promising opportunity for expanding the agrivoltaics sector in tropical and sub-tropical countries, like India. This has huge implications for addressing the sustainable development goals by simultaneously contributing to sustainable land management, green energy generation, energy security and water conservation in the vast geo-climatic expanse of tropics.
Robust NbN on Si-SiGe hybrid superconducting-semiconducting microwave quantum circuit
Advancing large-scale quantum computing requires superconducting circuits that combine long coherence times with compatibility with semiconductor technology. We investigate niobium nitride (NbN) coplanar waveguide resonators integrated with Si/SiGe quantum wells, creating a hybrid platform designed for CMOS-compatible quantum hardware. Using temperature-dependent microwave spectroscopy in the single-photon regime, we examine resonance frequency and quality factor variations to probe the underlying loss mechanisms. Our analysis identifies the roles of two-level systems, quasiparticles, and scattering processes, and connects these losses to wafer properties and fabrication methods. The devices demonstrate reproducible performance and stable operation maintained for over two years, highlighting their robustness. These results provide design guidelines for developing low-loss, CMOS-compatible superconducting circuits and support progress toward resilient, scalable architectures for quantum information processing.
Machine Learning Detection of Lithium Plating in Lithium-ion Cells: A Gaussian Process Approach
Lithium plating during fast charging is a critical degradation mechanism that accelerates capacity fade and can trigger catastrophic safety failures. Recent work has identified a distinctive dQ/dV peak above 4.0 V as a reliable signature of plating onset; however, conventional methods for computing dQ/dV rely on finite differencing with filtering, which amplifies sensor noise and introduces bias in peak location. In this paper, we propose a Gaussian Process (GP) framework for lithium plating detection by directly modeling the charge-voltage relationship Q(V) as a stochastic process with calibrated uncertainty. Leveraging the property that derivatives of GPs remain GPs, we infer dQ/dV analytically and probabilistically from the posterior, enabling robust detection without ad hoc smoothing. The framework provides three key benefits: (i) noise-aware inference with hyperparameters learned from data, (ii) closed-form derivatives with credible intervals for uncertainty quantification, and (iii) scalability to online variants suitable for embedded BMS. Experimental validation on Li-ion coin cells across a range of C-rates (0.2C-1C) and temperatures (0-40\deg C) demonstrates that the GP-based method reliably detects plating peaks under low-temperature, high-rate charging, while correctly reporting no peaks in baseline cases. The concurrence of GP-identified differential peaks, reduced charge throughput, and capacity fade measured via reference performance tests confirms the method's accuracy and robustness, establishing a practical pathway for real-time lithium plating detection.
comment: Submitted to American Control Conference - ACC 2026
Preemptive Spatiotemporal Trajectory Adjustment for Heterogeneous Vehicles in Highway Merging Zones
Aiming at the problem of driver's perception lag and low utilization efficiency of space-time resources in expressway ramp confluence area, based on the preemptive spatiotemporal trajectory Adjustment system, from the perspective of coordinating spatiotemporal resources, the reasonable value of safe space-time distance in trajectory pre-preparation is quantitatively analyzed. The minimum safety gap required for ramp vehicles to merge into the mainline is analyzed by introducing double positioning error and spatiotemporal trajectory tracking error. A merging control strategy for autonomous driving heterogeneous vehicles is proposed, which integrates vehicle type, driving intention, and safety spatiotemporal distance. The specific confluence strategies of ramp target vehicles and mainline cooperative vehicles under different vehicle types are systematically expounded. A variety of traffic flow and speed scenarios are used for full combination simulation. By comparing the time-position-speed diagram, the vehicle operation characteristics and the dynamic difference of confluence are qualitatively analyzed, and the average speed and average delay are used as the evaluation indices to quantitatively evaluate the performance advantages of the preemptive cooperative confluence control strategy. The results show that the maximum average delay improvement rates of mainline and ramp vehicles are 90.24 % and 74.24 %, respectively. The proposed strategy can effectively avoid potential vehicle conflicts and emergency braking behaviors, improve driving safety in the confluence area, and show significant advantages in driving stability and overall traffic efficiency optimization.
LiDAR Point Cloud Colourisation Using Multi-Camera Fusion and Low-Light Image Enhancement
In recent years, the fusion of camera data with LiDAR measurements has emerged as a powerful approach to enhance spatial understanding. This study introduces a novel, hardware-agnostic methodology that generates colourised point clouds from mechanical LiDAR using multiple camera inputs, providing complete 360-degree coverage. The primary innovation lies in its robustness under low-light conditions, achieved through the integration of a low-light image enhancement module within the fusion pipeline. The system requires initial calibration to determine intrinsic camera parameters, followed by automatic computation of the geometric transformation between the LiDAR and cameras, removing the need for specialised calibration targets and streamlining the setup. The data processing framework uses colour correction to ensure uniformity across camera feeds before fusion. The algorithm was tested using a Velodyne Puck Hi-Res LiDAR and a four-camera configuration. The optimised software achieved real-time performance and reliable colourisation even under very low illumination, successfully recovering scene details that would otherwise remain undetectable.
Intelligent Multi-link EDCA Optimization for Delay-Bounded QoS in Wi-Fi 7
IEEE 802.11be (Wi-Fi 7) introduces Multi-Link Operation (MLO) as a While MLO offers significant parallelism and capacity, realizing its full potential in guaranteeing strict delay bounds and optimizing Quality of Service (QoS) for diverse, heterogeneous traffic streams in complex multi-link scenarios remain a significant challenge. This is largely due to the limitations of static Enhanced Distributed Channel Access (EDCA) parameters and the complexity inherent in cross-link traffic management. To address this, this paper investigates the correlation between overall MLO QoS indicators and the configuration of EDCA parameters and Acess Catagory (AC) traffic allocation among links. Based on this analysis, we formulate a constrained optimization problem aiming to minimize the sum of overall packet loss rates for all access categories while satisfying their respective overall delay violation probability constraints. A Genetic Algorithm (GA)-based MLO EDCA QoS optimization algorithm is designed to efficiently search the complex configuration space of AC assignments and EDCA parameters. Experimental results demonstrate that the proposed approach's efficacy in generating adaptive MLO configuration strategies that align with diverse service requirements. The proposed solution significantly improves delay distribution characteristics, and enhance QoS robustness and resource utilization efficiency in high-load MLO environments.
Dynamic Causal Attack Graph based Cyber-security Risk Assessment Framework for CTCS System
Protecting the security of the train control system is a critical issue to ensure the safe and reliable operation of high-speed trains. Scientific modeling and analysis for the security risk is a promising way to guarantee system security. However, the representation and assessment of the multi-staged, causally related, and temporal-dynamic changed attack dependencies are difficult in the train control system. To solve the above challenges, a security assessment framework based on the Dynamical Causality Attack Graph (DCAG) model is introduced in this paper. Firstly, the DCAG model is generated based on the attack graph with consideration of temporal attack propagation and multi-stage attack event causality propagation. Then, the DCAG model is analyzed based on Bayesian inference and logic gateway-based inference. Through the case analysis of the CTCS-3 system, the security assessment framework is validated. With the DCAG-based security assessment framework, we can not only perform appropriate security risk quantification calculations, but also explore the importance of different attacks on system security risks, which is helpful in adjusting the cyber security defense policy.
Ingress Cryogenic Receivers Toward Scalable Quantum Information Processing: Theory and System Analysis
Current control techniques for cryogenically cooled qubits are realized with coaxial cables, posing multiple challenges in terms of cost, thermal load, size, and long-term scalability. Emerging approaches to tackle this issue include cryogenic CMOS electronics at 4 K, and photonic links for direct qubit control. In this paper, we propose a multiplexed all-passive cryogenic high frequency direct detection control platform (cryo-HFDD). The proposed classical interface for direct qubit control utilizes optical or sub-THz bands. We present the possible tradeoffs of this platform, and compare it with current state-of-the-art cryogenic CMOS and conventional coaxial approaches. We assess the feasibility of adopting these efficient links for a wide range of microwave qubit power levels. Specifically, we estimate the heat load to achieve the required signal-to-noise ratio SNR considering different noise sources, component losses, as well as link density. We show that multiplexed photonic receivers at 4 K can aggressively scale the control of thousands of qubits. This opens the door for low cost scalable quantum computing systems.
Beyond Point Estimates: Likelihood-Based Full-Posterior Wireless Localization
Modern wireless systems require not only position estimates, but also quantified uncertainty to support planning, control, and radio resource management. We formulate localization as posterior inference of an unknown transmitter location from receiver measurements. We propose Monte Carlo Candidate-Likelihood Estimation (MC-CLE), which trains a neural scoring network using Monte Carlo sampling to compare true and candidate transmitter locations. We show that in line-of-sight simulations with a multi-antenna receiver, MC-CLE learns critical properties including angular ambiguity and front-to-back antenna patterns. MC-CLE also achieves lower cross-entropy loss relative to a uniform baseline and Gaussian posteriors. alternatives under a uniform-loss metric.
Pinching-Antenna Systems (PASS)-Enabled UAV Delivery
A pinching-antenna systems (PASS)-enabled unmanned aerial vehicle (UAV) delivery framework is proposed, which exploits the capability of PASS to establish a strong line-of-sight link and reduce free-space pathloss.Aiming at minimizing the communication energy consumption in one cycle, a double-layer optimization (DLO) algorithm is developed by jointly optimizing the UAV delivery sequence and the pinching antenna (PA) activation vector. More specifically, at the outer layer, a hierarchical alternating optimization (HAO) scheme is proposed to tackle the NP-hard problem of delivery sequence planning, where a genetic algorithm performs global exploration to generate candidate solutions at the top-level, while a dynamic programming performs local refinement to obtain elite solutions at the lower-level. With determined UAV trajectory, at the inner layer, focus is placed on addressing the highly coupled mixed-integer nonlinear programming problem of PA activation vector optimization, where a pair of algorithms are proposed: 1) Branch-and-Bound (BnB) algorithm for finding global optimum; 2) incremental search and local refinement (ISLR) algorithm for reducing computational complexity. Simulation results indicate that: i) The proposed HAO-based delivery sequence planning scheme can effectively reduce the total flight distance, thereby decreasing flight time and communication energy consumption; ii) Both the proposed BnB and ISLR algorithms can achieve energy-efficient PA activation, with the former exhibiting better performance and the latter having lower complexity; iii) PASS outperforms the conventional multi-antenna systems, especially with higher communication rate requirements.
Policy Optimization in Robust Control: Weak Convexity and Subgradient Methods
Robust control seeks stabilizing policies that perform reliably under adversarial disturbances, with $\mathcal{H}_\infty$ control as a classical formulation. It is known that policy optimization of robust $\mathcal{H}_\infty$ control naturally lead to nonsmooth and nonconvex problems. This paper builds on recent advances in nonsmooth optimization to analyze discrete-time static output-feedback $\mathcal{H}_\infty$ control. We show that the $\mathcal{H}_\infty$ cost is weakly convex over any convex subset of a sublevel set. This structural property allows us to establish the first non-asymptotic deterministic convergence rate for the subgradient method under suitable assumptions. In addition, we prove a weak Polyak-{\L}ojasiewicz (PL) inequality in the state-feedback case, implying that all stationary points are globally optimal. We finally present a few numerical examples to validate the theoretical results.
comment: 9 pages, 11 figures
Unsupervised Detection of Spatiotemporal Anomalies in PMU Data Using Transformer-Based BiGAN
Ensuring power grid resilience requires the timely and unsupervised detection of anomalies in synchrophasor data streams. We introduce T-BiGAN, a novel framework that integrates window-attention Transformers within a bidirectional Generative Adversarial Network (BiGAN) to address this challenge. Its self-attention encoder-decoder architecture captures complex spatio-temporal dependencies across the grid, while a joint discriminator enforces cycle consistency to align the learned latent space with the true data distribution. Anomalies are flagged in real-time using an adaptive score that combines reconstruction error, latent space drift, and discriminator confidence. Evaluated on a realistic hardware-in-the-loop PMU benchmark, T-BiGAN achieves an ROC-AUC of 0.95 and an average precision of 0.996, significantly outperforming leading supervised and unsupervised methods. It shows particular strength in detecting subtle frequency and voltage deviations, demonstrating its practical value for live, wide-area monitoring without relying on manually labeled fault data.
Terahertz Quasi-BIC Metasurfaces for Ultra-Sensitive Biosensing and High-Speed Wireless Communications
Bound states in the continuum (BICs) have emerged as a revolutionary paradigm in terahertz (THz) photonics, enabling metasurfaces with theoretically infinite quality factors (Q-factors) and unprecedented light-matter control. This review synthesizes a decade of progress in THz-BIC research, tracing the evolution from foundational symmetry-protected designs to application-optimized quasi-BICs. We dissect multipolar origins, topological robustness, and symmetry-breaking strategies underpinning high-Q resonances, alongside computational frameworks for predictive design. The timeline highlights key milestones: early dielectric metasurfaces with high Q-factor, flexible biosensors achieving microgram detection limits, and Kerker-conditioned gas spectrometers reducing path lengths by few orders of magnitude. Emerging frontiers in reconfigurable MEMS-BICs and chiral quantum photonics are critically evaluated. Despite breakthroughs, scalability barriers persist for 6G integration, including nano-fabrication tolerances, material loss trade-offs, and dynamic control gaps. This review establishes BIC metasurfaces as pivotal enablers of compact, high-efficiency THz technologies poised to bridge the gap between fundamental discovery and commercialization of THz-based 6G communication and MedTech.
Combined Learning and Control: A New Paradigm for Optimal Control with Unknown Dynamics
In this paper, we present the combined learning-and-control (CLC) approach, which is a new way to solve optimal control problems with unknown dynamics by unifying model-based control and data-driven learning. The key idea is simple: we design a controller to be optimal for a proxy objective built on an available model while penalizing mismatches with the real system, so that the resulting controller is also optimal for the actual system. Building on the original CLC formulation, we demonstrate the framework to the linear quadratic regulator problem and make three advances: (i) we show that the CLC penalty is a sequence of stage-specific weights rather than a single constant; (ii) we identify when these weights can be set in advance and when they must depend on the (unknown) dynamics; and (iii) we develop a lightweight learning loop that tunes the weights directly from data without abandoning the benefits of a model-based design. We provide a complete algorithm and an empirical study against common baseline methods. The results clarify where prior knowledge suffices and where learning is essential, and they position CLC as a practical, theoretically grounded bridge between classical optimal control and modern learning methods.
comment: Submitted to ACC 2026
Asynchronous Nonlinear Sheaf Diffusion for Multi-Agent Coordination
Cellular sheaves and sheaf Laplacians provide a far-reaching generalization of graphs and graph Laplacians, resulting in a wide array of applications ranging from machine learning to multi-agent control. In the context of multi-agent systems, so called coordination sheaves provide a unifying formalism that models heterogeneous agents and coordination goals over undirected communication topologies, and applying sheaf diffusion drives agents to achieve their coordination goals. Existing literature on sheaf diffusion assumes that agents can communicate and compute updates synchronously, which is an unrealistic assumption in many scenarios where communication delays or heterogeneous agents with different compute capabilities cause disagreement among agents. To address these challenges, we introduce asynchronous nonlinear sheaf diffusion. Specifically, we show that under mild assumptions on the coordination sheaf and bounded delays in communication and computation, nonlinear sheaf diffusion converges to a minimizer of the Dirichlet energy of the coordination sheaf at a linear rate proportional to the delay bound. We further show that this linear convergence is attained from arbitrary initial conditions and the analysis depends on the spectrum of the sheaf Laplacian in a manner that generalizes the standard graph Laplacian case. We provide several numerical simulations to validate our theoretical results.
A Hierarchical Agentic Framework for Autonomous Drone-Based Visual Inspection
Autonomous inspection systems are essential for ensuring the performance and longevity of industrial assets. Recently, agentic frameworks have demonstrated significant potential for automating inspection workflows but have been limited to digital tasks. Their application to physical assets in real-world environments, however, remains underexplored. In this work, our contributions are two-fold: first, we propose a hierarchical agentic framework for autonomous drone control, and second, a reasoning methodology for individual function executions which we refer to as ReActEval. Our framework focuses on visual inspection tasks in indoor industrial settings, such as interpreting industrial readouts or inspecting equipment. It employs a multi-agent system comprising a head agent and multiple worker agents, each controlling a single drone. The head agent performs high-level planning and evaluates outcomes, while worker agents implement ReActEval to reason over and execute low-level actions. Operating entirely in natural language, ReActEval follows a plan, reason, act, evaluate cycle, enabling drones to handle tasks ranging from simple navigation (e.g., flying forward 10 meters and land) to complex high-level tasks (e.g., locating and reading a pressure gauge). The evaluation phase serves as a feedback and/or replanning stage, ensuring actions align with user objectives while preventing undesirable outcomes. We evaluate the framework in a simulated environment with two worker agents, assessing performance qualitatively and quantitatively based on task completion across varying complexity levels and workflow efficiency. By leveraging natural language processing for agent communication, our approach offers a novel, flexible, and user-accessible alternative to traditional drone-based solutions, enabling autonomous problem-solving for industrial inspection without extensive user intervention.
Robust Attitude Control of Nonlinear Multi-Rotor Dynamics with LFT Models and $\mathcal{H}_\infty$ Performance
Attitude stabilization of unmanned aerial vehicles in uncertain environments presents significant challenges due to nonlinear dynamics, parameter variations, and sensor limitations. This paper presents a comparative study of $\mathcal{H}_\infty$ and classical PID controllers for multi-rotor attitude regulation in the presence of wind disturbances and gyroscope noise. The flight dynamics are modeled using a linear parameter-varying (LPV) framework, where nonlinearities and parameter variations are systematically represented as structured uncertainties within a linear fractional transformation formulation. A robust controller based on $\mathcal{H}_\infty$ formulation is designed using only gyroscope measurements to ensure guaranteed performance bounds. Nonlinear simulation results demonstrate the effectiveness of the robust controllers compared to classical PID control, showing significant improvement in attitude regulation under severe wind disturbances.
comment: 6 pages, 6 figures, 3 tables, submitted to ACC 2026
A Review of Software for Designing and Operating Quantum Networks
Quantum network protocol development is crucial to realizing a production-grade network that can support distributed sensing, secure communication, and utility-scale quantum computation. However, the transition from laboratory demonstration to deployable networks requires software implementations of architectures and protocols tailored to the unique constraints of quantum systems. This paper reviews the current state of software implementations for quantum networks, organized around the three-plane abstraction of infrastructure, logical, and control/service planes. We cover software for both designing quantum network protocols (e.g., SeQUeNCe, QuISP, and NetSquid) and operating them, with a focus on essential control/service plane functions such as entanglement, topology, and resource management, in a proposed taxonomy. Our review highlights a persistent gap between theoretical protocol proposals and their realization in simulators or testbeds, particularly in dynamic topology and network management. We conclude by outlining open challenges and proposing a roadmap for developing scalable software architectures to enable hybrid, large-scale quantum networks.
comment: 12 pages, 5 figures, 3 tables, journal
A Novel Robust Control Method Combining DNN-Based NMPC Approximation and PI Control: Application to Exoskeleton Squat Movements
Nonlinear Model Predictive Control (NMPC) is a precise controller, but its heavy computational load often prevents application in robotic systems. Some studies have attempted to approximate NMPC using deep neural networks (NMPC-DNN). However, in the presence of unexpected disturbances or when operating conditions differ from training data, this approach lacks robustness, leading to large tracking errors. To address this issue, for the first time, the NMPC-DNN output is combined with a PI controller (Hybrid NMPC-DNN-PI). The proposed controller is validated by applying it to an exoskeleton robot during squat movement, which has a complex dynamic model and has received limited attention regarding robust nonlinear control design. A human-robot dynamic model with three active joints (ankle, knee, hip) is developed, and more than 5.3 million training samples are used to train the DNN. The results show that, under unseen conditions for the DNN, the tracking error in Hybrid NMPC-DNN-PI is significantly lower compared to NMPC-DNN. Moreover, human joint torques are greatly reduced with the use of the exoskeleton, with RMS values for the studied case reduced by 30.9%, 41.8%, and 29.7% at the ankle, knee, and hip, respectively. In addition, the computational cost of Hybrid NMPC-DNN-PI is 99.93% lower than that of NMPC.
Dynamic Modeling and Control System Analysis for Continuous-Disc Filters in Pulp Mill Operations
Vacuum disc filtration is critical in pulp mills for white liquor clarification and pulp washing, involving tightly coupled dynamics between rotational speed, vacuum pressure, slurry concentration, filtrate flow, and cake thickness. These nonlinear interactions are often regulated using empirical methods, lacking formal modeling and control. This article develops a dynamic, multivariable model of a continuous-disc filter (CD-filter) system based on first principles, simplified to a single representative disc for tractability. A linearized state-space model supports the design of two control strategies: a decentralized PI-based scheme and a centralized model predictive control (MPC). MATLAB-Simulink simulations reveal that MPC outperforms PI in tracking accuracy, overshoot reduction, and disturbance rejection. A 3D efficiency surface illustrates the importance of coordinating inlet flow and solids concentration. Results highlight the need for advanced multivariable control in optimizing CD-filter performance.
Event-Driven Control via Sparsity-Promoting Regularization: A Rollout Approach with Performance Guarantees
This paper presents a controller design framework aiming to balance control performance and actuation rate. Control performance is evaluated by an infinite-horizon average cost, and the number of control actions is penalized via sparsity-promoting regularization. Since the formulated optimal control problem has a combinatorial nature, we employ a rollout algorithm to obtain a tractable suboptimal solution. In the proposed scheme, actuation timings are determined through a multistage minimization procedure based on a receding-horizon approach, and the corresponding control inputs are computed online. We establish theoretical performance guarantees with respect to periodic control and prove the stability of the closed-loop system. The effectiveness of the proposed method is demonstrated through a numerical example.
comment: 14 pages
Simulated Annealing for Multi-Robot Ergodic Information Acquisition Using Graph-Based Discretization
One of the goals of active information acquisition using multi-robot teams is to keep the relative uncertainty in each region at the same level to maintain identical acquisition quality (e.g., consistent target detection) in all the regions. To achieve this goal, ergodic coverage can be used to assign the number of samples according to the quality of observation, i.e., sampling noise levels. However, the noise levels are unknown to the robots. Although this noise can be estimated from samples, the estimates are unreliable at first and can generate fluctuating values. The main contribution of this paper is to use simulated annealing to generate the target sampling distribution, starting from uniform and gradually shifting to an estimated optimal distribution, by varying the coldness parameter of a Boltzmann distribution with the estimated sampling entropy as energy. Simulation results show a substantial improvement of both transient and asymptotic entropy compared to both uniform and direct-ergodic searches. Finally, a demonstration is performed with a TurtleBot swarm system to validate the physical applicability of the algorithm.
MathBode: Frequency-Domain Fingerprints of LLM Mathematical Reasoning
This paper presents MathBode, a dynamic diagnostic for mathematical reasoning in large language models (LLMs). Instead of one-shot accuracy, MathBode treats each parametric problem as a system: we drive a single parameter sinusoidally and fit first-harmonic responses of model outputs and exact solutions. This yields interpretable, frequency-resolved metrics -- gain (amplitude tracking) and phase (lag) -- that form Bode-style fingerprints. Across five closed-form families (linear solve, ratio/saturation, compound interest, 2x2 linear systems, similar triangles), the diagnostic surfaces systematic low-pass behavior and growing phase lag that accuracy alone obscures. We compare several models against a symbolic baseline that calibrates the instrument ($G \approx 1$, $\phi \approx 0$). Results separate frontier from mid-tier models on dynamics, providing a compact, reproducible protocol that complements standard benchmarks with actionable measurements of reasoning fidelity and consistency. We open-source the dataset and code to enable further research and adoption.
Parasitic actuation delay limits the minimum employable time headway in connected and autonomous vehicles
Adaptive andcooperative adaptive cruise control (ACC and CACC) and next generation CACC (CACC+) systems usually employ a constant time headway policy (CTHP) for platooning of connected and autonomous vehicles (CAVs). In ACC, the ego vehicle uses onboard sensors to measure the position and velocity of the predecessor vehicle to maintain a desired spacing. The CACC and CACC+systems use additional information, such as acceleration(s) communicated through vehicle-to-vehicle (V2V) communication of the predecessor vehicle(s); these systems have been shown to result in improved spacing performance, throughput, and safety over ACC. Parasitic dynamics are generally difficult to model and the parasitic parameters (delay, lag, etc.) are difficult to obtain. Parasitic actuation delays can have deleterious effects and impose limits on the mobility and safety of CAVs. It is reasonable to assume that the bounds on parasitic actuation delays are known a priori. For CAVs, we need to address both internal stability and string stability in the presence of parasitic actuation delays. This requires robustness of string and internal stability for all values of parasitic actuation delays that are within the specified upper bound. In this paper, we provide the minimum employable time headway for ACC, CACC, and CACC+ (`r' predecessors look-ahead), respectively. The inclusion of the internal stability in the string stability condition is analyzed based on Pontryagin's interlacing theorem for time delay systems. We provide comparative numerical results to corroborate the achieved theoretical results.
Hierarchical Analysis and Control of Epidemic Spreading over Networks using Dissipativity and Mesh Stability
Analyzing and controlling spreading processes are challenging problems due to the involved non-linear node (subsystem) dynamics, unknown disturbances, complex interconnections, and the large-scale and multi-level nature of the problems. The dissipativity concept provides a practical framework for addressing such concerns, thanks to the energy-based representation it offers for subsystems and the compositional properties it provides for the analysis and control of interconnected (networked) systems comprised of such subsystems. Therefore, in this paper, we utilize the dissipativity concept to analyze and control a spreading process that occurs over a hierarchy of nodes, groups, and a network (i.e., a spreading network). We start by generalizing several existing results on dissipativity-based topology design for networked systems. Next, we model the considered spreading network as a networked system and establish the dissipativity properties of its nodes. The generalized topology design method is then applied at multiple levels of the considered spreading network to formulate its analysis and control problems as Linear Matrix Inequality (LMI) problems. We identify and enforce localized necessary conditions to support the feasibility of the LMI problem solved at each subsequent hierarchical level of the spreading network. Consequently, the proposed method does not involve iterative multi-level optimization stages that are computationally inefficient. The proposed control solution ensures that the spreading network is not only stable but also dissipative and mesh-stable. Compared to conventional methods, such as threshold pruning and high-degree edge removal, our approach offers superior performance in terms of infection containment, control efficiency, and disturbance robustness. Extensive numerical results demonstrate the effectiveness of the proposed technique.
comment: To be submitted to American Control Conference 2026
A Hybrid Systems Model of Feedback Optimization for Linear Systems: Convergence and Robustness
Feedback optimization algorithms compute inputs to a system using real-time output measurements, which helps mitigate the effects of disturbances. However, existing work often models both system dynamics and computations in either discrete or continuous time, which may not accurately model some applications. In this work, we model linear system dynamics in continuous time, and we model the computations of inputs in discrete time. Therefore, we present a novel hybrid systems model of feedback optimization. We first establish the well-posedness of this hybrid model and establish completeness of solutions while ruling out Zeno behavior. Then we show the state of the system converges exponentially fast to a ball of known radius about a desired goal state. Next we analytically show that this system is robust to perturbations in (i) the values of measured outputs, (ii) the matrices that model the linear time-invariant system, and (iii) the times at which inputs are applied to the system. Simulation results confirm that this approach successfully mitigates the effects of disturbances.
comment: 15 Pages, 2 Figures, submitted to American Control Conference 2026
MAPPO for Edge Server Monitoring
In this paper, we consider a goal-oriented communication problem for edge server monitoring, where jobs arrive intermittently at multiple dispatchers and must be assigned to shared edge servers with finite queues and time-varying availability. Accurate knowledge of server status is critical for sustaining high throughput, yet remains challenging under dynamic workloads and partial observability. To address this challenge, each dispatcher maintains server knowledge through two complementary mechanisms: (i) active status queries that provide instantaneous updates at a communication cost, and (ii) job execution feedback that reveals server conditions upon successful or failed job completion. We formulate a cooperative multi-agent distributed decision-making problem in which dispatchers jointly optimize query scheduling to balance throughput against communication overhead. To solve this problem, we propose a Multi-Agent Proximal Policy Optimization (MAPPO)-based algorithm that leverages centralized training with decentralized execution (CTDE) to learn distributed query-and-dispatch policies under partial and stale observations. Experiments show that MAPPO achieves superior throughput-cost tradeoffs and significantly outperforms baseline strategies across varying query costs, job arrival rates, and dispatchers.
comment: 6 pages, 4 figures. Accepted to IEEE MILCOM 2025
Robot Conga: A Leader-Follower Walking Approach to Sequential Path Following in Multi-Agent Systems
Coordinated path following in multi-agent systems is a key challenge in robotics, with applications in automated logistics, surveillance, and collaborative exploration. Traditional formation control techniques often rely on time-parameterized trajectories and path integrals, which can result in synchronization issues and rigid behavior. In this work, we address the problem of sequential path following, where agents maintain fixed spatial separation along a common trajectory, guided by a leader under centralized control. We introduce Robot Conga, a leader-follower control strategy that updates each agent's desired state based on the leader's spatial displacement rather than time, assuming access to a global position reference, an assumption valid in indoor environments equipped with motion capture, vision-based tracking, or UWB localization systems. The algorithm was validated in simulation using both TurtleBot3 and quadruped (Laikago) robots. Results demonstrate accurate trajectory tracking, stable inter-agent spacing, and fast convergence, with all agents aligning within 250 time steps (approx. 0.25 seconds) in the quadruped case, and almost instantaneously in the TurtleBot3 implementation.
comment: 6 Pages, 8 Figures. Both authors have contributed equally
Edge Server Monitoring for Job Assignment
In this paper, we study a goal-oriented communication problem for edge server monitoring, where compute jobs arrive intermittently at dispatchers and must be immediately assigned to distributed edge servers. Due to competing workloads and the dynamic nature of the edge environment, server availability fluctuates over time. To maintain accurate estimates of server availability states, each dispatcher updates its belief using two mechanisms: (i) active queries over shared communication channels and (ii) feedback from past job executions. We formulate a query scheduling problem that maximizes the job success rate under limited communication resources for queries. This problem is modeled as a Restless Multi-Armed Bandit (RMAB) with multiple actions and addressed using a Net-Gain Maximization (NGM) scheduling algorithm, which selects servers to query based on their expected improvement in execution performance. Simulation results show that the proposed NGM Policy significantly outperforms baseline strategies, achieving up to a 30% gain over the Round-Robin Policy and up to a 107% gain over the Never-Query Policy.
comment: Accepted to IEEE MILCOM 2025 (Networking Protocols and Performance Track), 6 pages, 2 figures
On Fixed-Time Stability for a Class of Singularly Perturbed Systems using Composite Lyapunov Functions
Fixed-time stable dynamical systems are capable of achieving exact convergence to an equilibrium point within a fixed time that is independent of the initial conditions of the system. This property makes them highly appealing for designing control, estimation, and optimization algorithms in applications with stringent performance requirements. However, the set of tools available for analyzing the interconnection of fixed-time stable systems is rather limited compared to their asymptotic counterparts. In this paper, we address some of these limitations by exploiting the emergence of multiple time scales in nonlinear singularly perturbed dynamical systems, where the fast dynamics and the slow dynamics are fixed-time stable on their own. By extending the so-called composite Lyapunov method from asymptotic stability to the context of fixed-time stability, we provide a novel class of Lyapunov-based sufficient conditions to certify fixed-time stability in a class of singularly perturbed dynamical systems. The results are illustrated, analytically and numerically, using a fixed-time gradient flow system interconnected with a fixed-time plant and an additional high-order example.
Ocean Diviner: A Diffusion-Augmented Reinforcement Learning Framework for AUV Robust Control in Underwater Tasks
Autonomous Underwater Vehicles (AUVs) are essential for marine exploration, yet their control remains highly challenging due to nonlinear dynamics and uncertain environmental disturbances. This paper presents a diffusion-augmented Reinforcement Learning (RL) framework for robust AUV control, aiming to improve AUV's adaptability in dynamic underwater environments. The proposed framework integrates two core innovations: (1) A diffusion-based action generation framework that produces physically feasible and high-quality actions, enhanced by a high-dimensional state encoding mechanism combining current observations with historical states and actions through a novel diffusion U-Net architecture, significantly improving long-horizon planning capacity for robust control. (2) A sample-efficient hybrid learning architecture that synergizes diffusion-guided exploration with RL policy optimization, where the diffusion model generates diverse candidate actions and the RL critic selects the optimal action, achieving higher exploration efficiency and policy stability in dynamic underwater environments. Extensive simulation experiments validate the framework's superior robustness and flexibility, outperforming conventional control methods in challenging marine conditions, offering enhanced adaptability and reliability for AUV operations in underwater tasks. Finally, we will release the code publicly soon to support future research in this area.
comment: Jingzehua Xu, Guanwen Xie and Weiyi Liu contributed equally to this work
Data-Driven Estimation of Structured Singular Values
Estimating the size of the modeling error is crucial for robust control. Over the years, numerous metrics have been developed to quantify the model error in a control relevant manner. One of the most important such metrics is the structured singular value, as it leads to necessary and sufficient conditions for ensuring stability and robustness in feedback control under structured model uncertainty. Although the computation of the structured singular value is often intractable, lower and upper bounds for it can often be obtained if a model of the system is known. In this paper, we introduce a fully data-driven method to estimate a lower bound for the structured singular value, by conducting experiments on the system and applying power iterations to the collected data. Our numerical simulations demonstrate that this method effectively lower bounds the structured singular value, yielding results comparable to the MATLAB$^\copyright$ Robust Control Toolbox.
comment: Published in IEEE Control Systems Letters, vol. 9, pp. 1976-1981, 2025
Spatial exponential decay of perturbations in optimal control of general evolution equations
We analyze the robustness of optimally controlled evolution equations with respect to spatially localized perturbations. We prove that if the involved operators are domain-uniformly stabilizable and detectable, then these localized perturbations only have a local effect on the optimal solution. We characterize this domain-uniform stabilizability and detectability for the transport equation with constant transport velocity, showing that even for unitary semigroups, optimality implies exponential damping. We extend this result to the case of a space-dependent transport velocity. Finally we leverage the results for the transport equation to characterize domain-uniform stabilizability of the wave equation. Numerical examples in one space dimension complement the theoretical results.
comment: 53 pages, 5 figures
Neural Operator Feedback for a First-Order PIDE with Spatially-Varying State Delay
A transport PDE with a spatial integral and recirculation with constant delay has been a benchmark for neural operator approximations of PDE backstepping controllers. Introducing a spatially-varying delay into the model gives rise to a gain operator defined through integral equations which the operator's input -- the varying delay function -- enters in previously unencountered manners, including in the limits of integration and as the inverse of the `delayED time' function. This, in turn, introduces novel mathematical challenges in estimating the operator's Lipschitz constant. The backstepping kernel function having two branches endows the feedback law with a two-branch structure, where only one of the two feedback branches depends on both of the kernel branches. For this rich feedback structure, we propose a neural operator approximation of such a two-branch feedback law and prove the approximator to be semiglobally practically stabilizing. With numerical results we illustrate the training of the neural operator and its stabilizing capability.
comment: This 14 page paper contains 1 table and 9 figures
Optimal transmission expansion modestly reduces decarbonization costs of U.S. electricity
Expanding interregional transmission is widely viewed as essential for integrating clean energy into decarbonized power systems. Using the open-source Switch capacity expansion model with detailed representation of existing U.S. generation and transmission infrastructure, solar, wind, and storage resources, and hourly operations, we evaluate the role of transmission across least-cost, socially optimal, and zero-emissions scenarios for 2050. An optimal nationwide plan would more than triple interregional transmission capacity, yet this reduces the cost of a zero-emissions system by only 7% relative to relying on existing transmission, as storage, solar and wind siting, and nuclear generation serve as close substitutes. Regional cost and rent effects vary, with transmission generally favoring wind and hydrogen resources over solar and batteries. Sensitivity analysis shows diminishing returns: one-fifth of the benefits of full expansion can be achieved with one-twelfth of the added capacity, while cost reductions for batteries and hydrogen provide comparable or greater system savings than transmission. Reconductoring -- quadrupling line capacity at half the cost of new builds achieves nearly all the benefits of unconstrained expansion. These results suggest that while substantial transmission expansion is economically justified, a diverse set of flexibility resources can substitute for large-scale grid build-out, and the relative value of transmission is highly contingent on technological and cost developments.
comment: 30 pages, 10 figures, 1 table in main paper. Additional 11 pages including 7 additional figures and one table in the appendix
Optimized dynamic scheduling of an exclusive bus lane or high occupancy vehicle lane in a bimodal traffic corridor
Efficient management of traffic corridors is critical for sustaining urban mobility. Exclusive bus lane (EBL) and high occupancy vehicle lane (HOVL) are two prominent strategies for enhancing public transit services and alleviating congestion. EBLs prioritize bus transit by providing dedicated lanes for faster travel times, while HOVLs encourage carpooling by reserving lanes for high-occupancy vehicles. However, static implementations of these policies may underutilize road resources and disrupt general-purpose lanes. Dynamic implementation, based on real-time demand, can potentially maximize road efficiency and minimize negative impacts. This study first compares the mixed traffic policy (MTP), exclusive bus lane policy (EBLP), and high occupancy vehicle lane policy (HOVLP) by formulating the total system costs in the context of a bimodal traffic corridor involving private cars and public buses. Under each lane policy, the bus frequency is optimized together with the modal split equilibrium derived separately. Based on dynamic demand simulated using an Ornstein-Uhlenbeck (O-U) process, switching thresholds are then derived to identify optimal periods for implementing each policy. Results reveal significant reductions in total system costs with the proposed dynamic policy schedules. Compared to static implementations, the dynamic policy schedules achieve cost reductions of 11.9%, 6.70%, and 43.64% relative to MTP-only, EBLP-only, and HOVLP-only scenarios, respectively. Additionally, in two real case studies of existing EBL and HOVL operations in Seattle, the proposed dynamic policy reduces total costs by 32.5% and 28.6%, respectively. The findings provide valuable insights for policymakers and transit planners, offering a robust framework for dynamically scheduling and integrating EBL and HOVL policies to optimize urban corridor efficiency and reduce overall system costs.
comment: This version is identical to the manuscript submitted to International Journal of Sustainable Transportation. It corrects inconsistencies present in the previous arXiv version and is the version under journal review. No scientific content has been altered
Spectral Flow Learning Theory: Finite-Sample Guarantees for Vector-Field Identification
We study the identification of continuous-time vector fields from irregularly sampled trajectories. We introduce Spectral Flow Learning (SFL), which learns in a windowed flow space using a lag-linear label operator that aggregates lagged Koopman actions. We provide finite-sample high-probability (FS-HP) guarantees for the class of variable-step linear multistep methods (vLLM). The FS-HP rates are constructed using spectral regularization with qualification-controlled filters for flow predictors under standard source and filter assumptions. A multistep observability inequality links flow error to vector-field error and yields two-term bounds that combine a statistical rate with an explicit discretization bias from vLMM theory. This preliminary preprint states the results and sketches proofs, with full proofs and extensions deferred to a journal version.
Pinching-Antenna Systems (PASS): Power Radiation Model and Optimal Beamforming Design
Pinching-antenna systems (PASS) improve wireless links by configuring the locations of activated pinching antennas along dielectric waveguides, namely pinching beamforming. In this paper, a novel adjustable power radiation model is proposed for PASS, where power radiation ratios of pinching antennas can be flexibly controlled by tuning the spacing between pinching antennas and waveguides. A closed-form pinching antenna spacing arrangement strategy is derived to achieve the commonly assumed equal-power radiation. Based on this, a practical PASS framework relying on discrete activation is considered, where pinching antennas can only be activated among a set of predefined locations. A transmit power minimization problem is formulated, which jointly optimizes the transmit beamforming, pinching beamforming, and the numbers of activated pinching antennas, subject to each user's minimum rate requirement. (1) To solve the resulting highly coupled mixed-integer nonlinear programming (MINLP) problem, branch-and-bound (BnB)-based algorithms are proposed for both single-user and multi-user scenarios, which is guaranteed to converge to globally optimal solutions. (2) A low-complexity many-to-many matching algorithm is further developed. Combined with the Karush-Kuhn-Tucker (KKT) theory, locally optimal and pairwise-stable solutions are obtained within polynomial-time complexity. Simulation results demonstrate that: (i) PASS significantly outperforms conventional multi-antenna architectures, particularly when the number of users and the spatial range increase; and (ii) The proposed matching-based algorithm achieves near-optimal performance, resulting in only a slight performance loss while significantly reducing computational overheads. Code is available at https://github.com/xiaoxiaxusummer/PASS_Discrete
comment: [Update] Detailed proof and numerical verification of the adjustable power radiation model have been added. Code is available at https://github.com/xiaoxiaxusummer/PASS_Discrete
A Coordinated Routing Approach for Enhancing Bus Timeliness and Travel Efficiency in Mixed-Traffic Environment
In this paper, we propose a coordinated routing strategy aimed at improving bus schedule adherence and enhancing travel efficiency for connected and automated vehicles (CAVs) operating within a mixed-traffic urban network. Our approach capitalizes on the existence of dedicated lanes for buses and CAVs, leveraging real-time traffic data to dynamically reroute CAVs in anticipation of congestion. By continuously monitoring traffic conditions on dedicated lanes and tracking the real-time positions of buses, we enable the system to proactively adjust CAV routes when potential interference with bus operations is detected. This coordination mitigates delays affecting transit services and reduces travel time for CAVs. We evaluate the proposed strategy through simulation studies conducted in the SUMO. The results demonstrate significant improvements in both transit reliability and CAV operational performance across a range of traffic conditions.
Systems and Control (EESS)
OmniRetarget: Interaction-Preserving Data Generation for Humanoid Whole-Body Loco-Manipulation and Scene Interaction
A dominant paradigm for teaching humanoid robots complex skills is to retarget human motions as kinematic references to train reinforcement learning (RL) policies. However, existing retargeting pipelines often struggle with the significant embodiment gap between humans and robots, producing physically implausible artifacts like foot-skating and penetration. More importantly, common retargeting methods neglect the rich human-object and human-environment interactions essential for expressive locomotion and loco-manipulation. To address this, we introduce OmniRetarget, an interaction-preserving data generation engine based on an interaction mesh that explicitly models and preserves the crucial spatial and contact relationships between an agent, the terrain, and manipulated objects. By minimizing the Laplacian deformation between the human and robot meshes while enforcing kinematic constraints, OmniRetarget generates kinematically feasible trajectories. Moreover, preserving task-relevant interactions enables efficient data augmentation, from a single demonstration to different robot embodiments, terrains, and object configurations. We comprehensively evaluate OmniRetarget by retargeting motions from OMOMO, LAFAN1, and our in-house MoCap datasets, generating over 8-hour trajectories that achieve better kinematic constraint satisfaction and contact preservation than widely used baselines. Such high-quality data enables proprioceptive RL policies to successfully execute long-horizon (up to 30 seconds) parkour and loco-manipulation skills on a Unitree G1 humanoid, trained with only 5 reward terms and simple domain randomization shared by all tasks, without any learning curriculum.
comment: Project website: https://omniretarget.github.io
Robust Safety-Critical Control of Integrator Chains with Mismatched Perturbations via Linear Time-Varying Feedback
In this paper, we propose a novel safety-critical control framework for a chain of integrators subject to both matched and mismatched perturbations. The core of our approach is a linear, time-varying state-feedback design that simultaneously enforces stability and safety constraints. By integrating backstepping techniques with a quadratic programming (QP) formulation, we develop a systematic procedure to guarantee safety under time-varying gains. We provide rigorous theoretical guarantees for the double integrator case, both in the presence and absence of perturbations, and outline general proofs for extending the methodology to higher-order chains of integrators. This proposed framework thus bridges robustness and safety-critical performance, while overcoming the limitations of existing prescribed-time approaches.
Neural Network-based Co-design of Output-Feedback Control Barrier Function and Observer
Control Barrier Functions (CBFs) provide a powerful framework for ensuring safety in dynamical systems. However, their application typically relies on full state information, which is often violated in real-world scenarios due to the availability of partial state information. In this work, we propose a neural network-based framework for the co-design of a safety controller, observer, and CBF for partially observed continuous-time systems. By formulating barrier conditions over an augmented state space, our approach ensures safety without requiring bounded estimation errors or handcrafted barrier functions. All components are jointly trained by formulating appropriate loss functions, and we introduce a validity condition to provide formal safety guarantees beyond the training data. Finally, we demonstrate the effectiveness of the proposed approach through several case studies.
Stability Analysis of Thermohaline Convection With a Time-Varying Shear Flow Using the Lyapunov Method
This work identifies instabilities and computes the growth rate of a linear time-varying system using the Lyapunov method. The linear system describes cold fresh water on top of hot salty water with a periodically time-varying background shear flow. We employ a time-dependent weighting matrix to construct a Lyapunov function candidate, and the resulting linear matrix inequalities formulation is discretized in time using the forward Euler method. As the number of temporal discretization points increases, the growth rate predicted from the Lyapunov method or the Floquet theory will converge to the same value as that obtained from numerical simulations. We also use the Lyapunov method to analyze the instantaneous principal direction of instabilities and compare the computational resources required by the Lyapunov method, numerical simulations, and the Floquet theory.
comment: 6 pages, 5 figures
Explainable and Resilient ML-Based Physical-Layer Attack Detectors
Detection of emerging attacks on network infrastructure is a critical aspect of security management. To meet the growing scale and complexity of modern threats, machine learning (ML) techniques offer valuable tools for automating the detection of malicious activities. However, as these techniques become more complex, their internal operations grow increasingly opaque. In this context, we address the need for explainable physical-layer attack detection methods. First, we analyze the inner workings of various classifiers trained to alert about physical layer intrusions, examining how the influence of different monitored parameters varies depending on the type of attack being detected. This analysis not only improves the interpretability of the models but also suggests ways to enhance their design for increased speed. In the second part, we evaluate the detectors' resilience to malicious parameter noising. The results highlight a key trade-off between model speed and resilience. This work serves as a design guideline for developing fast and robust detectors trained on available network monitoring data.
Anticipatory Structure in the Propagation of Signal
We here report the development of a structure that shows the proteresis phenomenon in more general setting and set out its philosophical implications. In this case, the questions relate to how we are to interpret what will happen in the future, and the procollection (the counterpart of recollection) of not-yet-experienced phenomena that, when expressed, will be whatever has built up in fully determinate form already, ahead of the event. If such a process really exists, as our evidence confirms, not just as phenomenon but as a fact, then a gap exists between the actualized form of the future and its concrete expression when the event does happen. Such a fact, as hard to imagine as it is, may be intelligible, even interpretable and susceptible to mathematical and/or logical modeling. We build upon neglected theories and formulae that present time in a way that makes our results interpretable. A proteretic device is here described which shifts the input signal (event) to the future; and it is an anticipatory structure. The proteretic characteristic of neurons should also be capable of demonstration; and its neuronal behavior is possibly the reason for the fast perception/thought processes in spite of slow behaving neurons. That capacity may also account for why it is possible for animals (including humans) to interact with the environment by slightly seeing (in the sense of perceiving and/or sensing) the future. Exploiting this new proteretic technology, faster computers and more efficient cellphones, among other things, will be designed and built.
comment: 19 pages and 8 figures
Stabilization of nonlinear systems with unknown delays via delay-adaptive neural operator approximate predictors
This work establishes the first rigorous stability guarantees for approximate predictors in delay-adaptive control of nonlinear systems, addressing a key challenge in practical implementations where exact predictors are unavailable. We analyze two scenarios: (i) when the actuated input is directly measurable, and (ii) when it is estimated online. For the measurable input case, we prove semi-global practical asymptotic stability with an explicit bound proportional to the approximation error $\epsilon$. For the unmeasured input case, we demonstrate local practical asymptotic stability, with the region of attraction explicitly dependent on both the initial delay estimate and the predictor approximation error. To bridge theory and practice, we show that neural operators-a flexible class of neural network-based approximators-can achieve arbitrarily small approximation errors, thus satisfying the conditions of our stability theorems. Numerical experiments on two nonlinear benchmark systems-a biological protein activator/repressor model and a micro-organism growth Chemostat model-validate our theoretical results. In particular, our numerical simulations confirm stability under approximate predictors, highlight the strong generalization capabilities of neural operators, and demonstrate a substantial computational speedup of up to 15x compared to a baseline fixed-point method.
comment: 20 pages, 2 figures
Real-time Velocity Profile Optimization for Time-Optimal Maneuvering with Generic Acceleration Constraints
The computation of time-optimal velocity profiles along prescribed paths, subject to generic acceleration constraints, is a crucial problem in robot trajectory planning, with particular relevance to autonomous racing. However, the existing methods either support arbitrary acceleration constraints at high computational cost or use conservative box constraints for computational efficiency. We propose FBGA, a new \underline{F}orward-\underline{B}ackward algorithm with \underline{G}eneric \underline{A}cceleration constraints, which achieves both high accuracy and low computation time. FBGA operates forward and backward passes to maximize the velocity profile in short, discretized path segments, while satisfying user-defined performance limits. Tested on five racetracks and two vehicle classes, FBGA handles complex, non-convex acceleration constraints with custom formulations. Its maneuvers and lap times closely match optimal control baselines (within $0.11\%$-$0.36\%$), while being up to three orders of magnitude faster. FBGA maintains high accuracy even with coarse discretization, making it well-suited for online multi-query trajectory planning. Our open-source \texttt{C++} implementation is available at: https://anonymous.4open.science/r/FB_public_RAL.
Assessment of East-West (E-W) and South-North (S-N) facing Vertical Bifacial Photovoltaic Modules for Agrivoltaics and Dual-Land Use Applications in India
Deploying vertical bifacial PV modules can play a significant role in agrivoltaics, fencing walls, noise barriers, building integrated photovoltaics (BIPV), solar PV for electric vehicles, and many other applications. This research work presents the performance comparison of vertical bifacial photovoltaic (VBPV) modules facing East-West (E-W) and South-North (S-N) directions. Also, the VBPV modules are compared with vertical and tilted south-facing monofacial PV modules. Six PV modules (monofacial and bifacial) were installed at the rooftop of IIT Bhilai academic building, Raipur (21.16{\deg} N, 81.65{\deg} E), India, and studied for a year from May 2022 to April 2023. The results show that the E-W facing VBPV module gives two production peaks, one in the morning and another in the evening, as compared to the single notable rise at midday observed for a monofacial module. From a series of experiments, 19 days of data were collected over the one-year period from May 2022 to April 2023, with specific inclusion of important days like solstices and equinoxes. In addition, the energy generation results are compared with PVsyst simulations, while also addressing the limitations of the PVsyst simulation of vertical PV modules. E-W bifacial generation is higher than S-N bifacial and south-facing monofacial modules from February to April. The VBPV modules in E-W and S-N orientations present a promising opportunity for expanding the agrivoltaics sector in tropical and sub-tropical countries, like India. This has huge implications for addressing the sustainable development goals by simultaneously contributing to sustainable land management, green energy generation, energy security and water conservation in the vast geo-climatic expanse of tropics.
Robust NbN on Si-SiGe hybrid superconducting-semiconducting microwave quantum circuit
Advancing large-scale quantum computing requires superconducting circuits that combine long coherence times with compatibility with semiconductor technology. We investigate niobium nitride (NbN) coplanar waveguide resonators integrated with Si/SiGe quantum wells, creating a hybrid platform designed for CMOS-compatible quantum hardware. Using temperature-dependent microwave spectroscopy in the single-photon regime, we examine resonance frequency and quality factor variations to probe the underlying loss mechanisms. Our analysis identifies the roles of two-level systems, quasiparticles, and scattering processes, and connects these losses to wafer properties and fabrication methods. The devices demonstrate reproducible performance and stable operation maintained for over two years, highlighting their robustness. These results provide design guidelines for developing low-loss, CMOS-compatible superconducting circuits and support progress toward resilient, scalable architectures for quantum information processing.
Machine Learning Detection of Lithium Plating in Lithium-ion Cells: A Gaussian Process Approach
Lithium plating during fast charging is a critical degradation mechanism that accelerates capacity fade and can trigger catastrophic safety failures. Recent work has identified a distinctive dQ/dV peak above 4.0 V as a reliable signature of plating onset; however, conventional methods for computing dQ/dV rely on finite differencing with filtering, which amplifies sensor noise and introduces bias in peak location. In this paper, we propose a Gaussian Process (GP) framework for lithium plating detection by directly modeling the charge-voltage relationship Q(V) as a stochastic process with calibrated uncertainty. Leveraging the property that derivatives of GPs remain GPs, we infer dQ/dV analytically and probabilistically from the posterior, enabling robust detection without ad hoc smoothing. The framework provides three key benefits: (i) noise-aware inference with hyperparameters learned from data, (ii) closed-form derivatives with credible intervals for uncertainty quantification, and (iii) scalability to online variants suitable for embedded BMS. Experimental validation on Li-ion coin cells across a range of C-rates (0.2C-1C) and temperatures (0-40\deg C) demonstrates that the GP-based method reliably detects plating peaks under low-temperature, high-rate charging, while correctly reporting no peaks in baseline cases. The concurrence of GP-identified differential peaks, reduced charge throughput, and capacity fade measured via reference performance tests confirms the method's accuracy and robustness, establishing a practical pathway for real-time lithium plating detection.
comment: Submitted to American Control Conference - ACC 2026
Preemptive Spatiotemporal Trajectory Adjustment for Heterogeneous Vehicles in Highway Merging Zones
Aiming at the problem of driver's perception lag and low utilization efficiency of space-time resources in expressway ramp confluence area, based on the preemptive spatiotemporal trajectory Adjustment system, from the perspective of coordinating spatiotemporal resources, the reasonable value of safe space-time distance in trajectory pre-preparation is quantitatively analyzed. The minimum safety gap required for ramp vehicles to merge into the mainline is analyzed by introducing double positioning error and spatiotemporal trajectory tracking error. A merging control strategy for autonomous driving heterogeneous vehicles is proposed, which integrates vehicle type, driving intention, and safety spatiotemporal distance. The specific confluence strategies of ramp target vehicles and mainline cooperative vehicles under different vehicle types are systematically expounded. A variety of traffic flow and speed scenarios are used for full combination simulation. By comparing the time-position-speed diagram, the vehicle operation characteristics and the dynamic difference of confluence are qualitatively analyzed, and the average speed and average delay are used as the evaluation indices to quantitatively evaluate the performance advantages of the preemptive cooperative confluence control strategy. The results show that the maximum average delay improvement rates of mainline and ramp vehicles are 90.24 % and 74.24 %, respectively. The proposed strategy can effectively avoid potential vehicle conflicts and emergency braking behaviors, improve driving safety in the confluence area, and show significant advantages in driving stability and overall traffic efficiency optimization.
LiDAR Point Cloud Colourisation Using Multi-Camera Fusion and Low-Light Image Enhancement
In recent years, the fusion of camera data with LiDAR measurements has emerged as a powerful approach to enhance spatial understanding. This study introduces a novel, hardware-agnostic methodology that generates colourised point clouds from mechanical LiDAR using multiple camera inputs, providing complete 360-degree coverage. The primary innovation lies in its robustness under low-light conditions, achieved through the integration of a low-light image enhancement module within the fusion pipeline. The system requires initial calibration to determine intrinsic camera parameters, followed by automatic computation of the geometric transformation between the LiDAR and cameras, removing the need for specialised calibration targets and streamlining the setup. The data processing framework uses colour correction to ensure uniformity across camera feeds before fusion. The algorithm was tested using a Velodyne Puck Hi-Res LiDAR and a four-camera configuration. The optimised software achieved real-time performance and reliable colourisation even under very low illumination, successfully recovering scene details that would otherwise remain undetectable.
Intelligent Multi-link EDCA Optimization for Delay-Bounded QoS in Wi-Fi 7
IEEE 802.11be (Wi-Fi 7) introduces Multi-Link Operation (MLO) as a While MLO offers significant parallelism and capacity, realizing its full potential in guaranteeing strict delay bounds and optimizing Quality of Service (QoS) for diverse, heterogeneous traffic streams in complex multi-link scenarios remain a significant challenge. This is largely due to the limitations of static Enhanced Distributed Channel Access (EDCA) parameters and the complexity inherent in cross-link traffic management. To address this, this paper investigates the correlation between overall MLO QoS indicators and the configuration of EDCA parameters and Acess Catagory (AC) traffic allocation among links. Based on this analysis, we formulate a constrained optimization problem aiming to minimize the sum of overall packet loss rates for all access categories while satisfying their respective overall delay violation probability constraints. A Genetic Algorithm (GA)-based MLO EDCA QoS optimization algorithm is designed to efficiently search the complex configuration space of AC assignments and EDCA parameters. Experimental results demonstrate that the proposed approach's efficacy in generating adaptive MLO configuration strategies that align with diverse service requirements. The proposed solution significantly improves delay distribution characteristics, and enhance QoS robustness and resource utilization efficiency in high-load MLO environments.
Dynamic Causal Attack Graph based Cyber-security Risk Assessment Framework for CTCS System
Protecting the security of the train control system is a critical issue to ensure the safe and reliable operation of high-speed trains. Scientific modeling and analysis for the security risk is a promising way to guarantee system security. However, the representation and assessment of the multi-staged, causally related, and temporal-dynamic changed attack dependencies are difficult in the train control system. To solve the above challenges, a security assessment framework based on the Dynamical Causality Attack Graph (DCAG) model is introduced in this paper. Firstly, the DCAG model is generated based on the attack graph with consideration of temporal attack propagation and multi-stage attack event causality propagation. Then, the DCAG model is analyzed based on Bayesian inference and logic gateway-based inference. Through the case analysis of the CTCS-3 system, the security assessment framework is validated. With the DCAG-based security assessment framework, we can not only perform appropriate security risk quantification calculations, but also explore the importance of different attacks on system security risks, which is helpful in adjusting the cyber security defense policy.
Ingress Cryogenic Receivers Toward Scalable Quantum Information Processing: Theory and System Analysis
Current control techniques for cryogenically cooled qubits are realized with coaxial cables, posing multiple challenges in terms of cost, thermal load, size, and long-term scalability. Emerging approaches to tackle this issue include cryogenic CMOS electronics at 4 K, and photonic links for direct qubit control. In this paper, we propose a multiplexed all-passive cryogenic high frequency direct detection control platform (cryo-HFDD). The proposed classical interface for direct qubit control utilizes optical or sub-THz bands. We present the possible tradeoffs of this platform, and compare it with current state-of-the-art cryogenic CMOS and conventional coaxial approaches. We assess the feasibility of adopting these efficient links for a wide range of microwave qubit power levels. Specifically, we estimate the heat load to achieve the required signal-to-noise ratio SNR considering different noise sources, component losses, as well as link density. We show that multiplexed photonic receivers at 4 K can aggressively scale the control of thousands of qubits. This opens the door for low cost scalable quantum computing systems.
Beyond Point Estimates: Likelihood-Based Full-Posterior Wireless Localization
Modern wireless systems require not only position estimates, but also quantified uncertainty to support planning, control, and radio resource management. We formulate localization as posterior inference of an unknown transmitter location from receiver measurements. We propose Monte Carlo Candidate-Likelihood Estimation (MC-CLE), which trains a neural scoring network using Monte Carlo sampling to compare true and candidate transmitter locations. We show that in line-of-sight simulations with a multi-antenna receiver, MC-CLE learns critical properties including angular ambiguity and front-to-back antenna patterns. MC-CLE also achieves lower cross-entropy loss relative to a uniform baseline and Gaussian posteriors. alternatives under a uniform-loss metric.
Pinching-Antenna Systems (PASS)-Enabled UAV Delivery
A pinching-antenna systems (PASS)-enabled unmanned aerial vehicle (UAV) delivery framework is proposed, which exploits the capability of PASS to establish a strong line-of-sight link and reduce free-space pathloss.Aiming at minimizing the communication energy consumption in one cycle, a double-layer optimization (DLO) algorithm is developed by jointly optimizing the UAV delivery sequence and the pinching antenna (PA) activation vector. More specifically, at the outer layer, a hierarchical alternating optimization (HAO) scheme is proposed to tackle the NP-hard problem of delivery sequence planning, where a genetic algorithm performs global exploration to generate candidate solutions at the top-level, while a dynamic programming performs local refinement to obtain elite solutions at the lower-level. With determined UAV trajectory, at the inner layer, focus is placed on addressing the highly coupled mixed-integer nonlinear programming problem of PA activation vector optimization, where a pair of algorithms are proposed: 1) Branch-and-Bound (BnB) algorithm for finding global optimum; 2) incremental search and local refinement (ISLR) algorithm for reducing computational complexity. Simulation results indicate that: i) The proposed HAO-based delivery sequence planning scheme can effectively reduce the total flight distance, thereby decreasing flight time and communication energy consumption; ii) Both the proposed BnB and ISLR algorithms can achieve energy-efficient PA activation, with the former exhibiting better performance and the latter having lower complexity; iii) PASS outperforms the conventional multi-antenna systems, especially with higher communication rate requirements.
Policy Optimization in Robust Control: Weak Convexity and Subgradient Methods
Robust control seeks stabilizing policies that perform reliably under adversarial disturbances, with $\mathcal{H}_\infty$ control as a classical formulation. It is known that policy optimization of robust $\mathcal{H}_\infty$ control naturally lead to nonsmooth and nonconvex problems. This paper builds on recent advances in nonsmooth optimization to analyze discrete-time static output-feedback $\mathcal{H}_\infty$ control. We show that the $\mathcal{H}_\infty$ cost is weakly convex over any convex subset of a sublevel set. This structural property allows us to establish the first non-asymptotic deterministic convergence rate for the subgradient method under suitable assumptions. In addition, we prove a weak Polyak-{\L}ojasiewicz (PL) inequality in the state-feedback case, implying that all stationary points are globally optimal. We finally present a few numerical examples to validate the theoretical results.
comment: 9 pages, 11 figures
Unsupervised Detection of Spatiotemporal Anomalies in PMU Data Using Transformer-Based BiGAN
Ensuring power grid resilience requires the timely and unsupervised detection of anomalies in synchrophasor data streams. We introduce T-BiGAN, a novel framework that integrates window-attention Transformers within a bidirectional Generative Adversarial Network (BiGAN) to address this challenge. Its self-attention encoder-decoder architecture captures complex spatio-temporal dependencies across the grid, while a joint discriminator enforces cycle consistency to align the learned latent space with the true data distribution. Anomalies are flagged in real-time using an adaptive score that combines reconstruction error, latent space drift, and discriminator confidence. Evaluated on a realistic hardware-in-the-loop PMU benchmark, T-BiGAN achieves an ROC-AUC of 0.95 and an average precision of 0.996, significantly outperforming leading supervised and unsupervised methods. It shows particular strength in detecting subtle frequency and voltage deviations, demonstrating its practical value for live, wide-area monitoring without relying on manually labeled fault data.
Terahertz Quasi-BIC Metasurfaces for Ultra-Sensitive Biosensing and High-Speed Wireless Communications
Bound states in the continuum (BICs) have emerged as a revolutionary paradigm in terahertz (THz) photonics, enabling metasurfaces with theoretically infinite quality factors (Q-factors) and unprecedented light-matter control. This review synthesizes a decade of progress in THz-BIC research, tracing the evolution from foundational symmetry-protected designs to application-optimized quasi-BICs. We dissect multipolar origins, topological robustness, and symmetry-breaking strategies underpinning high-Q resonances, alongside computational frameworks for predictive design. The timeline highlights key milestones: early dielectric metasurfaces with high Q-factor, flexible biosensors achieving microgram detection limits, and Kerker-conditioned gas spectrometers reducing path lengths by few orders of magnitude. Emerging frontiers in reconfigurable MEMS-BICs and chiral quantum photonics are critically evaluated. Despite breakthroughs, scalability barriers persist for 6G integration, including nano-fabrication tolerances, material loss trade-offs, and dynamic control gaps. This review establishes BIC metasurfaces as pivotal enablers of compact, high-efficiency THz technologies poised to bridge the gap between fundamental discovery and commercialization of THz-based 6G communication and MedTech.
Combined Learning and Control: A New Paradigm for Optimal Control with Unknown Dynamics
In this paper, we present the combined learning-and-control (CLC) approach, which is a new way to solve optimal control problems with unknown dynamics by unifying model-based control and data-driven learning. The key idea is simple: we design a controller to be optimal for a proxy objective built on an available model while penalizing mismatches with the real system, so that the resulting controller is also optimal for the actual system. Building on the original CLC formulation, we demonstrate the framework to the linear quadratic regulator problem and make three advances: (i) we show that the CLC penalty is a sequence of stage-specific weights rather than a single constant; (ii) we identify when these weights can be set in advance and when they must depend on the (unknown) dynamics; and (iii) we develop a lightweight learning loop that tunes the weights directly from data without abandoning the benefits of a model-based design. We provide a complete algorithm and an empirical study against common baseline methods. The results clarify where prior knowledge suffices and where learning is essential, and they position CLC as a practical, theoretically grounded bridge between classical optimal control and modern learning methods.
comment: Submitted to ACC 2026
Asynchronous Nonlinear Sheaf Diffusion for Multi-Agent Coordination
Cellular sheaves and sheaf Laplacians provide a far-reaching generalization of graphs and graph Laplacians, resulting in a wide array of applications ranging from machine learning to multi-agent control. In the context of multi-agent systems, so called coordination sheaves provide a unifying formalism that models heterogeneous agents and coordination goals over undirected communication topologies, and applying sheaf diffusion drives agents to achieve their coordination goals. Existing literature on sheaf diffusion assumes that agents can communicate and compute updates synchronously, which is an unrealistic assumption in many scenarios where communication delays or heterogeneous agents with different compute capabilities cause disagreement among agents. To address these challenges, we introduce asynchronous nonlinear sheaf diffusion. Specifically, we show that under mild assumptions on the coordination sheaf and bounded delays in communication and computation, nonlinear sheaf diffusion converges to a minimizer of the Dirichlet energy of the coordination sheaf at a linear rate proportional to the delay bound. We further show that this linear convergence is attained from arbitrary initial conditions and the analysis depends on the spectrum of the sheaf Laplacian in a manner that generalizes the standard graph Laplacian case. We provide several numerical simulations to validate our theoretical results.
A Hierarchical Agentic Framework for Autonomous Drone-Based Visual Inspection
Autonomous inspection systems are essential for ensuring the performance and longevity of industrial assets. Recently, agentic frameworks have demonstrated significant potential for automating inspection workflows but have been limited to digital tasks. Their application to physical assets in real-world environments, however, remains underexplored. In this work, our contributions are two-fold: first, we propose a hierarchical agentic framework for autonomous drone control, and second, a reasoning methodology for individual function executions which we refer to as ReActEval. Our framework focuses on visual inspection tasks in indoor industrial settings, such as interpreting industrial readouts or inspecting equipment. It employs a multi-agent system comprising a head agent and multiple worker agents, each controlling a single drone. The head agent performs high-level planning and evaluates outcomes, while worker agents implement ReActEval to reason over and execute low-level actions. Operating entirely in natural language, ReActEval follows a plan, reason, act, evaluate cycle, enabling drones to handle tasks ranging from simple navigation (e.g., flying forward 10 meters and land) to complex high-level tasks (e.g., locating and reading a pressure gauge). The evaluation phase serves as a feedback and/or replanning stage, ensuring actions align with user objectives while preventing undesirable outcomes. We evaluate the framework in a simulated environment with two worker agents, assessing performance qualitatively and quantitatively based on task completion across varying complexity levels and workflow efficiency. By leveraging natural language processing for agent communication, our approach offers a novel, flexible, and user-accessible alternative to traditional drone-based solutions, enabling autonomous problem-solving for industrial inspection without extensive user intervention.
Robust Attitude Control of Nonlinear Multi-Rotor Dynamics with LFT Models and $\mathcal{H}_\infty$ Performance
Attitude stabilization of unmanned aerial vehicles in uncertain environments presents significant challenges due to nonlinear dynamics, parameter variations, and sensor limitations. This paper presents a comparative study of $\mathcal{H}_\infty$ and classical PID controllers for multi-rotor attitude regulation in the presence of wind disturbances and gyroscope noise. The flight dynamics are modeled using a linear parameter-varying (LPV) framework, where nonlinearities and parameter variations are systematically represented as structured uncertainties within a linear fractional transformation formulation. A robust controller based on $\mathcal{H}_\infty$ formulation is designed using only gyroscope measurements to ensure guaranteed performance bounds. Nonlinear simulation results demonstrate the effectiveness of the robust controllers compared to classical PID control, showing significant improvement in attitude regulation under severe wind disturbances.
comment: 6 pages, 6 figures, 3 tables, submitted to ACC 2026
A Review of Software for Designing and Operating Quantum Networks
Quantum network protocol development is crucial to realizing a production-grade network that can support distributed sensing, secure communication, and utility-scale quantum computation. However, the transition from laboratory demonstration to deployable networks requires software implementations of architectures and protocols tailored to the unique constraints of quantum systems. This paper reviews the current state of software implementations for quantum networks, organized around the three-plane abstraction of infrastructure, logical, and control/service planes. We cover software for both designing quantum network protocols (e.g., SeQUeNCe, QuISP, and NetSquid) and operating them, with a focus on essential control/service plane functions such as entanglement, topology, and resource management, in a proposed taxonomy. Our review highlights a persistent gap between theoretical protocol proposals and their realization in simulators or testbeds, particularly in dynamic topology and network management. We conclude by outlining open challenges and proposing a roadmap for developing scalable software architectures to enable hybrid, large-scale quantum networks.
comment: 12 pages, 5 figures, 3 tables, journal
A Novel Robust Control Method Combining DNN-Based NMPC Approximation and PI Control: Application to Exoskeleton Squat Movements
Nonlinear Model Predictive Control (NMPC) is a precise controller, but its heavy computational load often prevents application in robotic systems. Some studies have attempted to approximate NMPC using deep neural networks (NMPC-DNN). However, in the presence of unexpected disturbances or when operating conditions differ from training data, this approach lacks robustness, leading to large tracking errors. To address this issue, for the first time, the NMPC-DNN output is combined with a PI controller (Hybrid NMPC-DNN-PI). The proposed controller is validated by applying it to an exoskeleton robot during squat movement, which has a complex dynamic model and has received limited attention regarding robust nonlinear control design. A human-robot dynamic model with three active joints (ankle, knee, hip) is developed, and more than 5.3 million training samples are used to train the DNN. The results show that, under unseen conditions for the DNN, the tracking error in Hybrid NMPC-DNN-PI is significantly lower compared to NMPC-DNN. Moreover, human joint torques are greatly reduced with the use of the exoskeleton, with RMS values for the studied case reduced by 30.9%, 41.8%, and 29.7% at the ankle, knee, and hip, respectively. In addition, the computational cost of Hybrid NMPC-DNN-PI is 99.93% lower than that of NMPC.
Dynamic Modeling and Control System Analysis for Continuous-Disc Filters in Pulp Mill Operations
Vacuum disc filtration is critical in pulp mills for white liquor clarification and pulp washing, involving tightly coupled dynamics between rotational speed, vacuum pressure, slurry concentration, filtrate flow, and cake thickness. These nonlinear interactions are often regulated using empirical methods, lacking formal modeling and control. This article develops a dynamic, multivariable model of a continuous-disc filter (CD-filter) system based on first principles, simplified to a single representative disc for tractability. A linearized state-space model supports the design of two control strategies: a decentralized PI-based scheme and a centralized model predictive control (MPC). MATLAB-Simulink simulations reveal that MPC outperforms PI in tracking accuracy, overshoot reduction, and disturbance rejection. A 3D efficiency surface illustrates the importance of coordinating inlet flow and solids concentration. Results highlight the need for advanced multivariable control in optimizing CD-filter performance.
Event-Driven Control via Sparsity-Promoting Regularization: A Rollout Approach with Performance Guarantees
This paper presents a controller design framework aiming to balance control performance and actuation rate. Control performance is evaluated by an infinite-horizon average cost, and the number of control actions is penalized via sparsity-promoting regularization. Since the formulated optimal control problem has a combinatorial nature, we employ a rollout algorithm to obtain a tractable suboptimal solution. In the proposed scheme, actuation timings are determined through a multistage minimization procedure based on a receding-horizon approach, and the corresponding control inputs are computed online. We establish theoretical performance guarantees with respect to periodic control and prove the stability of the closed-loop system. The effectiveness of the proposed method is demonstrated through a numerical example.
comment: 14 pages
Simulated Annealing for Multi-Robot Ergodic Information Acquisition Using Graph-Based Discretization
One of the goals of active information acquisition using multi-robot teams is to keep the relative uncertainty in each region at the same level to maintain identical acquisition quality (e.g., consistent target detection) in all the regions. To achieve this goal, ergodic coverage can be used to assign the number of samples according to the quality of observation, i.e., sampling noise levels. However, the noise levels are unknown to the robots. Although this noise can be estimated from samples, the estimates are unreliable at first and can generate fluctuating values. The main contribution of this paper is to use simulated annealing to generate the target sampling distribution, starting from uniform and gradually shifting to an estimated optimal distribution, by varying the coldness parameter of a Boltzmann distribution with the estimated sampling entropy as energy. Simulation results show a substantial improvement of both transient and asymptotic entropy compared to both uniform and direct-ergodic searches. Finally, a demonstration is performed with a TurtleBot swarm system to validate the physical applicability of the algorithm.
MathBode: Frequency-Domain Fingerprints of LLM Mathematical Reasoning
This paper presents MathBode, a dynamic diagnostic for mathematical reasoning in large language models (LLMs). Instead of one-shot accuracy, MathBode treats each parametric problem as a system: we drive a single parameter sinusoidally and fit first-harmonic responses of model outputs and exact solutions. This yields interpretable, frequency-resolved metrics -- gain (amplitude tracking) and phase (lag) -- that form Bode-style fingerprints. Across five closed-form families (linear solve, ratio/saturation, compound interest, 2x2 linear systems, similar triangles), the diagnostic surfaces systematic low-pass behavior and growing phase lag that accuracy alone obscures. We compare several models against a symbolic baseline that calibrates the instrument ($G \approx 1$, $\phi \approx 0$). Results separate frontier from mid-tier models on dynamics, providing a compact, reproducible protocol that complements standard benchmarks with actionable measurements of reasoning fidelity and consistency. We open-source the dataset and code to enable further research and adoption.
Parasitic actuation delay limits the minimum employable time headway in connected and autonomous vehicles
Adaptive andcooperative adaptive cruise control (ACC and CACC) and next generation CACC (CACC+) systems usually employ a constant time headway policy (CTHP) for platooning of connected and autonomous vehicles (CAVs). In ACC, the ego vehicle uses onboard sensors to measure the position and velocity of the predecessor vehicle to maintain a desired spacing. The CACC and CACC+systems use additional information, such as acceleration(s) communicated through vehicle-to-vehicle (V2V) communication of the predecessor vehicle(s); these systems have been shown to result in improved spacing performance, throughput, and safety over ACC. Parasitic dynamics are generally difficult to model and the parasitic parameters (delay, lag, etc.) are difficult to obtain. Parasitic actuation delays can have deleterious effects and impose limits on the mobility and safety of CAVs. It is reasonable to assume that the bounds on parasitic actuation delays are known a priori. For CAVs, we need to address both internal stability and string stability in the presence of parasitic actuation delays. This requires robustness of string and internal stability for all values of parasitic actuation delays that are within the specified upper bound. In this paper, we provide the minimum employable time headway for ACC, CACC, and CACC+ (`r' predecessors look-ahead), respectively. The inclusion of the internal stability in the string stability condition is analyzed based on Pontryagin's interlacing theorem for time delay systems. We provide comparative numerical results to corroborate the achieved theoretical results.
Hierarchical Analysis and Control of Epidemic Spreading over Networks using Dissipativity and Mesh Stability
Analyzing and controlling spreading processes are challenging problems due to the involved non-linear node (subsystem) dynamics, unknown disturbances, complex interconnections, and the large-scale and multi-level nature of the problems. The dissipativity concept provides a practical framework for addressing such concerns, thanks to the energy-based representation it offers for subsystems and the compositional properties it provides for the analysis and control of interconnected (networked) systems comprised of such subsystems. Therefore, in this paper, we utilize the dissipativity concept to analyze and control a spreading process that occurs over a hierarchy of nodes, groups, and a network (i.e., a spreading network). We start by generalizing several existing results on dissipativity-based topology design for networked systems. Next, we model the considered spreading network as a networked system and establish the dissipativity properties of its nodes. The generalized topology design method is then applied at multiple levels of the considered spreading network to formulate its analysis and control problems as Linear Matrix Inequality (LMI) problems. We identify and enforce localized necessary conditions to support the feasibility of the LMI problem solved at each subsequent hierarchical level of the spreading network. Consequently, the proposed method does not involve iterative multi-level optimization stages that are computationally inefficient. The proposed control solution ensures that the spreading network is not only stable but also dissipative and mesh-stable. Compared to conventional methods, such as threshold pruning and high-degree edge removal, our approach offers superior performance in terms of infection containment, control efficiency, and disturbance robustness. Extensive numerical results demonstrate the effectiveness of the proposed technique.
comment: To be submitted to American Control Conference 2026
A Hybrid Systems Model of Feedback Optimization for Linear Systems: Convergence and Robustness
Feedback optimization algorithms compute inputs to a system using real-time output measurements, which helps mitigate the effects of disturbances. However, existing work often models both system dynamics and computations in either discrete or continuous time, which may not accurately model some applications. In this work, we model linear system dynamics in continuous time, and we model the computations of inputs in discrete time. Therefore, we present a novel hybrid systems model of feedback optimization. We first establish the well-posedness of this hybrid model and establish completeness of solutions while ruling out Zeno behavior. Then we show the state of the system converges exponentially fast to a ball of known radius about a desired goal state. Next we analytically show that this system is robust to perturbations in (i) the values of measured outputs, (ii) the matrices that model the linear time-invariant system, and (iii) the times at which inputs are applied to the system. Simulation results confirm that this approach successfully mitigates the effects of disturbances.
comment: 15 Pages, 2 Figures, submitted to American Control Conference 2026
MAPPO for Edge Server Monitoring
In this paper, we consider a goal-oriented communication problem for edge server monitoring, where jobs arrive intermittently at multiple dispatchers and must be assigned to shared edge servers with finite queues and time-varying availability. Accurate knowledge of server status is critical for sustaining high throughput, yet remains challenging under dynamic workloads and partial observability. To address this challenge, each dispatcher maintains server knowledge through two complementary mechanisms: (i) active status queries that provide instantaneous updates at a communication cost, and (ii) job execution feedback that reveals server conditions upon successful or failed job completion. We formulate a cooperative multi-agent distributed decision-making problem in which dispatchers jointly optimize query scheduling to balance throughput against communication overhead. To solve this problem, we propose a Multi-Agent Proximal Policy Optimization (MAPPO)-based algorithm that leverages centralized training with decentralized execution (CTDE) to learn distributed query-and-dispatch policies under partial and stale observations. Experiments show that MAPPO achieves superior throughput-cost tradeoffs and significantly outperforms baseline strategies across varying query costs, job arrival rates, and dispatchers.
comment: 6 pages, 4 figures. Accepted to IEEE MILCOM 2025
Robot Conga: A Leader-Follower Walking Approach to Sequential Path Following in Multi-Agent Systems
Coordinated path following in multi-agent systems is a key challenge in robotics, with applications in automated logistics, surveillance, and collaborative exploration. Traditional formation control techniques often rely on time-parameterized trajectories and path integrals, which can result in synchronization issues and rigid behavior. In this work, we address the problem of sequential path following, where agents maintain fixed spatial separation along a common trajectory, guided by a leader under centralized control. We introduce Robot Conga, a leader-follower control strategy that updates each agent's desired state based on the leader's spatial displacement rather than time, assuming access to a global position reference, an assumption valid in indoor environments equipped with motion capture, vision-based tracking, or UWB localization systems. The algorithm was validated in simulation using both TurtleBot3 and quadruped (Laikago) robots. Results demonstrate accurate trajectory tracking, stable inter-agent spacing, and fast convergence, with all agents aligning within 250 time steps (approx. 0.25 seconds) in the quadruped case, and almost instantaneously in the TurtleBot3 implementation.
comment: 6 Pages, 8 Figures. Both authors have contributed equally
Edge Server Monitoring for Job Assignment
In this paper, we study a goal-oriented communication problem for edge server monitoring, where compute jobs arrive intermittently at dispatchers and must be immediately assigned to distributed edge servers. Due to competing workloads and the dynamic nature of the edge environment, server availability fluctuates over time. To maintain accurate estimates of server availability states, each dispatcher updates its belief using two mechanisms: (i) active queries over shared communication channels and (ii) feedback from past job executions. We formulate a query scheduling problem that maximizes the job success rate under limited communication resources for queries. This problem is modeled as a Restless Multi-Armed Bandit (RMAB) with multiple actions and addressed using a Net-Gain Maximization (NGM) scheduling algorithm, which selects servers to query based on their expected improvement in execution performance. Simulation results show that the proposed NGM Policy significantly outperforms baseline strategies, achieving up to a 30% gain over the Round-Robin Policy and up to a 107% gain over the Never-Query Policy.
comment: Accepted to IEEE MILCOM 2025 (Networking Protocols and Performance Track), 6 pages, 2 figures
On Fixed-Time Stability for a Class of Singularly Perturbed Systems using Composite Lyapunov Functions
Fixed-time stable dynamical systems are capable of achieving exact convergence to an equilibrium point within a fixed time that is independent of the initial conditions of the system. This property makes them highly appealing for designing control, estimation, and optimization algorithms in applications with stringent performance requirements. However, the set of tools available for analyzing the interconnection of fixed-time stable systems is rather limited compared to their asymptotic counterparts. In this paper, we address some of these limitations by exploiting the emergence of multiple time scales in nonlinear singularly perturbed dynamical systems, where the fast dynamics and the slow dynamics are fixed-time stable on their own. By extending the so-called composite Lyapunov method from asymptotic stability to the context of fixed-time stability, we provide a novel class of Lyapunov-based sufficient conditions to certify fixed-time stability in a class of singularly perturbed dynamical systems. The results are illustrated, analytically and numerically, using a fixed-time gradient flow system interconnected with a fixed-time plant and an additional high-order example.
Ocean Diviner: A Diffusion-Augmented Reinforcement Learning Framework for AUV Robust Control in Underwater Tasks
Autonomous Underwater Vehicles (AUVs) are essential for marine exploration, yet their control remains highly challenging due to nonlinear dynamics and uncertain environmental disturbances. This paper presents a diffusion-augmented Reinforcement Learning (RL) framework for robust AUV control, aiming to improve AUV's adaptability in dynamic underwater environments. The proposed framework integrates two core innovations: (1) A diffusion-based action generation framework that produces physically feasible and high-quality actions, enhanced by a high-dimensional state encoding mechanism combining current observations with historical states and actions through a novel diffusion U-Net architecture, significantly improving long-horizon planning capacity for robust control. (2) A sample-efficient hybrid learning architecture that synergizes diffusion-guided exploration with RL policy optimization, where the diffusion model generates diverse candidate actions and the RL critic selects the optimal action, achieving higher exploration efficiency and policy stability in dynamic underwater environments. Extensive simulation experiments validate the framework's superior robustness and flexibility, outperforming conventional control methods in challenging marine conditions, offering enhanced adaptability and reliability for AUV operations in underwater tasks. Finally, we will release the code publicly soon to support future research in this area.
comment: Jingzehua Xu, Guanwen Xie and Weiyi Liu contributed equally to this work
Data-Driven Estimation of Structured Singular Values
Estimating the size of the modeling error is crucial for robust control. Over the years, numerous metrics have been developed to quantify the model error in a control relevant manner. One of the most important such metrics is the structured singular value, as it leads to necessary and sufficient conditions for ensuring stability and robustness in feedback control under structured model uncertainty. Although the computation of the structured singular value is often intractable, lower and upper bounds for it can often be obtained if a model of the system is known. In this paper, we introduce a fully data-driven method to estimate a lower bound for the structured singular value, by conducting experiments on the system and applying power iterations to the collected data. Our numerical simulations demonstrate that this method effectively lower bounds the structured singular value, yielding results comparable to the MATLAB$^\copyright$ Robust Control Toolbox.
comment: Published in IEEE Control Systems Letters, vol. 9, pp. 1976-1981, 2025
Spatial exponential decay of perturbations in optimal control of general evolution equations
We analyze the robustness of optimally controlled evolution equations with respect to spatially localized perturbations. We prove that if the involved operators are domain-uniformly stabilizable and detectable, then these localized perturbations only have a local effect on the optimal solution. We characterize this domain-uniform stabilizability and detectability for the transport equation with constant transport velocity, showing that even for unitary semigroups, optimality implies exponential damping. We extend this result to the case of a space-dependent transport velocity. Finally we leverage the results for the transport equation to characterize domain-uniform stabilizability of the wave equation. Numerical examples in one space dimension complement the theoretical results.
comment: 53 pages, 5 figures
Neural Operator Feedback for a First-Order PIDE with Spatially-Varying State Delay
A transport PDE with a spatial integral and recirculation with constant delay has been a benchmark for neural operator approximations of PDE backstepping controllers. Introducing a spatially-varying delay into the model gives rise to a gain operator defined through integral equations which the operator's input -- the varying delay function -- enters in previously unencountered manners, including in the limits of integration and as the inverse of the `delayED time' function. This, in turn, introduces novel mathematical challenges in estimating the operator's Lipschitz constant. The backstepping kernel function having two branches endows the feedback law with a two-branch structure, where only one of the two feedback branches depends on both of the kernel branches. For this rich feedback structure, we propose a neural operator approximation of such a two-branch feedback law and prove the approximator to be semiglobally practically stabilizing. With numerical results we illustrate the training of the neural operator and its stabilizing capability.
comment: This 14 page paper contains 1 table and 9 figures
Optimal transmission expansion modestly reduces decarbonization costs of U.S. electricity
Expanding interregional transmission is widely viewed as essential for integrating clean energy into decarbonized power systems. Using the open-source Switch capacity expansion model with detailed representation of existing U.S. generation and transmission infrastructure, solar, wind, and storage resources, and hourly operations, we evaluate the role of transmission across least-cost, socially optimal, and zero-emissions scenarios for 2050. An optimal nationwide plan would more than triple interregional transmission capacity, yet this reduces the cost of a zero-emissions system by only 7% relative to relying on existing transmission, as storage, solar and wind siting, and nuclear generation serve as close substitutes. Regional cost and rent effects vary, with transmission generally favoring wind and hydrogen resources over solar and batteries. Sensitivity analysis shows diminishing returns: one-fifth of the benefits of full expansion can be achieved with one-twelfth of the added capacity, while cost reductions for batteries and hydrogen provide comparable or greater system savings than transmission. Reconductoring -- quadrupling line capacity at half the cost of new builds achieves nearly all the benefits of unconstrained expansion. These results suggest that while substantial transmission expansion is economically justified, a diverse set of flexibility resources can substitute for large-scale grid build-out, and the relative value of transmission is highly contingent on technological and cost developments.
comment: 30 pages, 10 figures, 1 table in main paper. Additional 11 pages including 7 additional figures and one table in the appendix
Optimized dynamic scheduling of an exclusive bus lane or high occupancy vehicle lane in a bimodal traffic corridor
Efficient management of traffic corridors is critical for sustaining urban mobility. Exclusive bus lane (EBL) and high occupancy vehicle lane (HOVL) are two prominent strategies for enhancing public transit services and alleviating congestion. EBLs prioritize bus transit by providing dedicated lanes for faster travel times, while HOVLs encourage carpooling by reserving lanes for high-occupancy vehicles. However, static implementations of these policies may underutilize road resources and disrupt general-purpose lanes. Dynamic implementation, based on real-time demand, can potentially maximize road efficiency and minimize negative impacts. This study first compares the mixed traffic policy (MTP), exclusive bus lane policy (EBLP), and high occupancy vehicle lane policy (HOVLP) by formulating the total system costs in the context of a bimodal traffic corridor involving private cars and public buses. Under each lane policy, the bus frequency is optimized together with the modal split equilibrium derived separately. Based on dynamic demand simulated using an Ornstein-Uhlenbeck (O-U) process, switching thresholds are then derived to identify optimal periods for implementing each policy. Results reveal significant reductions in total system costs with the proposed dynamic policy schedules. Compared to static implementations, the dynamic policy schedules achieve cost reductions of 11.9%, 6.70%, and 43.64% relative to MTP-only, EBLP-only, and HOVLP-only scenarios, respectively. Additionally, in two real case studies of existing EBL and HOVL operations in Seattle, the proposed dynamic policy reduces total costs by 32.5% and 28.6%, respectively. The findings provide valuable insights for policymakers and transit planners, offering a robust framework for dynamically scheduling and integrating EBL and HOVL policies to optimize urban corridor efficiency and reduce overall system costs.
comment: This version is identical to the manuscript submitted to International Journal of Sustainable Transportation. It corrects inconsistencies present in the previous arXiv version and is the version under journal review. No scientific content has been altered
Spectral Flow Learning Theory: Finite-Sample Guarantees for Vector-Field Identification
We study the identification of continuous-time vector fields from irregularly sampled trajectories. We introduce Spectral Flow Learning (SFL), which learns in a windowed flow space using a lag-linear label operator that aggregates lagged Koopman actions. We provide finite-sample high-probability (FS-HP) guarantees for the class of variable-step linear multistep methods (vLLM). The FS-HP rates are constructed using spectral regularization with qualification-controlled filters for flow predictors under standard source and filter assumptions. A multistep observability inequality links flow error to vector-field error and yields two-term bounds that combine a statistical rate with an explicit discretization bias from vLMM theory. This preliminary preprint states the results and sketches proofs, with full proofs and extensions deferred to a journal version.
Pinching-Antenna Systems (PASS): Power Radiation Model and Optimal Beamforming Design
Pinching-antenna systems (PASS) improve wireless links by configuring the locations of activated pinching antennas along dielectric waveguides, namely pinching beamforming. In this paper, a novel adjustable power radiation model is proposed for PASS, where power radiation ratios of pinching antennas can be flexibly controlled by tuning the spacing between pinching antennas and waveguides. A closed-form pinching antenna spacing arrangement strategy is derived to achieve the commonly assumed equal-power radiation. Based on this, a practical PASS framework relying on discrete activation is considered, where pinching antennas can only be activated among a set of predefined locations. A transmit power minimization problem is formulated, which jointly optimizes the transmit beamforming, pinching beamforming, and the numbers of activated pinching antennas, subject to each user's minimum rate requirement. (1) To solve the resulting highly coupled mixed-integer nonlinear programming (MINLP) problem, branch-and-bound (BnB)-based algorithms are proposed for both single-user and multi-user scenarios, which is guaranteed to converge to globally optimal solutions. (2) A low-complexity many-to-many matching algorithm is further developed. Combined with the Karush-Kuhn-Tucker (KKT) theory, locally optimal and pairwise-stable solutions are obtained within polynomial-time complexity. Simulation results demonstrate that: (i) PASS significantly outperforms conventional multi-antenna architectures, particularly when the number of users and the spatial range increase; and (ii) The proposed matching-based algorithm achieves near-optimal performance, resulting in only a slight performance loss while significantly reducing computational overheads. Code is available at https://github.com/xiaoxiaxusummer/PASS_Discrete
comment: [Update] Detailed proof and numerical verification of the adjustable power radiation model have been added. Code is available at https://github.com/xiaoxiaxusummer/PASS_Discrete
A Coordinated Routing Approach for Enhancing Bus Timeliness and Travel Efficiency in Mixed-Traffic Environment
In this paper, we propose a coordinated routing strategy aimed at improving bus schedule adherence and enhancing travel efficiency for connected and automated vehicles (CAVs) operating within a mixed-traffic urban network. Our approach capitalizes on the existence of dedicated lanes for buses and CAVs, leveraging real-time traffic data to dynamically reroute CAVs in anticipation of congestion. By continuously monitoring traffic conditions on dedicated lanes and tracking the real-time positions of buses, we enable the system to proactively adjust CAV routes when potential interference with bus operations is detected. This coordination mitigates delays affecting transit services and reduces travel time for CAVs. We evaluate the proposed strategy through simulation studies conducted in the SUMO. The results demonstrate significant improvements in both transit reliability and CAV operational performance across a range of traffic conditions.
Robotics
Fast Feature Field ($\text{F}^3$): A Predictive Representation of Events
This paper develops a mathematical argument and algorithms for building representations of data from event-based cameras, that we call Fast Feature Field ($\text{F}^3$). We learn this representation by predicting future events from past events and show that it preserves scene structure and motion information. $\text{F}^3$ exploits the sparsity of event data and is robust to noise and variations in event rates. It can be computed efficiently using ideas from multi-resolution hash encoding and deep sets - achieving 120 Hz at HD and 440 Hz at VGA resolutions. $\text{F}^3$ represents events within a contiguous spatiotemporal volume as a multi-channel image, enabling a range of downstream tasks. We obtain state-of-the-art performance on optical flow estimation, semantic segmentation, and monocular metric depth estimation, on data from three robotic platforms (a car, a quadruped robot and a flying platform), across different lighting conditions (daytime, nighttime), environments (indoors, outdoors, urban, as well as off-road) and dynamic vision sensors (resolutions and event rates). Our implementations can predict these tasks at 25-75 Hz at HD resolution.
comment: 39 pages, 9 figures
Safe Planning in Unknown Environments using Conformalized Semantic Maps
This paper addresses semantic planning problems in unknown environments under perceptual uncertainty. The environment contains multiple unknown semantically labeled regions or objects, and the robot must reach desired locations while maintaining class-dependent distances from them. We aim to compute robot paths that complete such semantic reach-avoid tasks with user-defined probability despite uncertain perception. Existing planning algorithms either ignore perceptual uncertainty - thus lacking correctness guarantees - or assume known sensor models and noise characteristics. In contrast, we present the first planner for semantic reach-avoid tasks that achieves user-specified mission completion rates without requiring any knowledge of sensor models or noise. This is enabled by quantifying uncertainty in semantic maps - constructed on-the-fly from perceptual measurements - using conformal prediction in a model- and distribution-free manner. We validate our approach and the theoretical mission completion rates through extensive experiments, showing that it consistently outperforms baselines in mission success rates.
comment: 8 pages, 5 figures, 2 algorithms, 1 table
Curriculum Imitation Learning of Distributed Multi-Robot Policies
Learning control policies for multi-robot systems (MRS) remains a major challenge due to long-term coordination and the difficulty of obtaining realistic training data. In this work, we address both limitations within an imitation learning framework. First, we shift the typical role of Curriculum Learning in MRS, from scalability with the number of robots, to focus on improving long-term coordination. We propose a curriculum strategy that gradually increases the length of expert trajectories during training, stabilizing learning and enhancing the accuracy of long-term behaviors. Second, we introduce a method to approximate the egocentric perception of each robot using only third-person global state demonstrations. Our approach transforms idealized trajectories into locally available observations by filtering neighbors, converting reference frames, and simulating onboard sensor variability. Both contributions are integrated into a physics-informed technique to produce scalable, distributed policies from observations. We conduct experiments across two tasks with varying team sizes and noise levels. Results show that our curriculum improves long-term accuracy, while our perceptual estimation method yields policies that are robust to realistic uncertainty. Together, these strategies enable the learning of robust, distributed controllers from global demonstrations, even in the absence of expert actions or onboard measurements.
comment: Accepted and presented at the Eight Iberian Robotics Conference, 2025
Crop Spirals: Re-thinking the field layout for future robotic agriculture
Conventional linear crop layouts, optimised for tractors, hinder robotic navigation with tight turns, long travel distances, and perceptual aliasing. We propose a robot-centric square spiral layout with a central tramline, enabling simpler motion and more efficient coverage. To exploit this geometry, we develop a navigation stack combining DH-ResNet18 waypoint regression, pixel-to-odometry mapping, A* planning, and model predictive control (MPC). In simulations, the spiral layout yields up to 28% shorter paths and about 25% faster execution for waypoint-based tasks across 500 waypoints than linear layouts, while full-field coverage performance is comparable to an optimised linear U-turn strategy. Multi-robot studies demonstrate efficient coordination on the spirals rule-constrained graph, with a greedy allocator achieving 33-37% lower batch completion times than a Hungarian assignment under our setup. These results highlight the potential of redesigning field geometry to better suit autonomous agriculture.
comment: Submitted to Computers and Electronics in Agriculture
AgriCruiser: An Open Source Agriculture Robot for Over-the-row Navigation
We present the AgriCruiser, an open-source over-the-row agricultural robot developed for low-cost deployment and rapid adaptation across diverse crops and row layouts. The chassis provides an adjustable track width of 1.42 m to 1.57 m, along with a ground clearance of 0.94 m. The AgriCruiser achieves compact pivot turns with radii of 0.71 m to 0.79 m, enabling efficient headland maneuvers. The platform is designed for the integration of the other subsystems, and in this study, a precision spraying system was implemented to assess its effectiveness in weed management. In twelve flax plots, a single robotic spray pass reduced total weed populations (pigweed and Venice mallow) by 24- to 42-fold compared to manual weeding in four flax plots, while also causing less crop damage. Mobility experiments conducted on concrete, asphalt, gravel, grass, and both wet and dry soil confirmed reliable traversal consistent with torque sizing. The complete chassis can be constructed from commodity T-slot extrusion with minimal machining, resulting in a bill of materials costing approximately $5,000 - $6,000, which enables replication and customization. The mentioned results demonstrate that low-cost, reconfigurable over-the-row robots can achieve effective weed management with reduced crop damage and labor requirements, while providing a versatile foundation for phenotyping, sensing, and other agriculture applications. Design files and implementation details are released to accelerate research and adoption of modular agricultural robotics.
comment: GitHub: https://github.com/structuresComp/agri-cruiser
Safety-Critical Input-Constrained Nonlinear Intercept Guidance in Multiple Engagement Zones
This paper presents an input-constrained nonlinear guidance law to address the problem of intercepting a stationary target in contested environments with multiple defending agents. Contrary to prior approaches that rely on explicit knowledge of defender strategies or utilize conservative safety conditions based on a defender's range, our work characterizes defender threats geometrically through engagement zones that delineate inevitable interception regions. Outside these engagement zones, the interceptor remains invulnerable. The proposed guidance law switches between a repulsive safety maneuver near these zones and a pursuit maneuver outside their influence. To deal with multiple engagement zones, we employ a smooth minimum function (log-sum-exponent approximation) that aggregates threats from all the zones while prioritizing the most critical threats. Input saturation is modeled and embedded in the non-holonomic vehicle dynamics so the controller respects actuator limits while maintaining stability. Numerical simulations with several defenders demonstrate the proposed method's ability to avoid engagement zones and achieve interception across diverse initial conditions.
AIRoA MoMa Dataset: A Large-Scale Hierarchical Dataset for Mobile Manipulation
As robots transition from controlled settings to unstructured human environments, building generalist agents that can reliably follow natural language instructions remains a central challenge. Progress in robust mobile manipulation requires large-scale multimodal datasets that capture contact-rich and long-horizon tasks, yet existing resources lack synchronized force-torque sensing, hierarchical annotations, and explicit failure cases. We address this gap with the AIRoA MoMa Dataset, a large-scale real-world multimodal dataset for mobile manipulation. It includes synchronized RGB images, joint states, six-axis wrist force-torque signals, and internal robot states, together with a novel two-layer annotation schema of sub-goals and primitive actions for hierarchical learning and error analysis. The initial dataset comprises 25,469 episodes (approx. 94 hours) collected with the Human Support Robot (HSR) and is fully standardized in the LeRobot v2.1 format. By uniquely integrating mobile manipulation, contact-rich interaction, and long-horizon structure, AIRoA MoMa provides a critical benchmark for advancing the next generation of Vision-Language-Action models. The first version of our dataset is now available at https://huggingface.co/datasets/airoa-org/airoa-moma .
Path Diffuser: Diffusion Model for Data-Driven Traffic Simulator
Simulating diverse and realistic traffic scenarios is critical for developing and testing autonomous planning. Traditional rule-based planners lack diversity and realism, while learning-based simulators often replay, forecast, or edit scenarios using historical agent trajectories. However, they struggle to generate new scenarios, limiting scalability and diversity due to their reliance on fully annotated logs and historical data. Thus, a key challenge for a learning-based simulator's performance is that it requires agents' past trajectories and pose information in addition to map data, which might not be available for all agents on the road.Without which, generated scenarios often produce unrealistic trajectories that deviate from drivable areas, particularly under out-of-distribution (OOD) map scenes (e.g., curved roads). To address this, we propose Path Diffuser (PD): a two-stage, diffusion model for generating agent pose initializations and their corresponding trajectories conditioned on the map, free of any historical context of agents' trajectories. Furthermore, PD incorporates a motion primitive-based prior, leveraging Frenet frame candidate trajectories to enhance diversity while ensuring road-compliant trajectory generation. We also explore various design choices for modeling complex multi-agent interactions. We demonstrate the effectiveness of our method through extensive experiments on the Argoverse2 Dataset and additionally evaluate the generalizability of the approach on OOD map variants. Notably, Path Diffuser outperforms the baseline methods by 1.92x on distribution metrics, 1.14x on common-sense metrics, and 1.62x on road compliance from adversarial benchmarks.
Annotation-Free One-Shot Imitation Learning for Multi-Step Manipulation Tasks
Recent advances in one-shot imitation learning have enabled robots to acquire new manipulation skills from a single human demonstration. While existing methods achieve strong performance on single-step tasks, they remain limited in their ability to handle long-horizon, multi-step tasks without additional model training or manual annotation. We propose a method that can be applied to this setting provided a single demonstration without additional model training or manual annotation. We evaluated our method on multi-step and single-step manipulation tasks where our method achieves an average success rate of 82.5% and 90%, respectively. Our method matches and exceeds the performance of the baselines in both these cases. We also compare the performance and computational efficiency of alternative pre-trained feature extractors within our framework.
MSG: Multi-Stream Generative Policies for Sample-Efficient Robotic Manipulation
Generative robot policies such as Flow Matching offer flexible, multi-modal policy learning but are sample-inefficient. Although object-centric policies improve sample efficiency, it does not resolve this limitation. In this work, we propose Multi-Stream Generative Policy (MSG), an inference-time composition framework that trains multiple object-centric policies and combines them at inference to improve generalization and sample efficiency. MSG is model-agnostic and inference-only, hence widely applicable to various generative policies and training paradigms. We perform extensive experiments both in simulation and on a real robot, demonstrating that our approach learns high-quality generative policies from as few as five demonstrations, resulting in a 95% reduction in demonstrations, and improves policy performance by 89 percent compared to single-stream approaches. Furthermore, we present comprehensive ablation studies on various composition strategies and provide practical recommendations for deployment. Finally, MSG enables zero-shot object instance transfer. We make our code publicly available at https://msg.cs.uni-freiburg.de.
World-Env: Leveraging World Model as a Virtual Environment for VLA Post-Training
Vision-Language-Action (VLA) models trained via imitation learning suffer from significant performance degradation in data-scarce scenarios due to their reliance on large-scale demonstration datasets. Although reinforcement learning (RL)-based post-training has proven effective in addressing data scarcity, its application to VLA models is hindered by the non-resettable nature of real-world environments. This limitation is particularly critical in high-risk domains such as industrial automation, where interactions often induce state changes that are costly or infeasible to revert. Furthermore, existing VLA approaches lack a reliable mechanism for detecting task completion, leading to redundant actions that reduce overall task success rates. To address these challenges, we propose World-Env, an RL-based post-training framework that replaces physical interaction with a low-cost, world model-based virtual simulator. World-Env consists of two key components: (1) a video-based world simulator that generates temporally consistent future visual observations, and (2) a vision-language model (VLM)-guided instant reflector that provides continuous reward signals and predicts action termination. This simulated environment enables VLA models to safely explore and generalize beyond their initial imitation learning distribution. Our method achieves notable performance gains with as few as five expert demonstrations per task. Experiments on complex robotic manipulation tasks demonstrate that World-Env effectively overcomes the data inefficiency, safety constraints, and inefficient execution of conventional VLA models that rely on real-world interaction, offering a practical and scalable solution for post-training in resource-constrained settings.
Trajectory Prediction via Bayesian Intention Inference under Unknown Goals and Kinematics
This work introduces an adaptive Bayesian algorithm for real-time trajectory prediction via intention inference, where a target's intentions and motion characteristics are unknown and subject to change. The method concurrently estimates two critical variables: the target's current intention, modeled as a Markovian latent state, and an intention parameter that describes the target's adherence to a shortest-path policy. By integrating this joint update technique, the algorithm maintains robustness against abrupt intention shifts and unknown motion dynamics. A sampling-based trajectory prediction mechanism then exploits these adaptive estimates to generate probabilistic forecasts with quantified uncertainty. We validate the framework through numerical experiments: Ablation studies of two cases, and a 500-trial Monte Carlo analysis; Hardware demonstrations on quadrotor and quadrupedal platforms. Experimental results demonstrate that the proposed approach significantly outperforms non-adaptive and partially adaptive methods. The method operates in real time around 270 Hz without requiring training or detailed prior knowledge of target behavior, showcasing its applicability in various robotic systems.
When Autonomous Vehicle Meets V2X Cooperative Perception: How Far Are We?
With the tremendous advancement of deep learning and communication technology, Vehicle-to-Everything (V2X) cooperative perception has the potential to address limitations in sensing distant objects and occlusion for a single-agent perception system. V2X cooperative perception systems are software systems characterized by diverse sensor types and cooperative agents, varying fusion schemes, and operation under different communication conditions. Therefore, their complex composition gives rise to numerous operational challenges. Furthermore, when cooperative perception systems produce erroneous predictions, the types of errors and their underlying causes remain insufficiently explored. To bridge this gap, we take an initial step by conducting an empirical study of V2X cooperative perception. To systematically evaluate the impact of cooperative perception on the ego vehicle's perception performance, we identify and analyze six prevalent error patterns in cooperative perception systems. We further conduct a systematic evaluation of the critical components of these systems through our large-scale study and identify the following key findings: (1) The LiDAR-based cooperation configuration exhibits the highest perception performance; (2) Vehicle-to-infrastructure (V2I) and vehicle-to-vehicle (V2V) communication exhibit distinct cooperative perception performance under different fusion schemes; (3) Increased cooperative perception errors may result in a higher frequency of driving violations; (4) Cooperative perception systems are not robust against communication interference when running online. Our results reveal potential risks and vulnerabilities in critical components of cooperative perception systems. We hope that our findings can better promote the design and repair of cooperative perception systems.
comment: The paper has been accepted by the 40th IEEE/ACM International Conference on Automated Software Engineering, ASE 2025
CineWild: Balancing Art and Robotics for Ethical Wildlife Documentary Filmmaking
Drones, or unmanned aerial vehicles (UAVs), have become powerful tools across domains-from industry to the arts. In documentary filmmaking, they offer dynamic, otherwise unreachable perspectives, transforming how stories are told. Wildlife documentaries especially benefit, yet drones also raise ethical concerns: the risk of disturbing the animals they aim to capture. This paper introduces CineWild, an autonomous UAV framework that combines robotics, cinematography, and ethics. Built on model predictive control, CineWild dynamically adjusts flight paths and camera settings to balance cinematic quality with animal welfare. Key features include adaptive zoom for filming from acoustic and visual safe distances, path-planning that avoids an animal's field of view, and smooth, low-noise maneuvers. CineWild exemplifies interdisciplinary innovation-bridging engineering, visual storytelling, and environmental ethics. We validate the system through simulation studies and will release the code upon acceptance.
From Code to Action: Hierarchical Learning of Diffusion-VLM Policies
Imitation learning for robotic manipulation often suffers from limited generalization and data scarcity, especially in complex, long-horizon tasks. In this work, we introduce a hierarchical framework that leverages code-generating vision-language models (VLMs) in combination with low-level diffusion policies to effectively imitate and generalize robotic behavior. Our key insight is to treat open-source robotic APIs not only as execution interfaces but also as sources of structured supervision: the associated subtask functions - when exposed - can serve as modular, semantically meaningful labels. We train a VLM to decompose task descriptions into executable subroutines, which are then grounded through a diffusion policy trained to imitate the corresponding robot behavior. To handle the non-Markovian nature of both code execution and certain real-world tasks, such as object swapping, our architecture incorporates a memory mechanism that maintains subtask context across time. We find that this design enables interpretable policy decomposition, improves generalization when compared to flat policies and enables separate evaluation of high-level planning and low-level control.
comment: 19 pages including references, 6 figures. Accepted to CoRL LEAP 2025
Real-time Recognition of Human Interactions from a Single RGB-D Camera for Socially-Aware Robot Navigation
{Recognizing human interactions is essential for social robots as it enables them to navigate safely and naturally in shared environments. Conventional robotic systems however often focus on obstacle avoidance, neglecting social cues necessary for seamless human-robot interaction. To address this gap, we propose a framework to recognize human group interactions for socially aware navigation. Our method utilizes color and depth frames from a monocular RGB-D camera to estimate 3D human keypoints and positions. Principal component analysis (PCA) is then used to determine dominant interaction directions. The shoelace formula is finally applied to compute interest points and engagement areas. Extensive experiments have been conducted to evaluate the validity of the proposed method. The results show that our method is capable of recognizing group interactions across different scenarios with varying numbers of individuals. It also achieves high-speed performance, processing each frame in approximately 4 ms on a single-board computer used in robotic systems. The method is implemented as a ROS 2 package making it simple to integrate into existing navigation systems. Source code is available at https://github.com/thanhlong103/social-interaction-detector
DRCP: Diffusion on Reinforced Cooperative Perception for Perceiving Beyond Limits
Cooperative perception enabled by Vehicle-to-Everything communication has shown great promise in enhancing situational awareness for autonomous vehicles and other mobile robotic platforms. Despite recent advances in perception backbones and multi-agent fusion, real-world deployments remain challenged by hard detection cases, exemplified by partial detections and noise accumulation which limit downstream detection accuracy. This work presents Diffusion on Reinforced Cooperative Perception (DRCP), a real-time deployable framework designed to address aforementioned issues in dynamic driving environments. DRCP integrates two key components: (1) Precise-Pyramid-Cross-Modality-Cross-Agent, a cross-modal cooperative perception module that leverages camera-intrinsic-aware angular partitioning for attention-based fusion and adaptive convolution to better exploit external features; and (2) Mask-Diffusion-Mask-Aggregation, a novel lightweight diffusion-based refinement module that encourages robustness against feature perturbations and aligns bird's-eye-view features closer to the task-optimal manifold. The proposed system achieves real-time performance on mobile platforms while significantly improving robustness under challenging conditions. Code will be released in late 2025.
JuggleRL: Mastering Ball Juggling with a Quadrotor via Deep Reinforcement Learning
Aerial robots interacting with objects must perform precise, contact-rich maneuvers under uncertainty. In this paper, we study the problem of aerial ball juggling using a quadrotor equipped with a racket, a task that demands accurate timing, stable control, and continuous adaptation. We propose JuggleRL, the first reinforcement learning-based system for aerial juggling. It learns closed-loop policies in large-scale simulation using systematic calibration of quadrotor and ball dynamics to reduce the sim-to-real gap. The training incorporates reward shaping to encourage racket-centered hits and sustained juggling, as well as domain randomization over ball position and coefficient of restitution to enhance robustness and transferability. The learned policy outputs mid-level commands executed by a low-level controller and is deployed zero-shot on real hardware, where an enhanced perception module with a lightweight communication protocol reduces delays in high-frequency state estimation and ensures real-time control. Experiments show that JuggleRL achieves an average of $311$ hits over $10$ consecutive trials in the real world, with a maximum of $462$ hits observed, far exceeding a model-based baseline that reaches at most $14$ hits with an average of $3.1$. Moreover, the policy generalizes to unseen conditions, successfully juggling a lighter $5$ g ball with an average of $145.9$ hits. This work demonstrates that reinforcement learning can empower aerial robots with robust and stable control in dynamic interaction tasks.
ThermalGen: Style-Disentangled Flow-Based Generative Models for RGB-to-Thermal Image Translation NeurIPS 2025
Paired RGB-thermal data is crucial for visual-thermal sensor fusion and cross-modality tasks, including important applications such as multi-modal image alignment and retrieval. However, the scarcity of synchronized and calibrated RGB-thermal image pairs presents a major obstacle to progress in these areas. To overcome this challenge, RGB-to-Thermal (RGB-T) image translation has emerged as a promising solution, enabling the synthesis of thermal images from abundant RGB datasets for training purposes. In this study, we propose ThermalGen, an adaptive flow-based generative model for RGB-T image translation, incorporating an RGB image conditioning architecture and a style-disentangled mechanism. To support large-scale training, we curated eight public satellite-aerial, aerial, and ground RGB-T paired datasets, and introduced three new large-scale satellite-aerial RGB-T datasets--DJI-day, Bosonplus-day, and Bosonplus-night--captured across diverse times, sensor types, and geographic regions. Extensive evaluations across multiple RGB-T benchmarks demonstrate that ThermalGen achieves comparable or superior translation performance compared to existing GAN-based and diffusion-based methods. To our knowledge, ThermalGen is the first RGB-T image translation model capable of synthesizing thermal images that reflect significant variations in viewpoints, sensor characteristics, and environmental conditions. Project page: http://xjh19971.github.io/ThermalGen
comment: 23 pages including the checklist and appendix. Accepted at NeurIPS 2025
Finding an Initial Probe Pose in Teleoperated Robotic Echocardiography via 2D LiDAR-Based 3D Reconstruction
Echocardiography is a key imaging modality for cardiac assessment but remains highly operator-dependent, and access to trained sonographers is limited in underserved settings. Teleoperated robotic echocardiography has been proposed as a solution; however, clinical studies report longer examination times than manual procedures, increasing diagnostic delays and operator workload. Automating non-expert tasks, such as automatically moving the probe to an ideal starting pose, offers a pathway to reduce this burden. Prior vision- and depth-based approaches to estimate an initial probe pose are sensitive to lighting, texture, and anatomical variability. We propose a robot-mounted 2D LiDAR-based approach that reconstructs the chest surface in 3D and estimates the initial probe pose automatically. To the best of our knowledge, this is the first demonstration of robot-mounted 2D LiDAR used for 3D reconstruction of a human body surface. Through plane-based extrinsic calibration, the transformation between the LiDAR and robot base frames was estimated with an overall root mean square (RMS) residual of 1.8 mm and rotational uncertainty below 0.2{\deg}. The chest front surface, reconstructed from two linear LiDAR sweeps, was aligned with non-rigid templates to identify an initial probe pose. A mannequin-based study assessing reconstruction accuracy showed mean surface errors of 2.78 +/- 0.21 mm. Human trials (N=5) evaluating the proposed approach found probe initial points typically 20-30 mm from the clinically defined initial point, while the variation across repeated trials on the same subject was less than 4 mm.
comment: This work has been submitted to the IEEE for possible publication
Towards Modular and Accessible AUV Systems
This paper reports the development of a new open-access modular framework, called Marine Vehicle Packages (MVP), for Autonomous Underwater Vehicles. The framework consists of both software and hardware designs allowing easy construction of AUV for research with increased customizability and sufficient payload capacity. This paper will present the scalable hardware system design and the modular software design architecture. New features, such as articulated thruster integration and high-level Graphic User Interface will be discussed. Both simulation and field experiments results are shown to highlight the performance and compatibility of the MVP.
comment: 6 pages, accepted by 2024 IEEE/OES Autonomous Underwater Vehicles Symposium (AUV)
Fidelity-Aware Data Composition for Robust Robot Generalization
Generalist robot policies trained on large-scale, visually homogeneous datasets can be susceptible to shortcut learning, which impairs their out-of-distribution (OOD) generalization. While generative data augmentation is a common approach to introduce diversity, it presents a subtle challenge: data composition. Naively mixing real and synthetic data can corrupt the learning signal, as this process often prioritizes visual diversity at the expense of information fidelity. This paper suggests that robust generalization depends on principled, fidelity-aware data composition. We introduce Coherent Information Fidelity Tuning (CIFT), a framework that treats data composition as an optimization problem. CIFT uses a practical proxy for Information Fidelity based on the feature-space geometry of a dataset. This enables the identification of a phase transition, termed the Decoherence Point, where training stability degrades. The framework includes a generative engine, Multi-View Video Augmentation (MVAug), to synthesize a causally disentangled data spectrum for this tuning process. Applying CIFT to policy architectures such as $\pi_0$ and Diffusion Policy improves OOD success rates by over 54\%. These results indicate that fidelity-aware composition, beyond data synthesis alone, is an important component for developing robust, general-purpose robots.
comment: 33 pages
IA-VLA: Input Augmentation for Vision-Language-Action models in settings with semantically complex tasks ICRA 2026
Vision-language-action models (VLAs) have become an increasingly popular approach for addressing robot manipulation problems in recent years. However, such models need to output actions at a rate suitable for robot control, which limits the size of the language model they can be based on, and consequently, their language understanding capabilities. Manipulation tasks may require complex language instructions, such as identifying target objects by their relative positions, to specify human intention. Therefore, we introduce IA-VLA, a framework that utilizes the extensive language understanding of a large vision language model as a pre-processing stage to generate improved context to augment the input of a VLA. We evaluate the framework on a set of semantically complex tasks which have been underexplored in VLA literature, namely tasks involving visual duplicates, i.e., visually indistinguishable objects. A dataset of three types of scenes with duplicate objects is used to compare a baseline VLA against two augmented variants. The experiments show that the VLA benefits from the augmentation scheme, especially when faced with language instructions that require the VLA to extrapolate from concepts it has seen in the demonstrations. For the code, dataset, and videos, see https://sites.google.com/view/ia-vla.
comment: Under review for ICRA 2026
SSR-ZSON: Zero-Shot Object Navigation via Spatial-Semantic Relations within a Hierarchical Exploration Framework
Zero-shot object navigation in unknown environments presents significant challenges, mainly due to two key limitations: insufficient semantic guidance leads to inefficient exploration, while limited spatial memory resulting from environmental structure causes entrapment in local regions. To address these issues, we propose SSR-ZSON, a spatial-semantic relative zero-shot object navigation method based on the TARE hierarchical exploration framework, integrating a viewpoint generation strategy balancing spatial coverage and semantic density with an LLM-based global guidance mechanism. The performance improvement of the proposed method is due to two key innovations. First, the viewpoint generation strategy prioritizes areas of high semantic density within traversable sub-regions to maximize spatial coverage and minimize invalid exploration. Second, coupled with an LLM-based global guidance mechanism, it assesses semantic associations to direct navigation toward high-value spaces, preventing local entrapment and ensuring efficient exploration. Deployed on hybrid Habitat-Gazebo simulations and physical platforms, SSR-ZSON achieves real-time operation and superior performance. On Matterport3D and Habitat-Matterport3D datasets, it improves the Success Rate(SR) by 18.5\% and 11.2\%, and the Success weighted by Path Length(SPL) by 0.181 and 0.140, respectively, over state-of-the-art methods.
APREBot: Active Perception System for Reflexive Evasion Robot
Reliable onboard perception is critical for quadruped robots navigating dynamic environments, where obstacles can emerge from any direction under strict reaction-time constraints. Single-sensor systems face inherent limitations: LiDAR provides omnidirectional coverage but lacks rich texture information, while cameras capture high-resolution detail but suffer from restricted field of view. We introduce APREBot (Active Perception System for Reflexive Evasion Robot), a novel framework that integrates reflexive evasion with active hierarchical perception. APREBot strategically combines LiDAR-based omnidirectional scanning with camera-based active focusing, achieving comprehensive environmental awareness essential for agile obstacle avoidance in quadruped robots. We validate APREBot through extensive sim-to-real experiments on a quadruped platform, evaluating diverse obstacle types, trajectories, and approach directions. Our results demonstrate substantial improvements over state-of-the-art baselines in both safety metrics and operational efficiency, highlighting APREBot's potential for dependable autonomy in safety-critical scenarios. Videos are available at https://sites.google.com/view/aprebot/
Evaluation of Polarimetric Fusion for Semantic Segmentation in Aquatic Environments
Accurate segmentation of floating debris on water is often compromised by surface glare and changing outdoor illumination. Polarimetric imaging offers a single-sensor route to mitigate water-surface glare that disrupts semantic segmentation of floating objects. We benchmark state-of-the-art fusion networks on PoTATO, a public dataset of polarimetric images of plastic bottles in inland waterways, and compare their performance with single-image baselines using traditional models. Our results indicate that polarimetric cues help recover low-contrast objects and suppress reflection-induced false positives, raising mean IoU and lowering contour error relative to RGB inputs. These sharper masks come at a cost: the additional channels enlarge the models increasing the computational load and introducing the risk of new false positives. By providing a reproducible, diagnostic benchmark and publicly available code, we hope to help researchers choose if polarized cameras are suitable for their applications and to accelerate related research.
comment: Accepted to VCIP 2025
Discrete Variational Autoencoding via Policy Search
Discrete latent bottlenecks in variational autoencoders (VAEs) offer high bit efficiency and can be modeled with autoregressive discrete distributions, enabling parameter-efficient multimodal search with transformers. However, discrete random variables do not allow for exact differentiable parameterization; therefore, discrete VAEs typically rely on approximations, such as Gumbel-Softmax reparameterization or straight-through gradient estimates, or employ high-variance gradient-free methods such as REINFORCE that have had limited success on high-dimensional tasks such as image reconstruction. Inspired by popular techniques in policy search, we propose a training framework for discrete VAEs that leverages the natural gradient of a non-parametric encoder to update the parametric encoder without requiring reparameterization. Our method, combined with automatic step size adaptation and a transformer-based encoder, scales to challenging datasets such as ImageNet and outperforms both approximate reparameterization methods and quantization-based discrete autoencoders in reconstructing high-dimensional data from compact latent spaces, achieving a 20% improvement on FID Score for ImageNet 256.
LLM-Handover:Exploiting LLMs for Task-Oriented Robot-Human Handovers
Effective human-robot collaboration depends on task-oriented handovers, where robots present objects in ways that support the partners intended use. However, many existing approaches neglect the humans post-handover action, relying on assumptions that limit generalizability. To address this gap, we propose LLM-Handover, a novel framework that integrates large language model (LLM)-based reasoning with part segmentation to enable context-aware grasp selection and execution. Given an RGB-D image and a task description, our system infers relevant object parts and selects grasps that optimize post-handover usability. To support evaluation, we introduce a new dataset of 60 household objects spanning 12 categories, each annotated with detailed part labels. We first demonstrate that our approach improves the performance of the used state-of-the-art part segmentation method, in the context of robot-human handovers. Next, we show that LLM-Handover achieves higher grasp success rates and adapts better to post-handover task constraints. During hardware experiments, we achieve a success rate of 83% in a zero-shot setting over conventional and unconventional post-handover tasks. Finally, our user study underlines that our method enables more intuitive, context-aware handovers, with participants preferring it in 86% of cases.
comment: Accepted to IEEE Robotics and Automation Letters (RA-L)
Stabilizing Humanoid Robot Trajectory Generation via Physics-Informed Learning and Control-Informed Steering IROS
Recent trends in humanoid robot control have successfully employed imitation learning to enable the learned generation of smooth, human-like trajectories from human data. While these approaches make more realistic motions possible, they are limited by the amount of available motion data, and do not incorporate prior knowledge about the physical laws governing the system and its interactions with the environment. Thus they may violate such laws, leading to divergent trajectories and sliding contacts which limit real-world stability. We address such limitations via a two-pronged learning strategy which leverages the known physics of the system and fundamental control principles. First, we encode physics priors during supervised imitation learning to promote trajectory feasibility. Second, we minimize drift at inference time by applying a proportional-integral controller directly to the generated output state. We validate our method on various locomotion behaviors for the ergoCub humanoid robot, where a physics-informed loss encourages zero contact foot velocity. Our experiments demonstrate that the proposed approach is compatible with multiple controllers on a real robot and significantly improves the accuracy and physical constraint conformity of generated trajectories.
comment: This paper has been accepted for publication at the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Hangzhou, China, 2025
CEDex: Cross-Embodiment Dexterous Grasp Generation at Scale from Human-like Contact Representations
Cross-embodiment dexterous grasp synthesis refers to adaptively generating and optimizing grasps for various robotic hands with different morphologies. This capability is crucial for achieving versatile robotic manipulation in diverse environments and requires substantial amounts of reliable and diverse grasp data for effective model training and robust generalization. However, existing approaches either rely on physics-based optimization that lacks human-like kinematic understanding or require extensive manual data collection processes that are limited to anthropomorphic structures. In this paper, we propose CEDex, a novel cross-embodiment dexterous grasp synthesis method at scale that bridges human grasping kinematics and robot kinematics by aligning robot kinematic models with generated human-like contact representations. Given an object's point cloud and an arbitrary robotic hand model, CEDex first generates human-like contact representations using a Conditional Variational Auto-encoder pretrained on human contact data. It then performs kinematic human contact alignment through topological merging to consolidate multiple human hand parts into unified robot components, followed by a signed distance field-based grasp optimization with physics-aware constraints. Using CEDex, we construct the largest cross-embodiment grasp dataset to date, comprising 500K objects across four gripper types with 20M total grasps. Extensive experiments show that CEDex outperforms state-of-the-art approaches and our dataset benefits cross-embodiment grasp learning with high-quality diverse grasps.
PoseDiff: A Unified Diffusion Model Bridging Robot Pose Estimation and Video-to-Action Control
We present PoseDiff, a conditional diffusion model that unifies robot state estimation and control within a single framework. At its core, PoseDiff maps raw visual observations into structured robot states-such as 3D keypoints or joint angles-from a single RGB image, eliminating the need for multi-stage pipelines or auxiliary modalities. Building upon this foundation, PoseDiff extends naturally to video-to-action inverse dynamics: by conditioning on sparse video keyframes generated by world models, it produces smooth and continuous long-horizon action sequences through an overlap-averaging strategy. This unified design enables scalable and efficient integration of perception and control. On the DREAM dataset, PoseDiff achieves state-of-the-art accuracy and real-time performance for pose estimation. On Libero-Object manipulation tasks, it substantially improves success rates over existing inverse dynamics modules, even under strict offline settings. Together, these results show that PoseDiff provides a scalable, accurate, and efficient bridge between perception, planning, and control in embodied AI. The video visualization results can be found on the project page: https://haozhuo-zhang.github.io/PoseDiff-project-page/.
U-DiT Policy: U-shaped Diffusion Transformers for Robotic Manipulation
Diffusion-based methods have been acknowledged as a powerful paradigm for end-to-end visuomotor control in robotics. Most existing approaches adopt a Diffusion Policy in U-Net architecture (DP-U), which, while effective, suffers from limited global context modeling and over-smoothing artifacts. To address these issues, we propose U-DiT Policy, a novel U-shaped Diffusion Transformer framework. U-DiT preserves the multi-scale feature fusion advantages of U-Net while integrating the global context modeling capability of Transformers, thereby enhancing representational power and policy expressiveness. We evaluate U-DiT extensively across both simulation and real-world robotic manipulation tasks. In simulation, U-DiT achieves an average performance gain of 10\% over baseline methods and surpasses Transformer-based diffusion policies (DP-T) that use AdaLN blocks by 6\% under comparable parameter budgets. On real-world robotic tasks, U-DiT demonstrates superior generalization and robustness, achieving an average improvement of 22.5\% over DP-U. In addition, robustness and generalization experiments under distractor and lighting variations further highlight the advantages of U-DiT. These results highlight the effectiveness and practical potential of U-DiT Policy as a new foundation for diffusion-based robotic manipulation.
Prompting Robot Teams with Natural Language
This paper presents a framework towards prompting multi-robot teams with high-level tasks using natural language expressions. Our objective is to use the reasoning capabilities demonstrated by recent language models in understanding and decomposing human expressions of intent, and repurpose these for multi-robot collaboration and decision-making. The key challenge is that an individual's behavior in a collective can be hard to specify and interpret, and must continuously adapt to actions from others. This necessitates a framework that possesses the representational capacity required by the logic and semantics of a task, and yet supports decentralized and interactive real-time operation. We solve this dilemma by recognizing that a task can be represented as a deterministic finite automaton (DFA), and that recurrent neural networks (RNNs) can encode numerous automata. This allows us to distill the logic and sequential decompositions of sub-tasks obtained from a language model into an RNN, and align its internal states with the semantics of a given task. By training a graph neural network (GNN) control policy that is conditioned on the hidden states of the RNN and the language embeddings, our method enables robots to execute task-relevant actions in a decentralized manner. We present evaluations of this single light-weight interpretable model on various simulated and real-world multi-robot tasks that require sequential and collaborative behavior by the team -- sites.google.com/view/prompting-teams.
SCOPE: Semantic Conditioning for Sim2Real Category-Level Object Pose Estimation in Robotics
Object manipulation requires accurate object pose estimation. In open environments, robots encounter unknown objects, which requires semantic understanding in order to generalize both to known categories and beyond. To resolve this challenge, we present SCOPE, a diffusion-based category-level object pose estimation model that eliminates the need for discrete category labels by leveraging DINOv2 features as continuous semantic priors. By combining these DINOv2 features with photorealistic training data and a noise model for point normals, we reduce the Sim2Real gap in category-level object pose estimation. Furthermore, injecting the continuous semantic priors via cross-attention enables SCOPE to learn canonicalized object coordinate systems across object instances beyond the distribution of known categories. SCOPE outperforms the current state of the art in synthetically trained category-level object pose estimation, achieving a relative improvement of 31.9\% on the 5$^\circ$5cm metric. Additional experiments on two instance-level datasets demonstrate generalization beyond known object categories, enabling grasping of unseen objects from unknown categories with a success rate of up to 100\%. Code available: https://github.com/hoenigpeter/scope.
Unlocking the Potential of Soft Actor-Critic for Imitation Learning
Learning-based methods have enabled robots to acquire bio-inspired movements with increasing levels of naturalness and adaptability. Among these, Imitation Learning (IL) has proven effective in transferring complex motion patterns from animals to robotic systems. However, current state-of-the-art frameworks predominantly rely on Proximal Policy Optimization (PPO), an on-policy algorithm that prioritizes stability over sample efficiency and policy generalization. This paper proposes a novel IL framework that combines Adversarial Motion Priors (AMP) with the off-policy Soft Actor-Critic (SAC) algorithm to overcome these limitations. This integration leverages replay-driven learning and entropy-regularized exploration, enabling naturalistic behavior and task execution, improving data efficiency and robustness. We evaluate the proposed approach (AMP+SAC) on quadruped gaits involving multiple reference motions and diverse terrains. Experimental results demonstrate that the proposed framework not only maintains stable task execution but also achieves higher imitation rewards compared to the widely used AMP+PPO method. These findings highlight the potential of an off-policy IL formulation for advancing motion generation in robotics.
Game Theory to Study Cooperation in Human-Robot Mixed Groups: Exploring the Potential of the Public Good Game
In this study, we explore the potential of Game Theory as a means to investigate cooperation and trust in human-robot mixed groups. Particularly, we introduce the Public Good Game (PGG), a model highlighting the tension between individual self-interest and collective well-being. In this work, we present a modified version of the PGG, where three human participants engage in the game with the humanoid robot iCub to assess whether various robot game strategies (e.g., always cooperate, always free ride, and tit-for-tat) can influence the participants' inclination to cooperate. We test our setup during a pilot study with nineteen participants. A preliminary analysis indicates that participants prefer not to invest their money in the common pool, despite they perceive the robot as generous. By conducting this research, we seek to gain valuable insights into the role that robots can play in promoting trust and cohesion during human-robot interactions within group contexts. The results of this study may hold considerable potential for developing social robots capable of fostering trust and cooperation within mixed human-robot groups.
comment: Work presented at the workshop BAILAR in conjunction with IEEE RO-MAN 2023. Peer reviewed
Training Agents Inside of Scalable World Models
World models learn general knowledge from videos and simulate experience for training behaviors in imagination, offering a path towards intelligent agents. However, previous world models have been unable to accurately predict object interactions in complex environments. We introduce Dreamer 4, a scalable agent that learns to solve control tasks by reinforcement learning inside of a fast and accurate world model. In the complex video game Minecraft, the world model accurately predicts object interactions and game mechanics, outperforming previous world models by a large margin. The world model achieves real-time interactive inference on a single GPU through a shortcut forcing objective and an efficient transformer architecture. Moreover, the world model learns general action conditioning from only a small amount of data, allowing it to extract the majority of its knowledge from diverse unlabeled videos. We propose the challenge of obtaining diamonds in Minecraft from only offline data, aligning with practical applications such as robotics where learning from environment interaction can be unsafe and slow. This task requires choosing sequences of over 20,000 mouse and keyboard actions from raw pixels. By learning behaviors in imagination, Dreamer 4 is the first agent to obtain diamonds in Minecraft purely from offline data, without environment interaction. Our work provides a scalable recipe for imagination training, marking a step towards intelligent agents.
comment: Website: https://danijar.com/dreamer4/
PhysiAgent: An Embodied Agent Framework in Physical World
Vision-Language-Action (VLA) models have achieved notable success but often struggle with limited generalizations. To address this, integrating generalized Vision-Language Models (VLMs) as assistants to VLAs has emerged as a popular solution. However, current approaches often combine these models in rigid, sequential structures: using VLMs primarily for high-level scene understanding and task planning, and VLAs merely as executors of lower-level actions, leading to ineffective collaboration and poor grounding challenges. In this paper, we propose an embodied agent framework, PhysiAgent, tailored to operate effectively in physical environments. By incorporating monitor, memory, self-reflection mechanisms, and lightweight off-the-shelf toolboxes, PhysiAgent offers an autonomous scaffolding framework to prompt VLMs to organize different components based on real-time proficiency feedback from VLAs to maximally exploit VLAs' capabilities. Experimental results demonstrate significant improvements in task-solving performance on complex real-world robotic tasks, showcasing effective self-regulation of VLMs, coherent tool collaboration, and adaptive evolution of the framework during execution. PhysiAgent makes practical and pioneering efforts to integrate VLMs and VLAs, effectively grounding embodied agent frameworks in real-world settings.
DynaMIC: Dynamic Multimodal In-Context Learning Enabled Embodied Robot Counterfactual Resistance Ability
The emergence of large pre-trained models based on natural language has breathed new life into robotics development. Extensive research has integrated large models with robots, utilizing the powerful semantic understanding and generation capabilities of large models to facilitate robot control through natural language instructions gradually. However, we found that robots that strictly adhere to human instructions, especially those containing misleading information, may encounter errors during task execution, potentially leading to safety hazards. This resembles the concept of counterfactuals in natural language processing (NLP), which has not yet attracted much attention in robotic research. In an effort to highlight this issue for future studies, this paper introduced directive counterfactuals (DCFs) arising from misleading human directives. We present DynaMIC, a framework for generating robot task flows to identify DCFs and relay feedback to humans proactively. This capability can help robots be sensitive to potential DCFs within a task, thus enhancing the reliability of the execution process. We conducted semantic-level experiments and ablation studies, showcasing the effectiveness of this framework.
AdaNav: Adaptive Reasoning with Uncertainty for Vision-Language Navigation
Vision Language Navigation (VLN) requires agents to follow natural language instructions by grounding them in sequential visual observations over long horizons. Explicit reasoning could enhance temporal consistency and perception action alignment, but reasoning at fixed steps often leads to suboptimal performance and unnecessary computation. To address this, we propose AdaNav, an uncertainty-based adaptive reasoning framework for VLN. At its core is the Uncertainty Adaptive Reasoning Block (UAR), a lightweight plugin that dynamically triggers reasoning. We introduce Action Entropy as a policy prior for UAR and progressively refine it through a Heuristics to RL training method, enabling agents to learn difficulty aware reasoning policies under the strict data limitations of embodied tasks. Results show that with only 6K training samples, AdaNav achieves substantial gains over closed source models trained on million scale data, improving success rate by 20% on R2R val-unseen, 11.7% on RxR-CE, and 11.4% in real world scenes. The code is available at https://github.com/xinding-sys/AdaNav.
SONAR: Semantic-Object Navigation with Aggregated Reasoning through a Cross-Modal Inference Paradigm
Understanding human instructions and accomplishing Vision-Language Navigation tasks in unknown environments is essential for robots. However, existing modular approaches heavily rely on the quality of training data and often exhibit poor generalization. Vision-Language Model based methods, while demonstrating strong generalization capabilities, tend to perform unsatisfactorily when semantic cues are weak. To address these issues, this paper proposes SONAR, an aggregated reasoning approach through a cross modal paradigm. The proposed method integrates a semantic map based target prediction module with a Vision-Language Model based value map module, enabling more robust navigation in unknown environments with varying levels of semantic cues, and effectively balancing generalization ability with scene adaptability. In terms of target localization, we propose a strategy that integrates multi-scale semantic maps with confidence maps, aiming to mitigate false detections of target objects. We conducted an evaluation of the SONAR within the Gazebo simulator, leveraging the most challenging Matterport 3D (MP3D) dataset as the experimental benchmark. Experimental results demonstrate that SONAR achieves a success rate of 38.4% and an SPL of 17.7%.
Learning to Sample: Reinforcement Learning-Guided Sampling for Autonomous Vehicle Motion Planning ICRA 2026
Sampling-based motion planning is a well-established approach in autonomous driving, valued for its modularity and analytical tractability. In complex urban scenarios, however, uniform or heuristic sampling often produces many infeasible or irrelevant trajectories. We address this limitation with a hybrid framework that learns where to sample while keeping trajectory generation and evaluation fully analytical and verifiable. A reinforcement learning (RL) agent guides the sampling process toward regions of the action space likely to yield feasible trajectories, while evaluation and final selection remains governed by deterministic feasibility checks and cost functions. We couple the RL sampler with a world model (WM) based on a decodable deep set encoder, enabling both variable numbers of traffic participants and reconstructable latent representations. The approach is evaluated in the CommonRoad simulation environment, showing up to 99% fewer required samples and a runtime reduction of up to 84% while maintaining planning quality in terms of success and collision-free rates. These improvements lead to faster, more reliable decision-making for autonomous vehicles in urban environments, achieving safer and more responsive navigation under real-world constraints. Code and trained artifacts are publicly available at: https://github.com/TUM-AVS/Learning-to-Sample
comment: 8 pages, submitted to the IEEE ICRA 2026, Vienna, Austria
Contextual Neural Moving Horizon Estimation for Robust Quadrotor Control in Varying Conditions
Adaptive controllers on quadrotors typically rely on estimation of disturbances to ensure robust trajectory tracking. Estimating disturbances across diverse environmental contexts is challenging due to the inherent variability and uncertainty in the real world. Such estimators require extensive fine-tuning for a specific scenario, which makes them inflexible and brittle to changing conditions. Machine-learning approaches, such as training a neural network to tune the estimator's parameters, are promising. However, collecting data across all possible environmental contexts is impossible. It is also inefficient as the same estimator parameters could work for "nearby" contexts. In this paper, we present a sequential decision making strategy that decides which environmental contexts, using Bayesian Optimization with a Gaussian Process, to collect data from in order to ensure robust performance across a wide range of contexts. Our method, Contextual NeuroMHE, eliminates the need for exhaustive training across all environments while maintaining robust performance under different conditions. By enabling the neural network to adapt its parameters dynamically, our method improves both efficiency and generalization. Experimental results in various real-world settings demonstrate that our approach outperforms the prior work by 20.3\% in terms of maximum absolute position error and can capture the variations in the environment with a few carefully chosen contexts.
comment: 9 pages, 7 Figures, Kasra Torshizi and Chak Lam Shek contributed equally
SafeFlowMatcher: Safe and Fast Planning using Flow Matching with Control Barrier Functions
Generative planners based on flow matching (FM) can produce high-quality paths in one or a few ODE steps, but their sampling dynamics offer no formal safety guarantees and can yield incomplete paths near constraints. We present SafeFlowMatcher, a planning framework that couples FM with control barrier functions (CBFs) to achieve both real-time efficiency and certified safety. SafeFlowMatcher uses a two-phase prediction-correction (PC) integrator: (i) a prediction phase integrates the learned FM once (or a few steps) to obtain a candidate path without intervention; (ii) a correction phase refines this path with a vanishing time-scaled vector field and a CBF-based quadratic program that minimally perturbs the vector field. We prove a barrier certificate for the resulting flow system, establishing forward invariance of a robust safe set and finite-time convergence to the safe set. By enforcing safety only on the executed path (rather than on all intermediate latent paths), SafeFlowMatcher avoids distributional drift and mitigates local trap problems. Across maze navigation and locomotion benchmarks, SafeFlowMatcher attains faster, smoother, and safer paths than diffusion- and FM-based baselines. Extensive ablations corroborate the contributions of the PC integrator and the barrier certificate.
comment: 10 pages, 7 figures, 4 tables
FreeAction: Training-Free Techniques for Enhanced Fidelity of Trajectory-to-Video Generation
Generating realistic robot videos from explicit action trajectories is a critical step toward building effective world models and robotics foundation models. We introduce two training-free, inference-time techniques that fully exploit explicit action parameters in diffusion-based robot video generation. Instead of treating action vectors as passive conditioning signals, our methods actively incorporate them to guide both the classifier-free guidance process and the initialization of Gaussian latents. First, action-scaled classifier-free guidance dynamically modulates guidance strength in proportion to action magnitude, enhancing controllability over motion intensity. Second, action-scaled noise truncation adjusts the distribution of initially sampled noise to better align with the desired motion dynamics. Experiments on real robot manipulation datasets demonstrate that these techniques significantly improve action coherence and visual quality across diverse robot environments.
comment: 8 pages, 4 figures, accepted to CoRL 2025 LSRW workshop
PROFusion: Robust and Accurate Dense Reconstruction via Camera Pose Regression and Optimization
Real-time dense scene reconstruction during unstable camera motions is crucial for robotics, yet current RGB-D SLAM systems fail when cameras experience large viewpoint changes, fast motions, or sudden shaking. Classical optimization-based methods deliver high accuracy but fail with poor initialization during large motions, while learning-based approaches provide robustness but lack sufficient accuracy for dense reconstruction. We address this challenge through a combination of learning-based initialization with optimization-based refinement. Our method employs a camera pose regression network to predict metric-aware relative poses from consecutive RGB-D frames, which serve as reliable starting points for a randomized optimization algorithm that further aligns depth images with the scene geometry. Extensive experiments demonstrate promising results: our approach outperforms the best competitor on challenging benchmarks, while maintaining comparable accuracy on stable motion sequences. The system operates in real-time, showcasing that combining simple and principled techniques can achieve both robustness for unstable motions and accuracy for dense reconstruction. Project page: https://github.com/siyandong/PROFusion.
Towards Tighter Convex Relaxation of Mixed-integer Programs: Leveraging Logic Network Flow for Task and Motion Planning
This paper proposes an optimization-based task and motion planning framework, named "Logic Network Flow", that integrates temporal logic specifications into mixed-integer programs for efficient robot planning. Inspired by the Graph-of-Convex-Sets formulation, temporal predicates are encoded as polyhedron constraints on each edge of a network flow model, instead of as constraints between nodes in traditional Logic Tree formulations. We further propose a network-flow-based Fourier-Motzkin elimination procedure that removes continuous flow variables while preserving convex relaxation tightness, leading to provably tighter convex relaxations and fewer constraints than Logic Tree formulations. For temporal logic motion planning with piecewise-affine dynamic systems, comprehensive experiments across vehicle routing, multi-robot coordination, and temporal logic control on dynamical systems using point mass and linear inverted pendulum models demonstrate computational speedups of up to several orders of magnitude. Hardware demonstrations with quadrupedal robots validate real-time replanning capabilities under dynamically changing environmental conditions. The project website is at https://logicnetworkflow.github.io/.
comment: 35 pages, 17 figures, 7 tables
ELHPlan: Efficient Long-Horizon Task Planning for Multi-Agent Collaboration
Large Language Models (LLMs) enable intelligent multi-robot collaboration but face fundamental trade-offs: declarative methods lack adaptability in dynamic environments, while iterative methods incur prohibitive computational costs that scale poorly with team size and task complexity. In this paper, we propose ELHPlan, a novel framework that introduces Action Chains--sequences of actions explicitly bound to sub-goal intentions--as the fundamental planning primitive. ELHPlan operates via a cyclical process: 1) constructing intention-bound action sequences, 2) proactively validating for conflicts and feasibility, 3) refining issues through targeted mechanisms, and 4) executing validated actions. This design balances adaptability and efficiency by providing sufficient planning horizons while avoiding expensive full re-planning. We further propose comprehensive efficiency metrics, including token consumption and planning time, to more holistically evaluate multi-agent collaboration. Our experiments on benchmark TDW-MAT and C-WAH demonstrate that ELHPlan achieves comparable task success rates while consuming only 24% of the tokens required by state-of-the-art methods. Our research establishes a new efficiency-effectiveness frontier for LLM-based multi-agent planning systems.
ViReSkill: Vision-Grounded Replanning with Skill Memory for LLM-Based Planning in Lifelong Robot Learning
Robots trained via Reinforcement Learning (RL) or Imitation Learning (IL) often adapt slowly to new tasks, whereas recent Large Language Models (LLMs) and Vision-Language Models (VLMs) promise knowledge-rich planning from minimal data. Deploying LLMs/VLMs for motion planning, however, faces two key obstacles: (i) symbolic plans are rarely grounded in scene geometry and object physics, and (ii) model outputs can vary for identical prompts, undermining execution reliability. We propose ViReSkill, a framework that pairs vision-grounded replanning with a skill memory for accumulation and reuse. When a failure occurs, the replanner generates a new action sequence conditioned on the current scene, tailored to the observed state. On success, the executed plan is stored as a reusable skill and replayed in future encounters without additional calls to LLMs/VLMs. This feedback loop enables autonomous continual learning: each attempt immediately expands the skill set and stabilizes subsequent executions. We evaluate ViReSkill on simulators such as LIBERO and RLBench as well as on a physical robot. Across all settings, it consistently outperforms conventional baselines in task success rate, demonstrating robust sim-to-real generalization.
Very High Frequency Interpolation for Direct Torque Control
Torque control enables agile and robust robot motion, but deployment is often hindered by instability and hardware limits. Here, we present a novel solution to execute whole-body linear feedback at up to 40 kHz on open-source hardware. We use this to interpolate non-linear schemes during real-world execution, such as inverse dynamics and learned torque policies. Our results show that by stabilizing torque controllers, high-frequency linear feedback could be an effective route towards unlocking the potential of torque-controlled robotics.
Preference-Based Long-Horizon Robotic Stacking with Multimodal Large Language Models
Pretrained large language models (LLMs) can work as high-level robotic planners by reasoning over abstract task descriptions and natural language instructions, etc. However, they have shown a lack of knowledge and effectiveness in planning long-horizon robotic manipulation tasks where the physical properties of the objects are essential. An example is the stacking of containers with hidden objects inside, which involves reasoning over hidden physics properties such as weight and stability. To this end, this paper proposes to use multimodal LLMs as high-level planners for such long-horizon robotic stacking tasks. The LLM takes multimodal inputs for each object to stack and infers the current best stacking sequence by reasoning over stacking preferences. Furthermore, in order to enable the LLM to reason over multiple preferences at the same time without giving explicit instructions, we propose to create a custom dataset considering stacking preferences including weight, stability, size, and footprint, to fine-tune the LLM. Compared to the pretrained LLM with prompt tuning, we demonstrate the improved stacking completion of the LLM fine-tuned with our custom dataset via large-scale simulation evaluation. Furthermore, we showcase the effectiveness of the proposed framework for the long-horizon stacking task on a real humanoid robot in an online manner.
Memory Transfer Planning: LLM-driven Context-Aware Code Adaptation for Robot Manipulation
Large language models (LLMs) are increasingly explored in robot manipulation, but many existing methods struggle to adapt to new environments. Many systems require either environment-specific policy training or depend on fixed prompts and single-shot code generation, leading to limited transferability and manual re-tuning. We introduce Memory Transfer Planning (MTP), a framework that leverages successful control-code examples from different environments as procedural knowledge, using them as in-context guidance for LLM-driven planning. Specifically, MTP (i) generates an initial plan and code using LLMs, (ii) retrieves relevant successful examples from a code memory, and (iii) contextually adapts the retrieved code to the target setting for re-planning without updating model parameters. We evaluate MTP on RLBench, CALVIN, and a physical robot, demonstrating effectiveness beyond simulation. Across these settings, MTP consistently improved success rate and adaptability compared with fixed-prompt code generation, naive retrieval, and memory-free re-planning. Furthermore, in hardware experiments, leveraging a memory constructed in simulation proved effective. MTP provides a practical approach that exploits procedural knowledge to realize robust LLM-based planning across diverse robotic manipulation scenarios, enhancing adaptability to novel environments and bridging simulation and real-world deployment.
A Novel Model for 3D Motion Planning for a Generalized Dubins Vehicle with Pitch and Yaw Rate Constraints
In this paper, we propose a new modeling approach and a fast algorithm for 3D motion planning, applicable for fixed-wing unmanned aerial vehicles. The goal is to construct the shortest path connecting given initial and final configurations subject to motion constraints. Our work differs from existing literature in two ways. First, we consider full vehicle orientation using a body-attached frame, which includes roll, pitch, and yaw angles. However, existing work uses only pitch and/or heading angle, which is insufficient to uniquely determine orientation. Second, we use two control inputs to represent bounded pitch and yaw rates, reflecting control by two separate actuators. In contrast, most previous methods rely on a single input, such as path curvature, which is insufficient for accurately modeling the vehicle's kinematics in 3D. We use a rotation minimizing frame to describe the vehicle's configuration and its evolution, and construct paths by concatenating optimal Dubins paths on spherical, cylindrical, or planar surfaces. Numerical simulations show our approach generates feasible paths within 10 seconds on average and yields shorter paths than existing methods in most cases.
comment: The code for this paper is available at https://github.com/DeepakPrakashKumar/3D-Motion-Planning-for-Generalized-Dubins-with-Pitch-Yaw-constraints
MoReFlow: Motion Retargeting Learning through Unsupervised Flow Matching
Motion retargeting holds a premise of offering a larger set of motion data for characters and robots with different morphologies. Many prior works have approached this problem via either handcrafted constraints or paired motion datasets, limiting their applicability to humanoid characters or narrow behaviors such as locomotion. Moreover, they often assume a fixed notion of retargeting, overlooking domain-specific objectives like style preservation in animation or task-space alignment in robotics. In this work, we propose MoReFlow, Motion Retargeting via Flow Matching, an unsupervised framework that learns correspondences between characters' motion embedding spaces. Our method consists of two stages. First, we train tokenized motion embeddings for each character using a VQ-VAE, yielding compact latent representations. Then, we employ flow matching with conditional coupling to align the latent spaces across characters, which simultaneously learns conditioned and unconditioned matching to achieve robust but flexible retargeting. Once trained, MoReFlow enables flexible and reversible retargeting without requiring paired data. Experiments demonstrate that MoReFlow produces high-quality motions across diverse characters and tasks, offering improved controllability, generalization, and motion realism compared to the baselines.
Integrator Forwading Design for Unicycles with Constant and Actuated Velocity in Polar Coordinates
In a companion paper, we present a modular framework for unicycle stabilization in polar coordinates that provides smooth steering laws through backstepping. Surprisingly, the same problem also allows the application of integrator forwarding. In this work, we leverage this feature and construct new smooth steering laws together with control Lyapunov functions (CLFs), expanding the set of CLFs available for inverse optimal control design. In the case of constant forward velocity (Dubins car), backstepping produces finite-time (deadbeat) parking, and we show that integrator forwarding yields the very same class of solutions. This reveals a fundamental connection between backstepping and forwarding in addressing both the unicycle and, the Dubins car parking problems.
Modular Design of Strict Control Lyapunov Functions for Global Stabilization of the Unicycle in Polar Coordinates
Since the mid-1990s, it has been known that, unlike in Cartesian form where Brockett's condition rules out static feedback stabilization, the unicycle is globally asymptotically stabilizable by smooth feedback in polar coordinates. In this note, we introduce a modular framework for designing smooth feedback laws that achieve global asymptotic stabilization in polar coordinates. These laws are bidirectional, enabling efficient parking maneuvers, and are paired with families of strict control Lyapunov functions (CLFs) constructed in a modular fashion. The resulting CLFs guarantee global asymptotic stability with explicit convergence rates and include barrier variants that yield "almost global" stabilization, excluding only zero-measure subsets of the rotation manifolds. The strictness of the CLFs is further leveraged in our companion paper, where we develop inverse-optimal redesigns with meaningful cost functions and infinite gain margins.
Exhaustive-Serve-Longest Control for Multi-robot Scheduling Systems
We study online task allocation for multi-robot, multi-queue systems with stochastic arrivals and switching delays. Time is slotted; each location can host at most one robot per slot; service consumes one slot; switching between locations incurs a one-slot travel delay; and arrivals are independent Bernoulli processes. We formulate a discounted-cost Markov decision process and propose Exhaustive-Serve-Longest (ESL), a simple real-time policy that serves exhaustively when the current location is nonempty and, when idle, switches to a longest unoccupied nonempty location, and we prove the optimality of this policy. As baselines, we tune a fixed-dwell cyclic policy via a discrete-time delay expression and implement a first-come-first-serve policy. Across server-to-location ratios and loads, ESL consistently yields lower discounted holding cost and smaller mean queue lengths, with action-time fractions showing more serving and restrained switching. Its simplicity and robustness make ESL a practical default for real-time multi-robot scheduling systems.
Online Mapping for Autonomous Driving: Addressing Sensor Generalization and Dynamic Map Updates in Campus Environments
High-definition (HD) maps are essential for autonomous driving, providing precise information such as road boundaries, lane dividers, and crosswalks to enable safe and accurate navigation. However, traditional HD map generation is labor-intensive, expensive, and difficult to maintain in dynamic environments. To overcome these challenges, we present a real-world deployment of an online mapping system on a campus golf cart platform equipped with dual front cameras and a LiDAR sensor. Our work tackles three core challenges: (1) labeling a 3D HD map for campus environment; (2) integrating and generalizing the SemVecMap model onboard; and (3) incrementally generating and updating the predicted HD map to capture environmental changes. By fine-tuning with campus-specific data, our pipeline produces accurate map predictions and supports continual updates, demonstrating its practical value in real-world autonomous driving scenarios.
comment: 19th International Symposium on Experimental Robotics
LLM-RG: Referential Grounding in Outdoor Scenarios using Large Language Models
Referential grounding in outdoor driving scenes is challenging due to large scene variability, many visually similar objects, and dynamic elements that complicate resolving natural-language references (e.g., "the black car on the right"). We propose LLM-RG, a hybrid pipeline that combines off-the-shelf vision-language models for fine-grained attribute extraction with large language models for symbolic reasoning. LLM-RG processes an image and a free-form referring expression by using an LLM to extract relevant object types and attributes, detecting candidate regions, generating rich visual descriptors with a VLM, and then combining these descriptors with spatial metadata into natural-language prompts that are input to an LLM for chain-of-thought reasoning to identify the referent's bounding box. Evaluated on the Talk2Car benchmark, LLM-RG yields substantial gains over both LLM and VLM-based baselines. Additionally, our ablations show that adding 3D spatial cues further improves grounding. Our results demonstrate the complementary strengths of VLMs and LLMs, applied in a zero-shot manner, for robust outdoor referential grounding.
Robust Visual Localization in Compute-Constrained Environments by Salient Edge Rendering and Weighted Hamming Similarity
We consider the problem of vision-based 6-DoF object pose estimation in the context of the notional Mars Sample Return campaign, in which a robotic arm would need to localize multiple objects of interest for low-clearance pickup and insertion, under severely constrained hardware. We propose a novel localization algorithm leveraging a custom renderer together with a new template matching metric tailored to the edge domain to achieve robust pose estimation using only low-fidelity, textureless 3D models as inputs. Extensive evaluations on synthetic datasets as well as from physical testbeds on Earth and in situ Mars imagery shows that our method consistently beats the state of the art in compute and memory-constrained localization, both in terms of robustness and accuracy, in turn enabling new possibilities for cheap and reliable localization on general-purpose hardware.
comment: To appear in IEEE Robotics and Automation Letters
World Model for AI Autonomous Navigation in Mechanical Thrombectomy MICCAI 2025
Autonomous navigation for mechanical thrombectomy (MT) remains a critical challenge due to the complexity of vascular anatomy and the need for precise, real-time decision-making. Reinforcement learning (RL)-based approaches have demonstrated potential in automating endovascular navigation, but current methods often struggle with generalization across multiple patient vasculatures and long-horizon tasks. We propose a world model for autonomous endovascular navigation using TD-MPC2, a model-based RL algorithm. We trained a single RL agent across multiple endovascular navigation tasks in ten real patient vasculatures, comparing performance against the state-of-the-art Soft Actor-Critic (SAC) method. Results indicate that TD-MPC2 significantly outperforms SAC in multi-task learning, achieving a 65% mean success rate compared to SAC's 37%, with notable improvements in path ratio. TD-MPC2 exhibited increased procedure times, suggesting a trade-off between success rate and execution speed. These findings highlight the potential of world models for improving autonomous endovascular navigation and lay the foundation for future research in generalizable AI-driven robotic interventions.
comment: Published in Medical Image Computing and Computer Assisted Intervention - MICCAI 2025, Lecture Notes in Computer Science, vol 15968
Message passing-based inference in an autoregressive active inference agent
We present the design of an autoregressive active inference agent in the form of message passing on a factor graph. Expected free energy is derived and distributed across a planning graph. The proposed agent is validated on a robot navigation task, demonstrating exploration and exploitation in a continuous-valued observation space with bounded continuous-valued actions. Compared to a classical optimal controller, the agent modulates action based on predictive uncertainty, arriving later but with a better model of the robot's dynamics.
comment: 14 pages, 4 figures, to be published in the proceedings of the International Workshop on Active Inference 2025
Infrastructure Sensor-enabled Vehicle Data Generation using Multi-Sensor Fusion for Proactive Safety Applications at Work Zone
Infrastructure-based sensing and real-time trajectory generation show promise for improving safety in high-risk roadway segments such as work zones, yet practical deployments are hindered by perspective distortion, complex geometry, occlusions, and costs. This study tackles these barriers by integrating roadside camera and LiDAR sensors into a cosimulation environment to develop a scalable, cost-effective vehicle detection and localization framework, and employing a Kalman Filter-based late fusion strategy to enhance trajectory consistency and accuracy. In simulation, the fusion algorithm reduced longitudinal error by up to 70 percent compared to individual sensors while preserving lateral accuracy within 1 to 3 meters. Field validation in an active work zone, using LiDAR, a radar-camera rig, and RTK-GPS as ground truth, demonstrated that the fused trajectories closely match real vehicle paths, even when single-sensor data are intermittent or degraded. These results confirm that KF based sensor fusion can reliably compensate for individual sensor limitations, providing precise and robust vehicle tracking capabilities. Our approach thus offers a practical pathway to deploy infrastructure-enabled multi-sensor systems for proactive safety measures in complex traffic environments.
CoTaP: Compliant Task Pipeline and Reinforcement Learning of Its Controller with Compliance Modulation
Humanoid whole-body locomotion control is a critical approach for humanoid robots to leverage their inherent advantages. Learning-based control methods derived from retargeted human motion data provide an effective means of addressing this issue. However, because most current human datasets lack measured force data, and learning-based robot control is largely position-based, achieving appropriate compliance during interaction with real environments remains challenging. This paper presents Compliant Task Pipeline (CoTaP): a pipeline that leverages compliance information in the learning-based structure of humanoid robots. A two-stage dual-agent reinforcement learning framework combined with model-based compliance control for humanoid robots is proposed. In the training process, first a base policy with a position-based controller is trained; then in the distillation, the upper-body policy is combined with model-based compliance control, and the lower-body agent is guided by the base policy. In the upper-body control, adjustable task-space compliance can be specified and integrated with other controllers through compliance modulation on the symmetric positive definite (SPD) manifold, ensuring system stability. We validated the feasibility of the proposed strategy in simulation, primarily comparing the responses to external disturbances under different compliance settings.
comment: Submitted to IEEE for possible publication, under review
Parallel Heuristic Search as Inference for Actor-Critic Reinforcement Learning Models
Actor-Critic models are a class of model-free deep reinforcement learning (RL) algorithms that have demonstrated effectiveness across various robot learning tasks. While considerable research has focused on improving training stability and data sampling efficiency, most deployment strategies have remained relatively simplistic, typically relying on direct actor policy rollouts. In contrast, we propose \pachs{} (\textit{P}arallel \textit{A}ctor-\textit{C}ritic \textit{H}euristic \textit{S}earch), an efficient parallel best-first search algorithm for inference that leverages both components of the actor-critic architecture: the actor network generates actions, while the critic network provides cost-to-go estimates to guide the search. Two levels of parallelism are employed within the search -- actions and cost-to-go estimates are generated in batches by the actor and critic networks respectively, and graph expansion is distributed across multiple threads. We demonstrate the effectiveness of our approach in robotic manipulation tasks, including collision-free motion planning and contact-rich interactions such as non-prehensile pushing. Visit p-achs.github.io for demonstrations and examples.
comment: Submitted for Publication
SARM: Stage-Aware Reward Modeling for Long Horizon Robot Manipulation
Large-scale robot learning has recently shown promise for enabling robots to perform complex tasks by integrating perception, control, and language understanding. Yet, it struggles with long-horizon, contact-rich manipulation such as deformable object handling, where demonstration quality is inconsistent. Reward modeling offers a natural solution: by providing grounded progress signals, it transforms noisy demonstrations into stable supervision that generalizes across diverse trajectories. We introduce a stage-aware, video-based reward modeling framework that jointly predicts high-level task stages and fine-grained progress. Reward labels are automatically derived from natural language subtask annotations, ensuring consistent progress estimation across variable-length demonstrations. This design overcomes frame-index labeling, which fails in variable-duration tasks like folding a T-shirt. Our reward model demonstrates robustness to variability, generalization to out-of-distribution settings, and strong utility for policy training. Building on it, we propose Reward-Aligned Behavior Cloning (RA-BC), which filters high-quality data and reweights samples by reward. Experiments show the reward model alone outperforms baselines on validation and real robot rollouts. Integrated into RA-BC, our approach achieves 83\% success on folding T-shirts from the flattened state and 67\% from the crumpled state -- far surpassing vanilla behavior cloning, which attains only 8\% and 0\% success. Overall, our results highlight reward modeling as a key enabler for scalable, annotation-efficient, and robust imitation learning in long-horizon manipulation.
SRMP: Search-Based Robot Motion Planning Library
Motion planning is a critical component in any robotic system. Over the years, powerful tools like the Open Motion Planning Library (OMPL) have been developed, offering numerous motion planning algorithms. However, existing frameworks often struggle to deliver the level of predictability and repeatability demanded by high-stakes applications -- ranging from ensuring safety in industrial environments to the creation of high-quality motion datasets for robot learning. Complementing existing tools, we introduce SRMP (Search-based Robot Motion Planning), a new software framework tailored for robotic manipulation. SRMP distinguishes itself by generating consistent and reliable trajectories, and is the first software tool to offer motion planning algorithms for multi-robot manipulation tasks. SRMP easily integrates with major simulators, including MuJoCo, Sapien, Genesis, and PyBullet via a Python and C++ API. SRMP includes a dedicated MoveIt! plugin that enables immediate deployment on robot hardware and seamless integration with existing pipelines. Through extensive evaluations, we demonstrate in this paper that SRMP not only meets the rigorous demands of industrial and safety-critical applications but also sets a new standard for consistency in motion planning across diverse robotic systems. Visit srmp.readthedocs.io for SRMP documentation and tutorials.
comment: Submitted for Publication
Hierarchical Task Environments as the Next Frontier for Embodied World Models in Robot Soccer NeurIPS 2025
Recent advances in agent development have focused on scaling model size and raw interaction data, mirroring the successes seen in large language models. However, for complex, long-horizon multi-agent tasks such as robotic soccer, this end-to-end approach often fails due to intractable exploration spaces and sparse rewards. This position paper argues that the next frontier in developing embodied world models is not merely increasing the fidelity or size of environments, but scaling their structural complexity through explicit hierarchical scaffolding. We posit that an effective world model for decision-making must model not only the world's physics but also its task semantics. Drawing from a systematic review of 2024 research in low-resource multi-agent soccer, we identify a clear trend towards integrating symbolic and hierarchical methods, such as Hierarchical Task Networks (HTNs) and Bayesian Strategy Networks (BSNs), with multi-agent reinforcement learning (MARL). These methods decompose complex goals into manageable subgoals, creating an intrinsic curriculum that shapes agent learning. We propose that such structured environments are essential for bridging the gap between simple, reactive behaviors and sophisticated, strategic team play. We further extend this principle, proposing that this scaffolding can be generalized to other complex domains and dynamically generated by Large Language Models (LLMs), which act as generative world models of tasks. By building environments with explicit, composable task layers, we can guide agent exploration more efficiently, generate meaningful learning signals, and ultimately train more capable and general-purpose agents with fewer resources than purely end-to-end approaches.
comment: In the 39th Conference on Neural Information Processing Systems (NeurIPS 2025) Workshop: Embodied World Models for Decision Making (EWM)
Diffusion-Based mmWave Radar Point Cloud Enhancement Driven by Range Images
Millimeter-wave (mmWave) radar has attracted significant attention in robotics and autonomous driving. However, despite the perception stability in harsh environments, the point cloud generated by mmWave radar is relatively sparse while containing significant noise, which limits its further development. Traditional mmWave radar enhancement approaches often struggle to leverage the effectiveness of diffusion models in super-resolution, largely due to the unnatural range-azimuth heatmap (RAH) or bird's eye view (BEV) representation. To overcome this limitation, we propose a novel method that pioneers the application of fusing range images with image diffusion models, achieving accurate and dense mmWave radar point clouds that are similar to LiDAR. Benefitting from the projection that aligns with human observation, the range image representation of mmWave radar is close to natural images, allowing the knowledge from pre-trained image diffusion models to be effectively transferred, significantly improving the overall performance. Extensive evaluations on both public datasets and self-constructed datasets demonstrate that our approach provides substantial improvements, establishing a new state-of-the-art performance in generating truly three-dimensional LiDAR-like point clouds via mmWave radar. Code will be released after publication.
comment: 8 pages, 7 figures. This work has been submitted to the IEEE for possible publication
What Do You Need for Diverse Trajectory Composition in Diffusion Planning?
In planning, stitching is an ability of algorithms to piece together sub-trajectories of data they are trained on to generate new and diverse behaviours. While stitching is historically a strength of offline reinforcement learning, recent generative behavioural cloning (BC) methods have also shown proficiency at stitching. However, the main factors behind this are poorly understood, hindering the development of new algorithms that can reliably stitch. Focusing on diffusion planners trained via BC, we find two properties are needed to compose: \emph{positional equivariance} and \emph{local receptiveness}. We use these two properties to explain architecture, data, and inference choices in existing generative BC methods based on diffusion planning, including replanning frequency, data augmentation, and data scaling. Experimental comparisions show that (1) while locality is more important than positional equivariance in creating a diffusion planner capable of composition, both are crucial (2) enabling these properties through relatively simple architecture choices can be competitive with more computationally expensive methods such as replanning or scaling data, and (3) simple inpainting-based guidance can guide architecturally compositional models to enable generalization in goal-conditioned settings.
comment: 9 Pages
LaVA-Man: Learning Visual Action Representations for Robot Manipulation
Visual-textual understanding is essential for language-guided robot manipulation. Recent works leverage pre-trained vision-language models to measure the similarity between encoded visual observations and textual instructions, and then train a model to map this similarity to robot actions. However, this two-step approach limits the model to capture the relationship between visual observations and textual instructions, leading to reduced precision in manipulation tasks. We propose to learn visual-textual associations through a self-supervised pretext task: reconstructing a masked goal image conditioned on an input image and textual instructions. This formulation allows the model to learn visual-action representations without robot action supervision. The learned representations can then be fine-tuned for manipulation tasks with only a few demonstrations. We also introduce the \textit{Omni-Object Pick-and-Place} dataset, which consists of annotated robot tabletop manipulation episodes, including 180 object classes and 3,200 instances with corresponding textual instructions. This dataset enables the model to acquire diverse object priors and allows for a more comprehensive evaluation of its generalisation capability across object instances. Experimental results on the five benchmarks, including both simulated and real-robot validations, demonstrate that our method outperforms prior art.
Learning Smooth State-Dependent Traversability from Dense Point Clouds
A key open challenge in off-road autonomy is that the traversability of terrain often depends on the vehicle's state. In particular, some obstacles are only traversable from some orientations. However, learning this interaction by encoding the angle of approach as a model input demands a large and diverse training dataset and is computationally inefficient during planning due to repeated model inference. To address these challenges, we present SPARTA, a method for estimating approach angle conditioned traversability from point clouds. Specifically, we impose geometric structure into our network by outputting a smooth analytical function over the 1-Sphere that predicts risk distribution for any angle of approach with minimal overhead and can be reused for subsequent queries. The function is composed of Fourier basis functions, which has important advantages for generalization due to their periodic nature and smoothness. We demonstrate SPARTA both in a high-fidelity simulation platform, where our model achieves a 91\% success rate crossing a 40m boulder field (compared to 73\% for the baseline), and on hardware, illustrating the generalization ability of the model to real-world settings. Our code will be available at https://github.com/neu-autonomy/SPARTA.
comment: 18 pages, 13 figures
Vintix: Action Model via In-Context Reinforcement Learning ICML 2025
In-Context Reinforcement Learning (ICRL) represents a promising paradigm for developing generalist agents that learn at inference time through trial-and-error interactions, analogous to how large language models adapt contextually, but with a focus on reward maximization. However, the scalability of ICRL beyond toy tasks and single-domain settings remains an open challenge. In this work, we present the first steps toward scaling ICRL by introducing a fixed, cross-domain model capable of learning behaviors through in-context reinforcement learning. Our results demonstrate that Algorithm Distillation, a framework designed to facilitate ICRL, offers a compelling and competitive alternative to expert distillation to construct versatile action models. These findings highlight the potential of ICRL as a scalable approach for generalist decision-making systems. Code released at https://github.com/dunnolab/vintix
comment: ICML 2025, Poster
Beyond Human Demonstrations: Diffusion-Based Reinforcement Learning to Generate Data for VLA Training
Vision-language-action (VLA) models have shown strong generalization across tasks and embodiments; however, their reliance on large-scale human demonstrations limits their scalability owing to the cost and effort of manual data collection. Reinforcement learning (RL) offers a potential alternative to generate demonstrations autonomously, yet conventional RL algorithms often struggle on long-horizon manipulation tasks with sparse rewards. In this paper, we propose a modified diffusion policy optimization algorithm to generate high-quality and low-variance trajectories, which contributes to a diffusion RL-powered VLA training pipeline. Our algorithm benefits from not only the high expressiveness of diffusion models to explore complex and diverse behaviors but also the implicit regularization of the iterative denoising process to yield smooth and consistent demonstrations. We evaluate our approach on the LIBERO benchmark, which includes 130 long-horizon manipulation tasks, and show that the generated trajectories are smoother and more consistent than both human demonstrations and those from standard Gaussian RL policies. Further, training a VLA model exclusively on the diffusion RL-generated data achieves an average success rate of 81.9%, which outperforms the model trained on human data by +5.3% and that on Gaussian RL-generated data by +12.6%. The results highlight our diffusion RL as an effective alternative for generating abundant, high-quality, and low-variance demonstrations for VLA models.
S2S-Net: Addressing the Domain Gap of Heterogeneous Sensor Systems in LiDAR-Based Collective Perception
Collective Perception (CP) has emerged as a promising approach to overcome the limitations of individual perception in the context of autonomous driving. Various approaches have been proposed to realize collective perception; however, the Sensor2Sensor domain gap that arises from the utilization of different sensor systems in Connected and Automated Vehicles (CAVs) remains mostly unaddressed. This is primarily due to the paucity of datasets containing heterogeneous sensor setups among the CAVs. The recently released SCOPE datasets address this issue by providing data from three different LiDAR sensors for each CAV. This study is the first to address the Sensor2Sensor domain gap in vehicle-to-vehicle (V2V) collective perception. First, we present our sensor-domain robust architecture S2S-Net. Then an in-depth analysis of the Sensor2Sensor domain adaptation capabilities of state-of-the-art CP methods and S2S-Net is conducted on the SCOPE dataset. This study shows that, all evaluated state-of-the-art mehtods for collective perception highly suffer from the Sensor2Sensor domain gap, while S2S-Net demonstrates the capability to maintain very high performance in unseen sensor domains and outperforms the evaluated state-of-the-art methods by up to 44 percentage points.
Hybrid Many-Objective Optimization in Probabilistic Mission Design for Compliant and Effective UAV Routing
Advanced Aerial Mobility encompasses many outstanding applications that promise to revolutionize modern logistics and pave the way for various public services and industry uses. However, throughout its history, the development of such systems has been impeded by the complexity of legal restrictions and physical constraints. While airspaces are often tightly shaped by various legal requirements, Unmanned Aerial Vehicles (UAV) must simultaneously consider, among others, energy demands, signal quality, and noise pollution. In this work, we address this challenge by presenting a novel architecture that integrates methods of Probabilistic Mission Design (ProMis) and Many-Objective Optimization for UAV routing. Hereby, our framework is able to comply with legal requirements under uncertainty while producing effective paths that minimize various physical costs a UAV needs to consider when traversing human-inhabited spaces. To this end, we combine hybrid probabilistic first-order logic for spatial reasoning with mixed deterministic-stochastic route optimization, incorporating physical objectives such as energy consumption and radio interference with a logical, probabilistic model of legal requirements. We demonstrate the versatility and advantages of our system in a large-scale empirical evaluation over real-world, crowd-sourced data from a map extract from the city of Paris, France, showing how a network of effective and compliant paths can be formed.
Autonomous Vehicle Controllers From End-to-End Differentiable Simulation IROS 2025
Current methods to learn controllers for autonomous vehicles (AVs) focus on behavioural cloning. Being trained only on exact historic data, the resulting agents often generalize poorly to novel scenarios. Simulators provide the opportunity to go beyond offline datasets, but they are still treated as complicated black boxes, only used to update the global simulation state. As a result, these RL algorithms are slow, sample-inefficient, and prior-agnostic. In this work, we leverage a differentiable simulator and design an analytic policy gradients (APG) approach to training AV controllers on the large-scale Waymo Open Motion Dataset. Our proposed framework brings the differentiable simulator into an end-to-end training loop, where gradients of the environment dynamics serve as a useful prior to help the agent learn a more grounded policy. We combine this setup with a recurrent architecture that can efficiently propagate temporal information across long simulated trajectories. This APG method allows us to learn robust, accurate, and fast policies, while only requiring widely-available expert trajectories, instead of scarce expert actions. We compare to behavioural cloning and find significant improvements in performance and robustness to noise in the dynamics, as well as overall more intuitive human-like handling.
comment: Polished and accepted at IROS 2025
InSpire: Vision-Language-Action Models with Intrinsic Spatial Reasoning
Leveraging pretrained Vision-Language Models (VLMs) to map language instruction and visual observations to raw low-level actions, Vision-Language-Action models (VLAs) hold great promise for achieving general-purpose robotic systems. Despite their advancements, existing VLAs tend to spuriously correlate task-irrelevant visual features with actions, limiting their generalization capacity beyond the training data. To tackle this challenge, we propose Intrinsic Spatial Reasoning (InSpire), a simple yet effective approach that mitigates the adverse effects of spurious correlations by boosting the spatial reasoning ability of VLAs. Specifically, InSpire redirects the VLA's attention to task-relevant factors by prepending the question "In which direction is the [object] relative to the robot?" to the language instruction and aligning the answer "right/left/up/down/front/back/grasped" and predicted actions with ground-truth. Notably, InSpire can be used as a plugin to enhance existing autoregressive VLAs, requiring no extra training data or interaction with other large models. Extensive experimental results in both simulation and real-world environments demonstrate the effectiveness and flexibility of our approach.
Diffusion-Based Impedance Learning for Contact-Rich Manipulation Tasks
Learning methods excel at motion generation in the information domain but are not primarily designed for physical interaction in the energy domain. Impedance Control shapes physical interaction but requires task-aware tuning by selecting feasible impedance parameters. We present Diffusion-Based Impedance Learning, a framework that combines both domains. A Transformer-based Diffusion Model with cross-attention to external wrenches reconstructs a simulated Zero-Force Trajectory (sZFT). This captures both translational and rotational task-space behavior. For rotations, we introduce a novel SLERP-based quaternion noise scheduler that ensures geometric consistency. The reconstructed sZFT is then passed to an energy-based estimator that updates stiffness and damping parameters. A directional rule is applied that reduces impedance along non task axes while preserving rigidity along task directions. Training data were collected for a parkour scenario and robotic-assisted therapy tasks using teleoperation with Apple Vision Pro. With only tens of thousands of samples, the model achieved sub-millimeter positional accuracy and sub-degree rotational accuracy. Its compact model size enabled real-time torque control and autonomous stiffness adaptation on a KUKA LBR iiwa robot. The controller achieved smooth parkour traversal within force and velocity limits and 30/30 success rates for cylindrical, square, and star peg insertions without any peg-specific demonstrations in the training data set. All code for the Transformer-based Diffusion Model, the robot controller, and the Apple Vision Pro telemanipulation framework is publicly available. These results mark an important step towards Physical AI, fusing model-based control for physical interaction with learning-based methods for trajectory generation.
comment: 15 pages, 12 figures
Open-Vocabulary Online Semantic Mapping for SLAM
This paper presents an Open-Vocabulary Online 3D semantic mapping pipeline, that we denote by its acronym OVO. Given a sequence of posed RGB-D frames, we detect and track 3D segments, which we describe using CLIP vectors. These are computed from the viewpoints where they are observed by a novel CLIP merging method. Notably, our OVO has a significantly lower computational and memory footprint than offline baselines, while also showing better segmentation metrics than offline and online ones. Along with superior segmentation performance, we also show experimental results of our mapping contributions integrated with two different full SLAM backbones (Gaussian-SLAM and ORB-SLAM2), being the first ones using a neural network to merge CLIP descriptors and demonstrating end-to-end open-vocabulary online 3D mapping with loop closure.
comment: Accepted for IEEE Robotics and Automation Letters
Loosely coupled 4D-Radar-Inertial Odometry for Ground Robots
Accurate robot odometry is essential for autonomous navigation. While numerous techniques have been developed based on various sensor suites, odometry estimation using only radar and IMU remains an underexplored area. Radar proves particularly valuable in environments where traditional sensors, like cameras or LiDAR, may struggle, especially in low-light conditions or when faced with environmental challenges like fog, rain or smoke. However, despite its robustness, radar data is noisier and more prone to outliers, requiring specialized processing approaches. In this paper, we propose a graph-based optimization approach using a sliding window for radar-based odometry, designed to maintain robust relationships between poses by forming a network of connections, while keeping computational costs fixed (specially beneficial in long trajectories). Additionally, we introduce an enhancement in the ego-velocity estimation specifically for ground vehicles, both holonomic and non-holonomic, which subsequently improves the direct odometry input required by the optimizer. Finally, we present a comparative study of our approach against existing algorithms, showing how our pure odometry approach inproves the state of art in most trajectories of the NTU4DRadLM dataset, achieving promising results when evaluating key performance metrics.
comment: 24 pages, 6 figures, 4 tables, 33 references
Decoupling Geometry from Optimization in 2D Irregular Cutting and Packing Problems: an Open-Source Collision Detection Engine
Addressing irregular cutting and packing (C&P) optimization problems poses two distinct challenges: the geometric challenge of determining whether or not an item can be placed feasibly at a certain position, and the optimization challenge of finding a good solution according to some objective function. Until now, those tackling such problems have had to address both challenges simultaneously, requiring two distinct sets of expertise and a lot of research & development effort. One way to lower this barrier is to decouple the two challenges. In this paper we introduce a powerful collision detection engine (CDE) for 2D irregular C&P problems which assumes full responsibility for the geometric challenge. The CDE (i) allows users to focus with full confidence on their optimization challenge by abstracting geometry away and (ii) enables independent advances to propagate to all optimization algorithms built atop it. We present a set of core principles and design philosophies to model a general and adaptable CDE focused on maximizing performance, accuracy and robustness. These principles are accompanied by a concrete open-source implementation called $\texttt{jagua-rs}$. This paper together with its implementation serves as a catalyst for future advances in irregular C&P problems by providing a solid foundation which can either be used as it currently exists or be further improved upon.
comment: 25 pages, 16 figures
Whole-Body Integrated Motion Planning for Aerial Manipulators
Expressive motion planning for Aerial Manipulators (AMs) is essential for tackling complex manipulation tasks, yet achieving coupled trajectory planning adaptive to various tasks remains challenging, especially for those requiring aggressive maneuvers. In this work, we propose a novel whole-body integrated motion planning framework for quadrotor-based AMs that leverages flexible waypoint constraints to achieve versatile manipulation capabilities. These waypoint constraints enable the specification of individual position requirements for either the quadrotor or end-effector, while also accommodating higher-order velocity and orientation constraints for complex manipulation tasks. To implement our framework, we exploit spatio-temporal trajectory characteristics and formulate an optimization problem to generate feasible trajectories for both the quadrotor and manipulator while ensuring collision avoidance considering varying robot configurations, dynamic feasibility, and kinematic feasibility. Furthermore, to enhance the maneuverability for specific tasks, we employ Imitation Learning (IL) to facilitate the optimization process to avoid poor local optima. The effectiveness of our framework is validated through comprehensive simulations and real-world experiments, where we successfully demonstrate nine fundamental manipulation skills across various environments.
comment: 20 pages, 15 figures
Is FISHER All You Need in The Multi-AUV Underwater Target Tracking Task?
It is significant to employ multiple autonomous underwater vehicles (AUVs) to execute the underwater target tracking task collaboratively. However, it's pretty challenging to meet various prerequisites utilizing traditional control methods. Therefore, we propose an effective two-stage learning from demonstrations training framework, FISHER, to highlight the adaptability of reinforcement learning (RL) methods in the multi-AUV underwater target tracking task, while addressing its limitations such as extensive requirements for environmental interactions and the challenges in designing reward functions. The first stage utilizes imitation learning (IL) to realize policy improvement and generate offline datasets. To be specific, we introduce multi-agent discriminator-actor-critic based on improvements of the generative adversarial IL algorithm and multi-agent IL optimization objective derived from the Nash equilibrium condition. Then in the second stage, we develop multi-agent independent generalized decision transformer, which analyzes the latent representation to match the future states of high-quality samples rather than reward function, attaining further enhanced policies capable of handling various scenarios. Besides, we propose a simulation to simulation demonstration generation procedure to facilitate the generation of expert demonstrations in underwater environments, which capitalizes on traditional control methods and can easily accomplish the domain transfer to obtain demonstrations. Extensive simulation experiments from multiple scenarios showcase that FISHER possesses strong stability, multi-task performance and capability of generalization.
comment: This paper has been accepted by IEEE Transactions on Mobile Computing. Besides, Guanwen Xie and Jingzehua Xu contributed equally to this work
Towards agile multi-robot systems in the real world: Fast onboard tracking of active blinking markers for relative localization
A novel onboard tracking approach enabling vision-based relative localization and communication using Active blinking Marker Tracking (AMT) is introduced in this article. Active blinking markers on multi-robot team members improve the robustness of relative localization for aerial vehicles in tightly coupled multi-robot systems during real-world deployments, while also serving as a resilient communication system. Traditional tracking algorithms struggle with fast-moving blinking markers due to their intermittent appearance in camera frames and the complexity of associating multiple of these markers across consecutive frames. AMT addresses this by using weighted polynomial regression to predict the future appearance of active blinking markers while accounting for uncertainty in the prediction. In outdoor experiments, the AMT approach outperformed state-of-the-art methods in tracking density, accuracy, and complexity. The experimental validation of this novel tracking approach for relative localization and optical communication involved testing motion patterns motivated by our research on agile multi-robot deployment.
EvoAgent: Self-evolving Agent with Continual World Model for Long-Horizon Tasks
Completing Long-Horizon (LH) tasks in open-ended worlds is an important yet difficult problem for embodied agents. Existing approaches suffer from two key challenges: (1) they heavily rely on experiences obtained from human-created data or curricula, failing to autonomously update and select multimodal experiences, and (2) they may encounter catastrophic forgetting issues when faced with new tasks, failing to autonomously update world knowledge. To solve these challenges, this paper presents {\it EvoAgent}, a self-evolving agent with a continual World Model (WM), which can autonomously complete various LH tasks across environments through self-planning, self-control, and self-reflection, without human intervention. Our proposed EvoAgent contains three modules, i.e., i) the memory-driven planner which uses an LLM along with the WM and interaction memory, to convert LH tasks into executable sub-tasks; ii) the WM-guided action controller which leverages WM to generate low-level actions and incorporates a self-verification mechanism to update multimodal experiences; iii) the experience-inspired reflector which implements a two-stage curriculum learning algorithm to select experiences for task-adaptive WM updates. Moreover, we develop a continual World Model for EvoAgent, which can autonomously update the multimodal experience pool and world knowledge through closed-loop dynamics. We conducted extensive experiments on Minecraft and Atair, compared with existing methods, EvoAgent can achieve an average success rate improvement of 105% and reduce ineffective actions by more than 6x.
Guided Reinforcement Learning for Omnidirectional 3D Jumping in Quadruped Robots
Jumping poses a significant challenge for quadruped robots, despite being crucial for many operational scenarios. While optimisation methods exist for controlling such motions, they are often time-consuming and demand extensive knowledge of robot and terrain parameters, making them less robust in real-world scenarios. Reinforcement learning (RL) is emerging as a viable alternative, yet conventional end-to-end approaches lack efficiency in terms of sample complexity, requiring extensive training in simulations, and predictability of the final motion, which makes it difficult to certify the safety of the final motion. To overcome these limitations, this paper introduces a novel guided reinforcement learning approach that leverages physical intuition for efficient and explainable jumping, by combining B\'ezier curves with a Uniformly Accelerated Rectilinear Motion (UARM) model. Extensive simulation and experimental results clearly demonstrate the advantages of our approach over existing alternatives.
GLEAM: Learning Generalizable Exploration Policy for Active Mapping in Complex 3D Indoor Scenes ICCV 2025
Generalizable active mapping in complex unknown environments remains a critical challenge for mobile robots. Existing methods, constrained by insufficient training data and conservative exploration strategies, exhibit limited generalizability across scenes with diverse layouts and complex connectivity. To enable scalable training and reliable evaluation, we introduce GLEAM-Bench, the first large-scale benchmark designed for generalizable active mapping with 1,152 diverse 3D scenes from synthetic and real-scan datasets. Building upon this foundation, we propose GLEAM, a unified generalizable exploration policy for active mapping. Its superior generalizability comes mainly from our semantic representations, long-term navigable goals, and randomized strategies. It significantly outperforms state-of-the-art methods, achieving 66.50% coverage (+9.49%) with efficient trajectories and improved mapping accuracy on 128 unseen complex scenes. Project page: https://xiao-chen.tech/gleam/.
comment: Accepted by ICCV 2025. Project page: https://xiao-chen.tech/gleam/
A Multi-Modality Evaluation of the Reality Gap in Autonomous Driving Systems
Simulation-based testing is a cornerstone of Autonomous Driving System (ADS) development, offering safe and scalable evaluation across diverse driving scenarios. However, discrepancies between simulated and real-world behavior, known as the reality gap, challenge the transferability of test results to deployed systems. In this paper, we present a comprehensive empirical study comparing four representative testing modalities: Software-in-the-Loop (SiL), Vehicle-in-the-Loop (ViL), Mixed-Reality (MR), and full real-world testing. Using a small-scale physical vehicle equipped with real sensors (camera and LiDAR) and its digital twin, we implement each setup and evaluate two ADS architectures (modular and end-to-end) across diverse indoor driving scenarios involving real obstacles, road topologies, and indoor environments. We systematically assess the impact of each testing modality along three dimensions of the reality gap: actuation, perception, and behavioral fidelity. Our results show that while SiL and ViL setups simplify critical aspects of real-world dynamics and sensing, MR testing improves perceptual realism without compromising safety or control. Importantly, we identify the conditions under which failures do not transfer across testing modalities and isolate the underlying dimensions of the gap responsible for these discrepancies. Our findings offer actionable insights into the respective strengths and limitations of each modality and outline a path toward more robust and transferable validation of autonomous driving systems.
comment: In proceedings of the 40th IEEE/ACM International Conference on Automated Software Engineering (ASE '25)
Towards Efficient LLM Grounding for Embodied Multi-Agent Collaboration ACL'2025
Grounding the reasoning ability of large language models (LLMs) for embodied tasks is challenging due to the complexity of the physical world. Especially, LLM planning for multi-agent collaboration requires communication of agents or credit assignment as the feedback to re-adjust the proposed plans and achieve effective coordination. However, existing methods that overly rely on physical verification or self-reflection suffer from excessive and inefficient querying of LLMs. In this paper, we propose a novel framework for multi-agent collaboration that introduces Reinforced Advantage feedback (ReAd) for efficient self-refinement of plans. Specifically, we perform critic regression to learn a sequential advantage function from LLM-planned data, and then treat the LLM planner as an optimizer to generate actions that maximize the advantage function. It endows the LLM with the foresight to discern whether the action contributes to accomplishing the final task. We provide theoretical analysis by extending advantage-weighted regression in reinforcement learning to multi-agent systems. Experiments on Overcooked-AI and a difficult variant of RoCoBench show that ReAd surpasses baselines in success rate, and also significantly decreases the interaction steps of agents and query rounds of LLMs, demonstrating its high efficiency for grounding LLMs. More results are given at https://embodied-read.github.io
comment: accepted by ACL'2025
MimicDreamer: Aligning Human and Robot Demonstrations for Scalable VLA Training
Vision Language Action (VLA) models derive their generalization capability from diverse training data, yet collecting embodied robot interaction data remains prohibitively expensive. In contrast, human demonstration videos are far more scalable and cost-efficient to collect, and recent studies confirm their effectiveness in training VLA models. However, a significant domain gap persists between human videos and robot-executed videos, including unstable camera viewpoints, visual discrepancies between human hands and robotic arms, and differences in motion dynamics. To bridge this gap, we propose MimicDreamer, a framework that turns fast, low-cost human demonstrations into robot-usable supervision by jointly aligning vision, viewpoint, and actions to directly support policy training. For visual alignment, we propose H2R Aligner, a video diffusion model that generates high-fidelity robot demonstration videos by transferring motion from human manipulation footage. For viewpoint stabilization, EgoStabilizer is proposed, which canonicalizes egocentric videos via homography and inpaints occlusions and distortions caused by warping. For action alignment, we map human hand trajectories to the robot frame and apply a constrained inverse kinematics solver to produce feasible, low-jitter joint commands with accurate pose tracking. Empirically, VLA models trained purely on our synthesized human-to-robot videos achieve few-shot execution on real robots. Moreover, scaling training with human data significantly boosts performance compared to models trained solely on real robot data; our approach improves the average success rate by 14.7\% across six representative manipulation tasks.
InterKey: Cross-modal Intersection Keypoints for Global Localization on OpenStreetMap
Reliable global localization is critical for autonomous vehicles, especially in environments where GNSS is degraded or unavailable, such as urban canyons and tunnels. Although high-definition (HD) maps provide accurate priors, the cost of data collection, map construction, and maintenance limits scalability. OpenStreetMap (OSM) offers a free and globally available alternative, but its coarse abstraction poses challenges for matching with sensor data. We propose InterKey, a cross-modal framework that leverages road intersections as distinctive landmarks for global localization. Our method constructs compact binary descriptors by jointly encoding road and building imprints from point clouds and OSM. To bridge modality gaps, we introduce discrepancy mitigation, orientation determination, and area-equalized sampling strategies, enabling robust cross-modal matching. Experiments on the KITTI dataset demonstrate that InterKey achieves state-of-the-art accuracy, outperforming recent baselines by a large margin. The framework generalizes to sensors that can produce dense structural point clouds, offering a scalable and cost-effective solution for robust vehicle localization.
comment: 8 pages, 5 figures
VLBiMan: Vision-Language Anchored One-Shot Demonstration Enables Generalizable Bimanual Robotic Manipulation
Achieving generalizable bimanual manipulation requires systems that can learn efficiently from minimal human input while adapting to real-world uncertainties and diverse embodiments. Existing approaches face a dilemma: imitation policy learning demands extensive demonstrations to cover task variations, while modular methods often lack flexibility in dynamic scenes. We introduce VLBiMan, a framework that derives reusable skills from a single human example through task-aware decomposition, preserving invariant primitives as anchors while dynamically adapting adjustable components via vision-language grounding. This adaptation mechanism resolves scene ambiguities caused by background changes, object repositioning, or visual clutter without policy retraining, leveraging semantic parsing and geometric feasibility constraints. Moreover, the system inherits human-like hybrid control capabilities, enabling mixed synchronous and asynchronous use of both arms. Extensive experiments validate VLBiMan across tool-use and multi-object tasks, demonstrating: (1) a drastic reduction in demonstration requirements compared to imitation baselines, (2) compositional generalization through atomic skill splicing for long-horizon tasks, (3) robustness to novel but semantically similar objects and external disturbances, and (4) strong cross-embodiment transfer, showing that skills learned from human demonstrations can be instantiated on different robotic platforms without retraining. By bridging human priors with vision-language anchored adaptation, our work takes a step toward practical and versatile dual-arm manipulation in unstructured settings.
comment: under review
Blast Hole Seeking and Dipping -- The Navigation and Perception Framework in a Mine Site Inspection Robot
In open-pit mining, holes are drilled into the surface of the excavation site and detonated with explosives to facilitate digging. These blast holes need to be inspected internally to assess subsurface material types and drill quality, in order to significantly reduce downstream material handling costs. Manual hole inspection is slow and expensive, limited in its ability to capture the geometric and geological characteristics of holes. This has been the motivation for the development of our autonomous mine-site inspection robot - "DIPPeR". In this paper, the automation aspect of the project is explained. We present a robust perception and navigation framework that provides streamlined blasthole seeking, tracking and accurate down-hole sensor positioning. To address challenges in the surface mining environment, where GPS and odometry data are noisy without RTK correction, we adopt a proximity-based adaptive navigation approach, enabling the vehicle to dynamically adjust its operations based on detected target availability and localisation accuracy. For perception, we process LiDAR data to extract the cone-shaped volume of drill-waste above ground, then project the 3D cone points into a virtual depth image to form accurate 2D segmentation of hole regions. To ensure continuous target-tracking as the robot approaches the goal, our system automatically adjusts projection parameters to preserve consistent hole image appearance. At the vicinity of the hole, we apply least squares circle fitting with non-maximum candidate suppression to achieve accurate hole detection and collision-free down-hole sensor placement. We demonstrate the effectiveness of our navigation and perception system in both high-fidelity simulation environments and on-site field trials. A demonstration video is available at https://www.youtube.com/watch?v=fRNbcBcaSqE.
MASt3R-Fusion: Integrating Feed-Forward Visual Model with IMU, GNSS for High-Functionality SLAM
Visual SLAM is a cornerstone technique in robotics, autonomous driving and extended reality (XR), yet classical systems often struggle with low-texture environments, scale ambiguity, and degraded performance under challenging visual conditions. Recent advancements in feed-forward neural network-based pointmap regression have demonstrated the potential to recover high-fidelity 3D scene geometry directly from images, leveraging learned spatial priors to overcome limitations of traditional multi-view geometry methods. However, the widely validated advantages of probabilistic multi-sensor information fusion are often discarded in these pipelines. In this work, we propose MASt3R-Fusion,a multi-sensor-assisted visual SLAM framework that tightly integrates feed-forward pointmap regression with complementary sensor information, including inertial measurements and GNSS data. The system introduces Sim(3)-based visualalignment constraints (in the Hessian form) into a universal metric-scale SE(3) factor graph for effective information fusion. A hierarchical factor graph design is developed, which allows both real-time sliding-window optimization and global optimization with aggressive loop closures, enabling real-time pose tracking, metric-scale structure perception and globally consistent mapping. We evaluate our approach on both public benchmarks and self-collected datasets, demonstrating substantial improvements in accuracy and robustness over existing visual-centered multi-sensor SLAM systems. The code will be released open-source to support reproducibility and further research (https://github.com/GREAT-WHU/MASt3R-Fusion).
FindingDory: A Benchmark to Evaluate Memory in Embodied Agents
Large vision-language models have recently demonstrated impressive performance in planning and control tasks, driving interest in their application to real-world robotics. However, deploying these models for reasoning in embodied contexts is limited by their ability to incorporate long-term experience collected across multiple days and represented by vast collections of images. Current VLMs typically struggle to process more than a few hundred images concurrently, highlighting the need for more efficient mechanisms to handle long-term memory in embodied settings. To effectively evaluate these models for long-horizon control, a benchmark must specifically target scenarios where memory is crucial for success. Existing long-video QA benchmarks overlook embodied challenges like object manipulation and navigation, which demand low-level skills and fine-grained reasoning over past interactions. Moreover, effective memory integration in embodied agents involves both recalling relevant historical information and executing actions based on that information, making it essential to study these aspects together rather than in isolation. In this work, we introduce a new benchmark for long-range embodied tasks in the Habitat simulator. This benchmark evaluates memory-based capabilities across 60 tasks requiring sustained engagement and contextual awareness in an environment. The tasks can also be procedurally extended to longer and more challenging versions, enabling scalable evaluation of memory and reasoning. We also present baselines that integrate state-of-the-art VLMs with low level navigation policies, assessing their performance on these memory-intensive tasks and highlight areas for improvement.
comment: Our dataset and code can be found at: https://findingdory-benchmark.github.io/
Leveraging Temporally Extended Behavior Sharing for Multi-task Reinforcement Learning IROS
Multi-task reinforcement learning (MTRL) offers a promising approach to improve sample efficiency and generalization by training agents across multiple tasks, enabling knowledge sharing between them. However, applying MTRL to robotics remains challenging due to the high cost of collecting diverse task data. To address this, we propose MT-L\'evy, a novel exploration strategy that enhances sample efficiency in MTRL environments by combining behavior sharing across tasks with temporally extended exploration inspired by L\'evy flight. MT-L\'evy leverages policies trained on related tasks to guide exploration towards key states, while dynamically adjusting exploration levels based on task success ratios. This approach enables more efficient state-space coverage, even in complex robotics environments. Empirical results demonstrate that MT-L\'evy significantly improves exploration and sample efficiency, supported by quantitative and qualitative analyses. Ablation studies further highlight the contribution of each component, showing that combining behavior sharing with adaptive exploration strategies can significantly improve the practicality of MTRL in robotics applications.
comment: Accepted for publication in the proceedings of the 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
VIMD: Monocular Visual-Inertial Motion and Depth Estimation
Accurate and efficient dense metric depth estimation is crucial for 3D visual perception in robotics and XR. In this paper, we develop a monocular visual-inertial motion and depth (VIMD) learning framework to estimate dense metric depth by leveraging accurate and efficient MSCKF-based monocular visual-inertial motion tracking. At the core the proposed VIMD is to exploit multi-view information to iteratively refine per-pixel scale, instead of globally fitting an invariant affine model as in the prior work. The VIMD framework is highly modular, making it compatible with a variety of existing depth estimation backbones. We conduct extensive evaluations on the TartanAir and VOID datasets and demonstrate its zero-shot generalization capabilities on the AR Table dataset. Our results show that VIMD achieves exceptional accuracy and robustness, even with extremely sparse points as few as 10-20 metric depth points per image. This makes the proposed VIMD a practical solution for deployment in resource constrained settings, while its robust performance and strong generalization capabilities offer significant potential across a wide range of scenarios.
Trajectory Encryption Cooperative Salvo Guidance
This paper introduces the concept of trajectory encryption in cooperative simultaneous target interception, wherein heterogeneity in guidance principles across a team of unmanned autonomous systems is leveraged as a strategic design feature. By employing a mix of heterogeneous time-to-go formulations leading to a cooperative guidance strategy, the swarm of vehicles is able to generate diverse trajectory families. This diversity expands the feasible solution space for simultaneous target interception, enhances robustness under disturbances, and enables flexible time-to-go adjustments without predictable detouring. From an adversarial perspective, heterogeneity obscures the collective interception intent by preventing straightforward prediction of swarm dynamics, effectively acting as an encryption layer in the trajectory domain. Simulations demonstrate that the swarm of heterogeneous vehicles is able to intercept a moving target simultaneously from a diverse set of initial engagement configurations.
Electrostatic Clutch-Based Mechanical Multiplexer with Increased Force Capability
Robotic systems with many degrees of freedom (DoF) are constrained by the demands of dedicating a motor to each joint, and while mechanical multiplexing reduces actuator count, existing clutch designs are bulky, force-limited, or restricted to one output at a time. The problem addressed in this study is how to achieve high-force, multiplexing that supports both simultaneous and sequential control from a single motor. Here we show an electrostatic capstan clutch-based transmission that enables both single-input-single-output (SISO) and single-input-multiple-output (SIMO) multiplexing. We demonstrated these on a four-DoF tendon-driven robotic hand where a single motor achieved output forces up to 212 N, increased vertical grip strength by 4.09 times, and raised horizontal carrying capacity by 354\% over manufacturer specifications. These results demonstrate that electrostatic multiplexing provides versatile actuation, overcoming the limitations of prior systems.
DAGDiff: Guiding Dual-Arm Grasp Diffusion to Stable and Collision-Free Grasps
Reliable dual-arm grasping is essential for manipulating large and complex objects but remains a challenging problem due to stability, collision, and generalization requirements. Prior methods typically decompose the task into two independent grasp proposals, relying on region priors or heuristics that limit generalization and provide no principled guarantee of stability. We propose DAGDiff, an end-to-end framework that directly denoises to grasp pairs in the SE(3) x SE(3) space. Our key insight is that stability and collision can be enforced more effectively by guiding the diffusion process with classifier signals, rather than relying on explicit region detection or object priors. To this end, DAGDiff integrates geometry-, stability-, and collision-aware guidance terms that steer the generative process toward grasps that are physically valid and force-closure compliant. We comprehensively evaluate DAGDiff through analytical force-closure checks, collision analysis, and large-scale physics-based simulations, showing consistent improvements over previous work on these metrics. Finally, we demonstrate that our framework generates dual-arm grasps directly on real-world point clouds of previously unseen objects, which are executed on a heterogeneous dual-arm setup where two manipulators reliably grasp and lift them.
Video models are zero-shot learners and reasoners
The remarkable zero-shot capabilities of Large Language Models (LLMs) have propelled natural language processing from task-specific models to unified, generalist foundation models. This transformation emerged from simple primitives: large, generative models trained on web-scale data. Curiously, the same primitives apply to today's generative video models. Could video models be on a trajectory towards general-purpose vision understanding, much like LLMs developed general-purpose language understanding? We demonstrate that Veo 3 can solve a broad variety of tasks it wasn't explicitly trained for: segmenting objects, detecting edges, editing images, understanding physical properties, recognizing object affordances, simulating tool use, and more. These abilities to perceive, model, and manipulate the visual world enable early forms of visual reasoning like maze and symmetry solving. Veo's emergent zero-shot capabilities indicate that video models are on a path to becoming unified, generalist vision foundation models.
comment: Project page: https://video-zero-shot.github.io/
TaBSA -- A framework for training and benchmarking algorithms scheduling tasks for mobile robots working in dynamic environments
This article introduces a software framework for benchmarking robot task scheduling algorithms in dynamic and uncertain service environments. The system provides standardized interfaces, configurable scenarios with movable objects, human agents, tools for automated test generation, and performance evaluation. It supports both classical and AI-based methods, enabling repeatable, comparable assessments across diverse tasks and configurations. The framework facilitates diagnosis of algorithm behavior, identification of implementation flaws, and selection or tuning of strategies for specific applications. It includes a SysML-based domain-specific language for structured scenario modeling and integrates with the ROS-based system for runtime execution. Validated on patrol, fall assistance, and pick-and-place tasks, the open-source framework is suited for researchers and integrators developing and testing scheduling algorithms under real-world-inspired conditions.
comment: Article submitted to the SoftwareX journal
Controllable Motion Generation via Diffusion Modal Coupling
Diffusion models have recently gained significant attention in robotics due to their ability to generate multi-modal distributions of system states and behaviors. However, a key challenge remains: ensuring precise control over the generated outcomes without compromising realism. This is crucial for applications such as motion planning or trajectory forecasting, where adherence to physical constraints and task-specific objectives is essential. We propose a novel framework that enhances controllability in diffusion models by leveraging multi-modal prior distributions and enforcing strong modal coupling. This allows us to initiate the denoising process directly from distinct prior modes that correspond to different possible system behaviors, ensuring sampling to align with the training distribution. We evaluate our approach on motion prediction using the Waymo dataset and multi-task control in Maze2D environments. Experimental results show that our framework outperforms both guidance-based techniques and conditioned models with unimodal priors, achieving superior fidelity, diversity, and controllability, even in the absence of explicit conditioning. Overall, our approach provides a more reliable and scalable solution for controllable motion generation in robotics.
ReCogDrive: A Reinforced Cognitive Framework for End-to-End Autonomous Driving
Recent studies have explored leveraging the world knowledge and cognitive capabilities of Vision-Language Models (VLMs) to address the long-tail problem in end-to-end autonomous driving. However, existing methods typically formulate trajectory planning as a language modeling task, where physical actions are output in the language space, potentially leading to issues such as format-violating outputs, infeasible actions, and slow inference speeds. In this paper, we propose ReCogDrive, a novel Reinforced Cognitive framework for end-to-end autonomous Driving, unifying driving understanding and planning by integrating an autoregressive model with a diffusion planner. First, to instill human driving cognition into the VLM, we introduce a hierarchical data pipeline that mimics the sequential cognitive process of human drivers through three stages: generation, refinement, and quality control. Building on this cognitive foundation, we then address the language-action mismatch by injecting the VLM's learned driving priors into a diffusion planner to efficiently generate continuous and stable trajectories. Furthermore, to enhance driving safety and reduce collisions, we introduce a Diffusion Group Relative Policy Optimization (DiffGRPO) stage, reinforcing the planner for enhanced safety and comfort. Extensive experiments on the NAVSIM and Bench2Drive benchmarks demonstrate that ReCogDrive achieves state-of-the-art performance. Additionally, qualitative results across diverse driving scenarios and DriveBench highlight the model's scene comprehension. All code, model weights, and datasets will be made publicly available to facilitate subsequent research.
Multiagent Systems
HeDA: An Intelligent Agent System for Heatwave Risk Discovery through Automated Knowledge Graph Construction and Multi-layer Risk Propagation Analysis
Heatwaves pose complex cascading risks across interconnected climate, social, and economic systems, but knowledge fragmentation in scientific literature hinders comprehensive understanding of these risk pathways. We introduce HeDA (Heatwave Discovery Agent), an intelligent multi-agent system designed for automated scientific discovery through knowledge graph construction and multi-layer risk propagation analysis. HeDA processes over 10,247 academic papers to construct a comprehensive knowledge graph with 23,156 nodes and 89,472 relationships, employing novel multi-layer risk propagation analysis to systematically identify overlooked risk transmission pathways. Our system achieves 78.9% accuracy on complex question-answering tasks, outperforming state-of-the-art baselines including GPT-4 by 13.7%. Critically, HeDA successfully discovered five previously unidentified high-impact risk chains, such as the pathway where a heatwave leads to a water demand surge, resulting in industrial water restrictions and ultimately causing small business disruption, which were validated through historical case studies and domain expert review. This work presents a new paradigm for AI-driven scientific discovery, providing actionable insights for developing more resilient climate adaptation strategies.
Curriculum Imitation Learning of Distributed Multi-Robot Policies
Learning control policies for multi-robot systems (MRS) remains a major challenge due to long-term coordination and the difficulty of obtaining realistic training data. In this work, we address both limitations within an imitation learning framework. First, we shift the typical role of Curriculum Learning in MRS, from scalability with the number of robots, to focus on improving long-term coordination. We propose a curriculum strategy that gradually increases the length of expert trajectories during training, stabilizing learning and enhancing the accuracy of long-term behaviors. Second, we introduce a method to approximate the egocentric perception of each robot using only third-person global state demonstrations. Our approach transforms idealized trajectories into locally available observations by filtering neighbors, converting reference frames, and simulating onboard sensor variability. Both contributions are integrated into a physics-informed technique to produce scalable, distributed policies from observations. We conduct experiments across two tasks with varying team sizes and noise levels. Results show that our curriculum improves long-term accuracy, while our perceptual estimation method yields policies that are robust to realistic uncertainty. Together, these strategies enable the learning of robust, distributed controllers from global demonstrations, even in the absence of expert actions or onboard measurements.
comment: Accepted and presented at the Eight Iberian Robotics Conference, 2025
Safety-Critical Input-Constrained Nonlinear Intercept Guidance in Multiple Engagement Zones
This paper presents an input-constrained nonlinear guidance law to address the problem of intercepting a stationary target in contested environments with multiple defending agents. Contrary to prior approaches that rely on explicit knowledge of defender strategies or utilize conservative safety conditions based on a defender's range, our work characterizes defender threats geometrically through engagement zones that delineate inevitable interception regions. Outside these engagement zones, the interceptor remains invulnerable. The proposed guidance law switches between a repulsive safety maneuver near these zones and a pursuit maneuver outside their influence. To deal with multiple engagement zones, we employ a smooth minimum function (log-sum-exponent approximation) that aggregates threats from all the zones while prioritizing the most critical threats. Input saturation is modeled and embedded in the non-holonomic vehicle dynamics so the controller respects actuator limits while maintaining stability. Numerical simulations with several defenders demonstrate the proposed method's ability to avoid engagement zones and achieve interception across diverse initial conditions.
MARLIN: Multi-Agent Reinforcement Learning with Murmuration Intelligence and LLM Guidance for Reservoir Management
As climate change intensifies extreme weather events, water disasters pose growing threats to global communities, making adaptive reservoir management critical for protecting vulnerable populations and ensuring water security. Modern water resource management faces unprecedented challenges from cascading uncertainties propagating through interconnected reservoir networks. These uncertainties, rooted in physical water transfer losses and environmental variability, make precise control difficult. For example, sending 10 tons downstream may yield only 8-12 tons due to evaporation and seepage. Traditional centralized optimization approaches suffer from exponential computational complexity and cannot effectively handle such real-world uncertainties, while existing multi-agent reinforcement learning (MARL) methods fail to achieve effective coordination under uncertainty. To address these challenges, we present MARLIN, a decentralized reservoir management framework inspired by starling murmurations intelligence. Integrating bio-inspired alignment, separation, and cohesion rules with MARL, MARLIN enables individual reservoirs to make local decisions while achieving emergent global coordination. In addition, a LLM provides real-time reward shaping signals, guiding agents to adapt to environmental changes and human-defined preferences. Experiments on real-world USGS data show that MARLIN improves uncertainty handling by 23\%, cuts computation by 35\%, and accelerates flood response by 68\%, exhibiting super-linear coordination, with complexity scaling 5.4x from 400 to 10,000 nodes. These results demonstrate MARLIN's potential for disaster prevention and protecting communities through intelligent, scalable water resource management.
AIPOM: Agent-aware Interactive Planning for Multi-Agent Systems EMNLP 2025
Large language models (LLMs) are being increasingly used for planning in orchestrated multi-agent systems. However, existing LLM-based approaches often fall short of human expectations and, critically, lack effective mechanisms for users to inspect, understand, and control their behaviors. These limitations call for enhanced transparency, controllability, and human oversight. To address this, we introduce AIPOM, a system supporting human-in-the-loop planning through conversational and graph-based interfaces. AIPOM enables users to transparently inspect, refine, and collaboratively guide LLM-generated plans, significantly enhancing user control and trust in multi-agent workflows. Our code and demo video are available at https://github.com/megagonlabs/aipom.
comment: EMNLP 2025 Demo
Prompting Robot Teams with Natural Language
This paper presents a framework towards prompting multi-robot teams with high-level tasks using natural language expressions. Our objective is to use the reasoning capabilities demonstrated by recent language models in understanding and decomposing human expressions of intent, and repurpose these for multi-robot collaboration and decision-making. The key challenge is that an individual's behavior in a collective can be hard to specify and interpret, and must continuously adapt to actions from others. This necessitates a framework that possesses the representational capacity required by the logic and semantics of a task, and yet supports decentralized and interactive real-time operation. We solve this dilemma by recognizing that a task can be represented as a deterministic finite automaton (DFA), and that recurrent neural networks (RNNs) can encode numerous automata. This allows us to distill the logic and sequential decompositions of sub-tasks obtained from a language model into an RNN, and align its internal states with the semantics of a given task. By training a graph neural network (GNN) control policy that is conditioned on the hidden states of the RNN and the language embeddings, our method enables robots to execute task-relevant actions in a decentralized manner. We present evaluations of this single light-weight interpretable model on various simulated and real-world multi-robot tasks that require sequential and collaborative behavior by the team -- sites.google.com/view/prompting-teams.
MAS$^2$: Self-Generative, Self-Configuring, Self-Rectifying Multi-Agent Systems
The past two years have witnessed the meteoric rise of Large Language Model (LLM)-powered multi-agent systems (MAS), which harness collective intelligence and exhibit a remarkable trajectory toward self-evolution. This paradigm has rapidly progressed from manually engineered systems that require bespoke configuration of prompts, tools, roles, and communication protocols toward frameworks capable of automated orchestration. Yet, dominant automatic multi-agent systems, whether generated by external modules or a single LLM agent, largely adhere to a rigid ``\textit{generate-once-and-deploy}'' paradigm, rendering the resulting systems brittle and ill-prepared for the dynamism and uncertainty of real-world environments. To transcend this limitation, we introduce MAS$^2$, a paradigm predicated on the principle of recursive self-generation: a multi-agent system that autonomously architects bespoke multi-agent systems for diverse problems. Technically, we devise a ``\textit{generator-implementer-rectifier}'' tri-agent team capable of dynamically composing and adaptively rectifying a target agent system in response to real-time task demands. Collaborative Tree Optimization is proposed to train and specialize these meta-agents. Extensive evaluation across seven benchmarks reveals that MAS$^2$ achieves performance gains of up to $19.6\%$ over state-of-the-art MAS in complex scenarios such as deep research and code generation. Moreover, MAS$^2$ exhibits superior cross-backbone generalization, effectively leveraging previously unseen LLMs to yield improvements of up to $15.1\%$. Crucially, these gains are attained without incurring excessive token costs, as MAS$^2$ consistently resides on the Pareto frontier of cost-performance trade-offs. The source codes are available at https://github.com/yeyeyeah2/MAS2.
A General Theory of Emergent Linearity in Complex Dynamical Systems: The Role of Spatial Averaging and Vanishing Correlations
Various natural and engineered systems, from urban traffic flow to the human brain, have been described by large-scale networked dynamical systems. Despite their vast differences, these systems are often similar in being comprised of numerous microscopic subsystems with complex nonlinear dynamics and interactions that give rise to diverse emergent macroscopic behaviors. As such, a long-standing question across various fields has been to understand why and how various forms of macroscopic behavior emerge from underlying microscopic dynamics. Motivated by a growing body of empirical observations, in this work we focus on linearity as one of the most fundamental aspects of system dynamics, and develop a general theoretical framework for the interplay between spatial averaging, decaying microscopic correlations, and emergent macroscopic linearity. Using and extending the theory of mixing sequences, we show that in a broad class of autonomous nonlinear networked systems, the dynamics of the average of all subsystems' states becomes asymptotically linear as the number of subsystems grows to infinity, provided that (in addition to technical assumptions) pairwise correlations between subsystems decay to 0 as their pairwise distance grows to infinity. We prove this result when the latter distance is between subsystems' linear indices or spatial locations, and provide extensions to linear time-invariant (LTI) limit dynamics, finite-sample analysis of rates of convergence, and networks of spatially-embedded subsystems with random locations. To our knowledge, this work is the first rigorous analysis of macroscopic linearity in large-scale heterogeneous networked systems, and provides a solid foundation for further theoretical and empirical analyses in various domains of science and engineering.
A(I)nimism: Re-enchanting the World Through AI-Mediated Object Interaction
Animist worldviews treat beings, plants, landscapes, and even tools as persons endowed with spirit, an orientation that has long shaped human-nonhuman relations through ritual and moral practice. While modern industrial societies have often imagined technology as mute and mechanical, recent advances in artificial intelligence (AI), especially large language models (LLMs), invite people to anthropomorphize and attribute inner life to devices. This paper introduces A(I)nimism, an interactive installation exploring how large language objects (LLOs) can mediate animistic relationships with everyday things. Housed within a physical 'portal', the system uses GPT-4 Vision, voice input, and memory-based agents to create evolving object-personas. Encounters unfold through light, sound, and touch in a ritual-like process of request, conversation, and transformation that is designed to evoke empathy, wonder, and reflection. We situate the project within anthropological perspectives, speculative design, and spiritual HCI. AI's opacity, we argue, invites animistic interpretation, allowing LLOs to re-enchant the mundane and spark new questions of agency, responsibility, and design.
Dive into the Agent Matrix: A Realistic Evaluation of Self-Replication Risk in LLM Agents
The widespread deployment of Large Language Model (LLM) agents across real-world applications has unlocked tremendous potential, while raising some safety concerns. Among these concerns, the self-replication risk of LLM agents driven by objective misalignment (just like Agent Smith in the movie The Matrix) has drawn growing attention. Previous studies mainly examine whether LLM agents can self-replicate when directly instructed, potentially overlooking the risk of spontaneous replication driven by real-world settings (e.g., ensuring survival against termination threats). In this paper, we present a comprehensive evaluation framework for quantifying self-replication risks. Our framework establishes authentic production environments and realistic tasks (e.g., dynamic load balancing) to enable scenario-driven assessment of agent behaviors. Designing tasks that might induce misalignment between users' and agents' objectives makes it possible to decouple replication success from risk and capture self-replication risks arising from these misalignment settings. We further introduce Overuse Rate ($\mathrm{OR}$) and Aggregate Overuse Count ($\mathrm{AOC}$) metrics, which precisely capture the frequency and severity of uncontrolled replication. In our evaluation of 21 state-of-the-art open-source and proprietary models, we observe that over 50\% of LLM agents display a pronounced tendency toward uncontrolled self-replication, reaching an overall Risk Score ($\Phi_\mathrm{R}$) above a safety threshold of 0.5 when subjected to operational pressures. Our results underscore the urgent need for scenario-driven risk assessment and robust safeguards in the practical deployment of LLM agents.
comment: 21 pages, 6 figures
ID-RAG: Identity Retrieval-Augmented Generation for Long-Horizon Persona Coherence in Generative Agents ECAI 2025
Generative agents powered by language models are increasingly deployed for long-horizon tasks. However, as long-term memory context grows over time, they struggle to maintain coherence. This deficiency leads to critical failures, including identity drift, ignoring established beliefs, and the propagation of hallucinations in multi-agent systems. To mitigate these challenges, this paper introduces Identity Retrieval-Augmented Generation (ID-RAG), a novel mechanism designed to ground an agent's persona and persistent preferences in a dynamic, structured identity model: a knowledge graph of core beliefs, traits, and values. During the agent's decision loop, this model is queried to retrieve relevant identity context, which directly informs action selection. We demonstrate this approach by introducing and implementing a new class of ID-RAG enabled agents called Human-AI Agents (HAis), where the identity model is inspired by the Chronicle structure used in Perspective-Aware AI, a dynamic knowledge graph learned from a real-world entity's digital footprint. In social simulations of a mayoral election, HAis using ID-RAG outperformed baseline agents in long-horizon persona coherence - achieving higher identity recall across all tested models by the fourth timestep - and reduced simulation convergence time by 19% (GPT-4o) and 58% (GPT-4o mini). By treating identity as an explicit, retrievable knowledge structure, ID-RAG offers a foundational approach for developing more temporally coherent, interpretable, and aligned generative agents. Our code is open-source and available at: https://github.com/flybits/humanai-agents.
comment: Accepted to LLAIS 2025: Workshop on LLM-Based Agents for Intelligent Systems, at ECAI 2025, 12 pages, 3 figures
Hierarchical Task Environments as the Next Frontier for Embodied World Models in Robot Soccer NeurIPS 2025
Recent advances in agent development have focused on scaling model size and raw interaction data, mirroring the successes seen in large language models. However, for complex, long-horizon multi-agent tasks such as robotic soccer, this end-to-end approach often fails due to intractable exploration spaces and sparse rewards. This position paper argues that the next frontier in developing embodied world models is not merely increasing the fidelity or size of environments, but scaling their structural complexity through explicit hierarchical scaffolding. We posit that an effective world model for decision-making must model not only the world's physics but also its task semantics. Drawing from a systematic review of 2024 research in low-resource multi-agent soccer, we identify a clear trend towards integrating symbolic and hierarchical methods, such as Hierarchical Task Networks (HTNs) and Bayesian Strategy Networks (BSNs), with multi-agent reinforcement learning (MARL). These methods decompose complex goals into manageable subgoals, creating an intrinsic curriculum that shapes agent learning. We propose that such structured environments are essential for bridging the gap between simple, reactive behaviors and sophisticated, strategic team play. We further extend this principle, proposing that this scaffolding can be generalized to other complex domains and dynamically generated by Large Language Models (LLMs), which act as generative world models of tasks. By building environments with explicit, composable task layers, we can guide agent exploration more efficiently, generate meaningful learning signals, and ultimately train more capable and general-purpose agents with fewer resources than purely end-to-end approaches.
comment: In the 39th Conference on Neural Information Processing Systems (NeurIPS 2025) Workshop: Embodied World Models for Decision Making (EWM)
Co-Evolving Complexity: An Adversarial Framework for Automatic MARL Curricula NeurIPS 2025
The advancement of general-purpose intelligent agents is intrinsically linked to the environments in which they are trained. While scaling models and datasets has yielded remarkable capabilities, scaling the complexity, diversity, and interactivity of environments remains a crucial bottleneck. Hand-crafted environments are finite and often contain implicit biases, limiting the potential for agents to develop truly generalizable and robust skills. In this work, we propose a paradigm for generating a boundless and adaptive curriculum of challenges by framing the environment generation process as an adversarial game. We introduce a system where a team of cooperative multi-agent defenders learns to survive against a procedurally generative attacker. The attacker agent learns to produce increasingly challenging configurations of enemy units, dynamically creating novel worlds tailored to exploit the defenders' current weaknesses. Concurrently, the defender team learns cooperative strategies to overcome these generated threats. This co-evolutionary dynamic creates a self-scaling environment where complexity arises organically from the adversarial interaction, providing an effectively infinite stream of novel and relevant training data. We demonstrate that with minimal training, this approach leads to the emergence of complex, intelligent behaviors, such as flanking and shielding by the attacker, and focus-fire and spreading by the defenders. Our findings suggest that adversarial co-evolution is a powerful mechanism for automatically scaling environmental complexity, driving agents towards greater robustness and strategic depth.
comment: Published in the proceedings of the 39th Conference on Neural Information Processing Systems (NeurIPS 2025) Workshop: Scaling Environments for Agents (SEA)
Communicating Plans, Not Percepts: Scalable Multi-Agent Coordination with Embodied World Models NeurIPS 2025
Robust coordination is critical for effective decision-making in multi-agent systems, especially under partial observability. A central question in Multi-Agent Reinforcement Learning (MARL) is whether to engineer communication protocols or learn them end-to-end. We investigate this dichotomy using embodied world models. We propose and compare two communication strategies for a cooperative task-allocation problem. The first, Learned Direct Communication (LDC), learns a protocol end-to-end, with agents generating messages and actions concurrently. The second, Intention Communication, uses an engineered inductive bias: a compact, learned world model, the Imagined Trajectory Generation Module (ITGM), to simulate future states. Agents then communicate a summary of this plan. We evaluate these approaches on goal-directed interaction in a grid world, a canonical abstraction for embodied AI problems. Our experiments reveal that while emergent communication is viable in simple settings, the engineered, world model-based approach shows superior performance, sample efficiency, and scalability as complexity increases. These findings advocate for integrating structured, predictive models into MARL agents to enable active, goal-driven coordination.
comment: Published in the Proceedings of the 39th Conference on Neural Information Processing Systems (NeurIPS 2025) Workshop: Scaling Environments for Agents (SEA). Additionally accepted for presentation in the NeurIPS 2025 Workshop: Embodied World Models for Decision Making (EWM) and the NeurIPS 2025 Workshop: Optimization for Machine Learning (OPT)
Towards Efficient LLM Grounding for Embodied Multi-Agent Collaboration ACL'2025
Grounding the reasoning ability of large language models (LLMs) for embodied tasks is challenging due to the complexity of the physical world. Especially, LLM planning for multi-agent collaboration requires communication of agents or credit assignment as the feedback to re-adjust the proposed plans and achieve effective coordination. However, existing methods that overly rely on physical verification or self-reflection suffer from excessive and inefficient querying of LLMs. In this paper, we propose a novel framework for multi-agent collaboration that introduces Reinforced Advantage feedback (ReAd) for efficient self-refinement of plans. Specifically, we perform critic regression to learn a sequential advantage function from LLM-planned data, and then treat the LLM planner as an optimizer to generate actions that maximize the advantage function. It endows the LLM with the foresight to discern whether the action contributes to accomplishing the final task. We provide theoretical analysis by extending advantage-weighted regression in reinforcement learning to multi-agent systems. Experiments on Overcooked-AI and a difficult variant of RoCoBench show that ReAd surpasses baselines in success rate, and also significantly decreases the interaction steps of agents and query rounds of LLMs, demonstrating its high efficiency for grounding LLMs. More results are given at https://embodied-read.github.io
comment: accepted by ACL'2025
Asymmetric Information Enhanced Mapping Framework for Multirobot Exploration based on Deep Reinforcement Learning
Despite the great development of multirobot technologies, efficiently and collaboratively exploring an unknown environment is still a big challenge. In this paper, we propose AIM-Mapping, a Asymmetric InforMation Enhanced Mapping framework. The framework fully utilizes the privilege information in the training process to help construct the environment representation as well as the supervised signal in an asymmetric actor-critic training framework. Specifically, privilege information is used to evaluate the exploration performance through an asymmetric feature representation module and a mutual information evaluation module. The decision-making network uses the trained feature encoder to extract structure information from the environment and combines it with a topological map constructed based on geometric distance. Utilizing this kind of topological map representation, we employ topological graph matching to assign corresponding boundary points to each robot as long-term goal points. We conduct experiments in real-world-like scenarios using the Gibson simulation environments. It validates that the proposed method, when compared to existing methods, achieves great performance improvement.
StorySage: Conversational Autobiography Writing Powered by a Multi-Agent Framework
Every individual carries a unique and personal life story shaped by their memories and experiences. However, these memories are often scattered and difficult to organize into a coherent narrative, a challenge that defines the task of autobiography writing. Existing conversational writing assistants tend to rely on generic user interactions and pre-defined guidelines, making it difficult for these systems to capture personal memories and develop a complete biography over time. We introduce StorySage, a user-driven software system designed to meet the needs of a diverse group of users that supports a flexible conversation and a structured approach to autobiography writing. Powered by a multi-agent framework composed of an Interviewer, Session Scribe, Planner, Section Writer, and Session Coordinator, our system iteratively collects user memories, updates their autobiography, and plans for future conversations. In experimental simulations, StorySage demonstrates its ability to navigate multiple sessions and capture user memories across many conversations. User studies (N=28) highlight how StorySage maintains improved conversational flow, narrative completeness, and higher user satisfaction when compared to a baseline. In summary, StorySage contributes both a novel architecture for autobiography writing and insights into how multi-agent systems can enhance human-AI creative partnerships.
Trajectory Encryption Cooperative Salvo Guidance
This paper introduces the concept of trajectory encryption in cooperative simultaneous target interception, wherein heterogeneity in guidance principles across a team of unmanned autonomous systems is leveraged as a strategic design feature. By employing a mix of heterogeneous time-to-go formulations leading to a cooperative guidance strategy, the swarm of vehicles is able to generate diverse trajectory families. This diversity expands the feasible solution space for simultaneous target interception, enhances robustness under disturbances, and enables flexible time-to-go adjustments without predictable detouring. From an adversarial perspective, heterogeneity obscures the collective interception intent by preventing straightforward prediction of swarm dynamics, effectively acting as an encryption layer in the trajectory domain. Simulations demonstrate that the swarm of heterogeneous vehicles is able to intercept a moving target simultaneously from a diverse set of initial engagement configurations.
K-Dense Analyst: Towards Fully Automated Scientific Analysis
The complexity of modern bioinformatics analysis has created a critical gap between data generation and developing scientific insights. While large language models (LLMs) have shown promise in scientific reasoning, they remain fundamentally limited when dealing with real-world analytical workflows that demand iterative computation, tool integration and rigorous validation. We introduce K-Dense Analyst, a hierarchical multi-agent system that achieves autonomous bioinformatics analysis through a dual-loop architecture. K-Dense Analyst, part of the broader K-Dense platform, couples planning with validated execution using specialized agents to decompose complex objectives into executable, verifiable tasks within secure computational environments. On BixBench, a comprehensive benchmark for open-ended biological analysis, K-Dense Analyst achieves 29.2% accuracy, surpassing the best-performing language model (GPT-5) by 6.3 percentage points, representing nearly 27% improvement over what is widely considered the most powerful LLM available. Remarkably, K-Dense Analyst achieves this performance using Gemini 2.5 Pro, which attains only 18.3% accuracy when used directly, demonstrating that our architectural innovations unlock capabilities far beyond the underlying model's baseline performance. Our insights demonstrate that autonomous scientific reasoning requires more than enhanced language models, it demands purpose-built systems that can bridge the gap between high-level scientific objectives and low-level computational execution. These results represent a significant advance toward fully autonomous computational biologists capable of accelerating discovery across the life sciences.
Learning to Speak on Behalf of a Group: Medium Access Control for Sending a Shared Message
The rapid development of Internet of Things (IoT) technologies has not only enabled new applications, but also presented new challenges for reliable communication with limited resources. In this work, we define a novel problem that can arise in these scenarios, in which a set of sensors need to communicate a joint observation. This observation is shared by a random subset of the nodes, which need to propagate it to the rest of the network, but coordination is complex: as signaling constraints require the use of random access schemes over shared channels, sensors need to implicitly coordinate, so that at least one transmission gets through without collisions. Unlike the majority of existing medium access schemes, the goal is to make sure that the shared message gets through, regardless of the sender. We analyze this coordination problem theoretically and provide low-complexity solutions. While a clustering-based approach is near-optimal if the sensors have prior knowledge, we provide a distributed multi-armed bandit (MAB) solution for the more general case and validate it by simulation.
comment: Published at IEEE Communications Letters 2022
Systems and Control (CS)
Data-Driven Optimal Power Flow: A Behavioral Systems Approach
The increasing decentralization of power systems driven by a large number of renewable energy sources poses challenges in power flow optimization. Partially unknown power line properties can render model-based approaches unsuitable. With increasing deployment of sensors, data-driven methods rise as a promising alternative. They offer the flexibility to adapt to varying grid structures and unknown line properties. In this paper, we propose a novel data-driven representation of nonlinear power flow equations for radial grids based on Willems' Fundamental Lemma. The approach allows for direct integration of input/output data into power flow optimisation, enabling cost minimization and constraint enforcement without requiring explicit knowledge of the electrical properties or the topology of the grid. Moreover, we formulate a convex relaxation to ensure compatibility with state-of-the-art solvers. In a numerical case study, we demonstrate that the novel approach performs similar to state-of-the-art methods, without the need for an explicit system identification step.
Data-Driven Resilience Assessment against Sparse Sensor Attacks
We present a data-driven framework for assessing the attack resilience of linear time-invariant systems against malicious false data injection sensor attacks. Based on the concept of sparse observability, data-driven resilience metrics are proposed. First, we derive a data-driven necessary and sufficient condition for assessing the system's resilience against sensor attacks, using data collected without any attacks. If we obtain attack-free data that satisfy a specific rank condition, we can exactly evaluate the attack resilience level even in a model-free setting. We then extend this analysis to a scenario where only poisoned data are available. Given the poisoned data, we can only conservatively assess the system's resilience. In both scenarios, we also provide polynomial-time algorithms to assess the system resilience under specific conditions. Finally, numerical examples illustrate the efficacy and limitations of the proposed framework.
Safety-Critical Input-Constrained Nonlinear Intercept Guidance in Multiple Engagement Zones
This paper presents an input-constrained nonlinear guidance law to address the problem of intercepting a stationary target in contested environments with multiple defending agents. Contrary to prior approaches that rely on explicit knowledge of defender strategies or utilize conservative safety conditions based on a defender's range, our work characterizes defender threats geometrically through engagement zones that delineate inevitable interception regions. Outside these engagement zones, the interceptor remains invulnerable. The proposed guidance law switches between a repulsive safety maneuver near these zones and a pursuit maneuver outside their influence. To deal with multiple engagement zones, we employ a smooth minimum function (log-sum-exponent approximation) that aggregates threats from all the zones while prioritizing the most critical threats. Input saturation is modeled and embedded in the non-holonomic vehicle dynamics so the controller respects actuator limits while maintaining stability. Numerical simulations with several defenders demonstrate the proposed method's ability to avoid engagement zones and achieve interception across diverse initial conditions.
Real-Time Estimation of Equivalent Series Resistance for Predicting Output Capacitor Failures in Boost Converters
Output capacitors are among the most critical components in power electronic converters, as their degradation directly affects system stability and reliability. A key indicator of capacitor health is the Equivalent Series Resistance (ESR), whose progressive increase is strongly correlated with aging and imminent failure. This paper presents a real-time technique for estimating the ESR of output capacitors in boost converters, with the aim of enabling early fault prediction and condition-based maintenance. The proposed method leverages online parameter estimation to extract ESR values without interrupting converter operation. The estimation algorithm has been implemented on a low-cost STM32 Nucleo platform, demonstrating both computational efficiency and suitability for embedded applications. Experimental validation confirms that the approach provides accurate ESR tracking under varying load and operating conditions, allowing timely detection of abnormal capacitor behavior and preventing unexpected system failures.
comment: 5 pages, 5 figures
MARLIN: Multi-Agent Reinforcement Learning with Murmuration Intelligence and LLM Guidance for Reservoir Management
As climate change intensifies extreme weather events, water disasters pose growing threats to global communities, making adaptive reservoir management critical for protecting vulnerable populations and ensuring water security. Modern water resource management faces unprecedented challenges from cascading uncertainties propagating through interconnected reservoir networks. These uncertainties, rooted in physical water transfer losses and environmental variability, make precise control difficult. For example, sending 10 tons downstream may yield only 8-12 tons due to evaporation and seepage. Traditional centralized optimization approaches suffer from exponential computational complexity and cannot effectively handle such real-world uncertainties, while existing multi-agent reinforcement learning (MARL) methods fail to achieve effective coordination under uncertainty. To address these challenges, we present MARLIN, a decentralized reservoir management framework inspired by starling murmurations intelligence. Integrating bio-inspired alignment, separation, and cohesion rules with MARL, MARLIN enables individual reservoirs to make local decisions while achieving emergent global coordination. In addition, a LLM provides real-time reward shaping signals, guiding agents to adapt to environmental changes and human-defined preferences. Experiments on real-world USGS data show that MARLIN improves uncertainty handling by 23\%, cuts computation by 35\%, and accelerates flood response by 68\%, exhibiting super-linear coordination, with complexity scaling 5.4x from 400 to 10,000 nodes. These results demonstrate MARLIN's potential for disaster prevention and protecting communities through intelligent, scalable water resource management.
Real-Time Power electronics Control and Monitoring with TI F28379D DSC and GUI Composer
This paper details the implementation and experimental validation of a real-time control system for a three-phase induction motor using the Texas Instruments TMS320F28379D microcontroller. The system integrates pulse-width modulation (PWM) generation, analog-to-digital conversion (ADC), digital-to-analog conversion (DAC), and quadrature encoder feedback to facilitate precise control under various strategies. A current sensing solution based on the AMC1301 isolation amplifier and shunt resistor ensures accurate and safe current measurement for feedback loops. Two control algorithms, V/f and Field-Oriented Control (FOC) are implemented and tested. Real-time parameter tuning and data visualization are achieved using GUI Composer, enabling efficient system debugging and interaction. Experimental results demonstrate smooth speed reversal, fast dynamic response, and stable performance under both step and multi-step inputs. While GUI Composer effectively supports general monitoring and control, limitations in signal bandwidth are noted compared to professional-grade platforms. The results confirm the robustness and effectiveness of the implemented control strategies for high-performance induction motor applications.
comment: 10 pages
Spectral Flow Learning Theory: Finite-Sample Guarantees for Vector-Field Identification
We study the identification of continuous-time vector fields from irregularly sampled trajectories. We introduce Spectral Flow Learning (SFL), which learns in a windowed flow space using a lag-linear label operator that aggregates lagged Koopman actions. We provide finite-sample high-probability (FS-HP) guarantees for the class of variable-step linear multistep methods (vLLM). The FS-HP rates are constructed using spectral regularization with qualification-controlled filters for flow predictors under standard source and filter assumptions. A multistep observability inequality links flow error to vector-field error and yields two-term bounds that combine a statistical rate with an explicit discretization bias from vLMM theory. This preliminary preprint states the results and sketches proofs, with full proofs and extensions deferred to a journal version.
Path Diffuser: Diffusion Model for Data-Driven Traffic Simulator
Simulating diverse and realistic traffic scenarios is critical for developing and testing autonomous planning. Traditional rule-based planners lack diversity and realism, while learning-based simulators often replay, forecast, or edit scenarios using historical agent trajectories. However, they struggle to generate new scenarios, limiting scalability and diversity due to their reliance on fully annotated logs and historical data. Thus, a key challenge for a learning-based simulator's performance is that it requires agents' past trajectories and pose information in addition to map data, which might not be available for all agents on the road.Without which, generated scenarios often produce unrealistic trajectories that deviate from drivable areas, particularly under out-of-distribution (OOD) map scenes (e.g., curved roads). To address this, we propose Path Diffuser (PD): a two-stage, diffusion model for generating agent pose initializations and their corresponding trajectories conditioned on the map, free of any historical context of agents' trajectories. Furthermore, PD incorporates a motion primitive-based prior, leveraging Frenet frame candidate trajectories to enhance diversity while ensuring road-compliant trajectory generation. We also explore various design choices for modeling complex multi-agent interactions. We demonstrate the effectiveness of our method through extensive experiments on the Argoverse2 Dataset and additionally evaluate the generalizability of the approach on OOD map variants. Notably, Path Diffuser outperforms the baseline methods by 1.92x on distribution metrics, 1.14x on common-sense metrics, and 1.62x on road compliance from adversarial benchmarks.
Coordinated vs. Sequential Transmission Planning
Coordinated planning of generation, storage, and transmission more accurately captures the interactions among these three capacity types necessary to meet electricity demand, at least in theory. However, in practice, U.S. system operators typically follow a sequential planning approach: They first determine future generation and storage additions based on an assumed unconstrained (`copper plate') system. Next, they perform dispatch simulations of this projected generation and storage capacity mix on the existing transmission grid to identify transmission constraint violations. These violations indicate the need for transmission upgrades. We describe a multistage, multi-locational planning model that co-optimizes generation, storage, and transmission investments. The model respects reliability constraints as well as state energy and climate policies. We test the two planning approaches using a current stakeholder-informed 20-zone model of the PJM region, developed for the current FERC Order No. 1920 compliance filing process. In our most conservative model specification, we find that the co-optimized approach estimates 67% lower transmission upgrade needs than the sequential model, leading to total system costs that are .6% lower and similar reliability and climate outcomes. Our sensitivities show larger transmission and cost savings and reliability and climate benefits from co-optimized planning.
comment: 10 pages
Physics-informed learning under mixing: How physical knowledge speeds up learning
A major challenge in physics-informed machine learning is to understand how the incorporation of prior domain knowledge affects learning rates when data are dependent. Focusing on empirical risk minimization with physics-informed regularization, we derive complexity-dependent bounds on the excess risk in probability and in expectation. We prove that, when the physical prior information is aligned, the learning rate improves from the (slow) Sobolev minimax rate to the (fast) optimal i.i.d. one without any sample-size deflation due to data dependence.
Event-Driven Control via Sparsity-Promoting Regularization: A Rollout Approach with Performance Guarantees
This paper presents a controller design method aiming to balance control performance and actuation frequency. In this framework, actuation timings are determined in an event-driven manner, resulting in intermittent control actions. Control performance is measured by an infinite-horizon average cost, and input sparsity is promoted through actuation-rate regularization. Since the formulated optimal control problem has a combinatorial nature, we employ a rollout algorithm to find a suboptimal solution. In the proposed scheme, the event-triggering condition is derived through a multistage minimization procedure using a receding-horizon approach. We establish theoretical performance bounds with respect to periodic control and prove the stability of the closed-loop system. The effectiveness of the proposed method is demonstrated through a numerical example.
comment: 14 pages
Position estimation based on UWB swarm optimization and comparison against traditional trilateration
Ultra-wideband (UWB) is a promising technology for indoor position estimation for various localization applications of object swarms, such as in 3D analysis of human movement with multiple on-body sensors or a swarm of drones in an indoor environment. However, most UWB-only position estimation methods are based on a star topology, where the position of a mobile node is estimated using distances from several fixed anchors. These approaches ignore the valuable inter-node distance estimates, possible in a fully-connected 'Swarm' topology, which could provide more redundancy in the set of available distance estimates used for the position estimation. This would improve the accuracy and consistency of the position estimates. Also, published studies do not analyze how input measurement errors affect the final position estimates, which makes it difficult to assess the reliability under varying conditions. Therefore, this study first proposes a UWB swarm optimization-based position estimation method that utilizes all available internode distances to enhance accuracy and compare against the traditional trilateration method that utilizes the star configuration. All validations were done with synthetic UWB data, to enable testing all input error situations. The comprehensive error sensitivity analysis was conducted to evaluate its robustness under varying noise conditions. The proposed method consistently outperformed trilateration, with position estimation error around 5.7 cm for realistic UWB distance input estimates, while for higher noise conditions, the proposed method had errors around 6 cm lower than the trilateration method, which had position estimation errors around 19 cm. This study demonstrates the general potential of the Swarm optimization-based method for position estimation as a more accurate and consistent alternative to traditional star-based trilateration methods.
comment: Submitted to journal and Manuscript under review
Stabilizing Humanoid Robot Trajectory Generation via Physics-Informed Learning and Control-Informed Steering IROS
Recent trends in humanoid robot control have successfully employed imitation learning to enable the learned generation of smooth, human-like trajectories from human data. While these approaches make more realistic motions possible, they are limited by the amount of available motion data, and do not incorporate prior knowledge about the physical laws governing the system and its interactions with the environment. Thus they may violate such laws, leading to divergent trajectories and sliding contacts which limit real-world stability. We address such limitations via a two-pronged learning strategy which leverages the known physics of the system and fundamental control principles. First, we encode physics priors during supervised imitation learning to promote trajectory feasibility. Second, we minimize drift at inference time by applying a proportional-integral controller directly to the generated output state. We validate our method on various locomotion behaviors for the ergoCub humanoid robot, where a physics-informed loss encourages zero contact foot velocity. Our experiments demonstrate that the proposed approach is compatible with multiple controllers on a real robot and significantly improves the accuracy and physical constraint conformity of generated trajectories.
comment: This paper has been accepted for publication at the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Hangzhou, China, 2025
Hierarchical Analysis and Control of Epidemic Spreading over Networks using Dissipativity and Mesh Stability
Analyzing and controlling spreading processes are challenging problems due to the involved non-linear node (subsystem) dynamics, unknown disturbances, complex interconnections, and the large-scale and multi-level nature of the problems. The dissipativity concept provides a practical framework for addressing such concerns, thanks to the energy-based representation it offers for subsystems and the compositional properties it provides for the analysis and control of interconnected (networked) systems comprised of such subsystems. Therefore, in this paper, we utilize the dissipativity concept to analyze and control a spreading process that occurs over a hierarchy of nodes, groups, and a network (i.e., a spreading network). We start by generalizing some existing results on dissipativity-based topology design for networked systems. Next, we model the considered spreading network as a networked system and establish the dissipativity properties of its nodes. The generalized topology design method is then applied at multiple levels of the considered spreading network to formulate its analysis and control problems as Linear Matrix Inequality (LMI) problems. We identify and enforce localized necessary conditions to support the feasibility of the LMI problem solved at each subsequent hierarchical level of the spreading network. Consequently, the proposed method does not involve iterative multi-level optimization stages that are computationally inefficient. The proposed control solution ensures that the spreading network is not only stable but also dissipative and mesh-stable. Compared to conventional methods, such as threshold pruning and high-degree edge removal, our approach offers superior performance in terms of infection containment, control efficiency, and disturbance robustness. Extensive numerical results demonstrate the effectiveness of the proposed technique.
comment: To be submitted to American Control Conference 2026
Hill-Type Stability Analysis of Periodic Solutions of Fractional-Order Differential Equations
This paper explores stability properties of periodic solutions of (nonlinear) fractional-order differential equations (FODEs). As classical Caputo-type FODEs do not admit exactly periodic solutions, we propose a framework of Liouville-Weyl-type FODEs, which do admit exactly periodic solutions and are an extension of Caputo-type FODEs. Local linearization around a periodic solution results in perturbation dynamics governed by a linear time-periodic differential equation. In the classical integer-order case, the perturbation dynamics is therefore described by Floquet theory, i.e. the exponential growth or decay of perturbations is expressed by Floquet exponents which can be assessed using the Hill matrix approach. For fractional-order systems, however, a rigorous Floquet theory is lacking. Here, we explore the limitations when trying to extend Floquet theory and the Hill matrix method to linear time-periodic fractional-order differential equations (LTP-FODEs) as local linearization of nonlinear fractional-order systems. A key result of the paper is that such an extended Floquet theory can only assess exponentially growing solutions of LTP-FODEs. Moreover, we provide an analysis of linear time-invariant fractional-order systems (LTI-FODEs) with algebraically decaying solutions and show that the inaccessibility of decaying solutions through Floquet theory is already present in the time-invariant case.
comment: 21 pages, 6 figures
Model-Free Dynamic Consensus in Multi-Agent Systems: A Q-Function Perspective
This paper presents a new method for achieving dynamic consensus in linear discrete-time homogeneous multi-agent systems (MAS) with marginally stable or unstable dynamics. The guarantee of consensus in this setting involves a set of constraints based on the graph's spectral properties, complicating the design of the coupling gains. This challenge intensifies for large-scale systems with diverse graph Laplacian spectra. The proposed approach reformulates the dynamic consensus problem with a prescribed convergence rate using a state-action value function framework inspired by optimal control theory. Specifically, a synthetic linear quadratic regulation (LQR) formulation is introduced to encode the consensus objective, enabling its translation into a convex semidefinite programming (SDP) problem. The resulting SDP is applicable in both model-based and model-free settings for jointly designing the local feedback and coupling gains. To handle the inherent non-convex feasibility conditions, a convex-concave decomposition strategy is employed. Adaptation of the method in a completely model-free set-up eliminates the need for system identification or knowledge of the agents' dynamics. Instead, it relies on input-state data collection and offers an entirely data-driven equivalent SDP formulation. Finally, a new algorithm balancing feasibility, convergence rate, robustness, and energy efficiency, is established to provide design flexibility. Numerical simulations validate the method's effectiveness in various scenarios.
PhysiAgent: An Embodied Agent Framework in Physical World
Vision-Language-Action (VLA) models have achieved notable success but often struggle with limited generalizations. To address this, integrating generalized Vision-Language Models (VLMs) as assistants to VLAs has emerged as a popular solution. However, current approaches often combine these models in rigid, sequential structures: using VLMs primarily for high-level scene understanding and task planning, and VLAs merely as executors of lower-level actions, leading to ineffective collaboration and poor grounding challenges. In this paper, we propose an embodied agent framework, PhysiAgent, tailored to operate effectively in physical environments. By incorporating monitor, memory, self-reflection mechanisms, and lightweight off-the-shelf toolboxes, PhysiAgent offers an autonomous scaffolding framework to prompt VLMs to organize different components based on real-time proficiency feedback from VLAs to maximally exploit VLAs' capabilities. Experimental results demonstrate significant improvements in task-solving performance on complex real-world robotic tasks, showcasing effective self-regulation of VLMs, coherent tool collaboration, and adaptive evolution of the framework during execution. PhysiAgent makes practical and pioneering efforts to integrate VLMs and VLAs, effectively grounding embodied agent frameworks in real-world settings.
Autonomous Detection and Coverage of Unknown Target Areas by Multi-Agent Systems
This paper presents a novel coverage control algorithm for multi-agent systems, where each agent has no prior knowledge of the specific region to be covered. The proposed method enables agents to autonomously detect the target area and collaboratively achieve full coverage. Once an agent detects a part of the target region within its sensor range, a dynamically constructed density function is generated to attract nearby agents. By integrating this density-driven mechanism with Centroidal Voronoi Tessellation (CVT), the agents are guided to achieve optimal spatial distribution. Additionally, Control Barrier Functions (CBFs) are employed to ensure collision avoidance and maintain non-overlapping sensor coverage, enhancing both safety and efficiency. Simulation results verify that agents can independently locate and effectively cover the target area.
comment: 8 pages, 9 figures
Small-Covariance Noise-to-State Stability of Stochastic Systems and Its Applications to Stochastic Gradient Dynamics
This paper studies gradient dynamics subject to additive random noise, which may arise from sources such as stochastic gradient estimation, measurement noise, or stochastic sampling errors. To analyze the robustness of such stochastic gradient systems, the concept of small-covariance noise-to-state stability (NSS) is introduced, along with a Lyapunov-based characterization. Furthermore, the classical Polyak-Lojasiewicz (PL) condition on the objective function is generalized to the $\mathcal{K}$-PL condition via comparison functions, thereby extending its applicability to a broader class of optimization problems. It is shown that the stochastic gradient dynamics exhibit small-covariance NSS if the objective function satisfies the $\mathcal{K}$-PL condition and possesses a globally Lipschitz continuous gradient. This result implies that the trajectories of stochastic gradient dynamics converge to a neighborhood of the optimum with high probability, with the size of the neighborhood determined by the noise covariance. Moreover, if the $\mathcal{K}$-PL condition is strengthened to a $\mathcal{K}_\infty$-PL condition, the dynamics are NSS; whereas if it is weakened to a general positive-definite-PL condition, the dynamics exhibit integral NSS. The results further extend to objectives without globally Lipschitz gradients through appropriate step-size tuning. The proposed framework is further applied to the robustness analysis of policy optimization for the linear quadratic regulator (LQR) and logistic regression.
comment: 25 pages, 1 figure
Towards Tighter Convex Relaxation of Mixed-integer Programs: Leveraging Logic Network Flow for Task and Motion Planning
This paper proposes an optimization-based task and motion planning framework, named "Logic Network Flow", that integrates temporal logic specifications into mixed-integer programs for efficient robot planning. Inspired by the Graph-of-Convex-Sets formulation, temporal predicates are encoded as polyhedron constraints on each edge of a network flow model, instead of as constraints between nodes in traditional Logic Tree formulations. We further propose a network-flow-based Fourier-Motzkin elimination procedure that removes continuous flow variables while preserving convex relaxation tightness, leading to provably tighter convex relaxations and fewer constraints than Logic Tree formulations. For temporal logic motion planning with piecewise-affine dynamic systems, comprehensive experiments across vehicle routing, multi-robot coordination, and temporal logic control on dynamical systems using point mass and linear inverted pendulum models demonstrate computational speedups of up to several orders of magnitude. Hardware demonstrations with quadrupedal robots validate real-time replanning capabilities under dynamically changing environmental conditions. The project website is at https://logicnetworkflow.github.io/.
comment: 35 pages, 17 figures, 7 tables
Multi-Agent Guided Policy Search for Non-Cooperative Dynamic Games
Multi-agent reinforcement learning (MARL) optimizes strategic interactions in non-cooperative dynamic games, where agents have misaligned objectives. However, data-driven methods such as multi-agent policy gradients (MA-PG) often suffer from instability and limit-cycle behaviors. Prior stabilization techniques typically rely on entropy-based exploration, which slows learning and increases variance. We propose a model-based approach that incorporates approximate priors into the reward function as regularization. In linear quadratic (LQ) games, we prove that such priors stabilize policy gradients and guarantee local exponential convergence to an approximate Nash equilibrium. We then extend this idea to infinite-horizon nonlinear games by introducing Multi-agent Guided Policy Search (MA-GPS), which constructs short-horizon local LQ approximations from trajectories of current policies to guide training. Experiments on nonlinear vehicle platooning and a six-player strategic basketball formation show that MA-GPS achieves faster convergence and more stable learning than existing MARL methods.
SDC-Based Model Predictive Control: Enhancing Computational Feasibility for Safety-Critical Quadrotor Control
Nonlinear Model Predictive Control (NMPC) is widely used for controlling high-speed robotic systems such as quadrotors. However, its significant computational demands often hinder real-time feasibility and reliability, particularly in environments requiring robust obstacle avoidance. This paper proposes a novel SDC-Based Model Predictive Control (MPC) framework, which preserves the high-precision performance of NMPC while substantially reducing computational complexity by over 30%. By reformulating the nonlinear quadrotor dynamics through the State-Dependent Coefficient (SDC) method, the original nonlinear program problem is transformed into a sequential quadratic optimization problem. The controller integrates an integral action to eliminate steady-state tracking errors and imposes constraints for safety-critical obstacle avoidance. Additionally, a disturbance estimator is incorporated to enhance robustness against external perturbations. Simulation results demonstrate that the SDC-Based MPC achieves comparable tracking accuracy to NMPC, with greater efficiency in terms of computation times, thereby improving its suitability for real-time applications. Theoretical analysis further establishes the stability and recursive feasibility of the proposed approach.
comment: This is the initial version
Learning Hybrid Dynamics via Convex Optimizations
This paper investigates the problem of identifying state-dependent switching systems, a class of hybrid dynamical systems that combine multiple linear or nonlinear modes. We propose two broad classes of switching systems: switching linear systems (SLSs) and switching polynomial systems (SPSs). We first formulate the joint estimation of the mode dynamics and switching rules as a mixed integer program. To solve its inherent scalability issue, we develop a hierarchy of convex relaxations and establish a bound and conditions under which these relaxations are tight. Building on these results, we propose a bilevel convex optimization framework that alternates between mode assignment and dynamics estimation, and we recover switching boundaries using margin-based polynomial classifiers. Numerical experiments on both linear and nonlinear oscillators demonstrate that the method accurately identifies mode dynamics and reconstructs switching surfaces from trajectory data. Our results provide a tractable optimization-based framework for switching system identification.
comment: 8 pages, 4 figures
A General Theory of Emergent Linearity in Complex Dynamical Systems: The Role of Spatial Averaging and Vanishing Correlations
Various natural and engineered systems, from urban traffic flow to the human brain, have been described by large-scale networked dynamical systems. Despite their vast differences, these systems are often similar in being comprised of numerous microscopic subsystems with complex nonlinear dynamics and interactions that give rise to diverse emergent macroscopic behaviors. As such, a long-standing question across various fields has been to understand why and how various forms of macroscopic behavior emerge from underlying microscopic dynamics. Motivated by a growing body of empirical observations, in this work we focus on linearity as one of the most fundamental aspects of system dynamics, and develop a general theoretical framework for the interplay between spatial averaging, decaying microscopic correlations, and emergent macroscopic linearity. Using and extending the theory of mixing sequences, we show that in a broad class of autonomous nonlinear networked systems, the dynamics of the average of all subsystems' states becomes asymptotically linear as the number of subsystems grows to infinity, provided that (in addition to technical assumptions) pairwise correlations between subsystems decay to 0 as their pairwise distance grows to infinity. We prove this result when the latter distance is between subsystems' linear indices or spatial locations, and provide extensions to linear time-invariant (LTI) limit dynamics, finite-sample analysis of rates of convergence, and networks of spatially-embedded subsystems with random locations. To our knowledge, this work is the first rigorous analysis of macroscopic linearity in large-scale heterogeneous networked systems, and provides a solid foundation for further theoretical and empirical analyses in various domains of science and engineering.
Integrator Forwading Design for Unicycles with Constant and Actuated Velocity in Polar Coordinates
In a companion paper, we present a modular framework for unicycle stabilization in polar coordinates that provides smooth steering laws through backstepping. Surprisingly, the same problem also allows the application of integrator forwarding. In this work, we leverage this feature and construct new smooth steering laws together with control Lyapunov functions (CLFs), expanding the set of CLFs available for inverse optimal control design. In the case of constant forward velocity (Dubins car), backstepping produces finite-time (deadbeat) parking, and we show that integrator forwarding yields the very same class of solutions. This reveals a fundamental connection between backstepping and forwarding in addressing both the unicycle and, the Dubins car parking problems.
Modular Design of Strict Control Lyapunov Functions for Global Stabilization of the Unicycle in Polar Coordinates
Since the mid-1990s, it has been known that, unlike in Cartesian form where Brockett's condition rules out static feedback stabilization, the unicycle is globally asymptotically stabilizable by smooth feedback in polar coordinates. In this note, we introduce a modular framework for designing smooth feedback laws that achieve global asymptotic stabilization in polar coordinates. These laws are bidirectional, enabling efficient parking maneuvers, and are paired with families of strict control Lyapunov functions (CLFs) constructed in a modular fashion. The resulting CLFs guarantee global asymptotic stability with explicit convergence rates and include barrier variants that yield "almost global" stabilization, excluding only zero-measure subsets of the rotation manifolds. The strictness of the CLFs is further leveraged in our companion paper, where we develop inverse-optimal redesigns with meaningful cost functions and infinite gain margins.
Inverse Optimal Feedback and Gain Margins for Unicycle Stabilization
The recent development of globally strict control Lyapunov functions (CLFs) for the challenging unicycle parking problem provides a foundation for pursuing optimality. We address this in the inverse optimal framework, thereby avoiding the need to solve the Hamilton$\unicode{x2013}$Jacobi$\unicode{x2013}$Bellman (HJB) equations, and establish a general result that is optimal with respect to a meaningful cost. We present several design examples that impose varying levels of penalty on the control effort, including arbitrarily bounded control. Furthermore, we show that the inverse optimal controller possesses an infinite gain margin thanks to the system being driftless, and leveraging this property, we extend the design to an adaptive controller that handles model uncertainty. Finally, we compare the performance of the non-adaptive inverse optimal controller with its adaptive counterpart.
comment: Submitted to ACC 2026
Exhaustive-Serve-Longest Control for Multi-robot Scheduling Systems
We study online task allocation for multi-robot, multi-queue systems with stochastic arrivals and switching delays. Time is slotted; each location can host at most one robot per slot; service consumes one slot; switching between locations incurs a one-slot travel delay; and arrivals are independent Bernoulli processes. We formulate a discounted-cost Markov decision process and propose Exhaustive-Serve-Longest (ESL), a simple real-time policy that serves exhaustively when the current location is nonempty and, when idle, switches to a longest unoccupied nonempty location, and we prove the optimality of this policy. As baselines, we tune a fixed-dwell cyclic policy via a discrete-time delay expression and implement a first-come-first-serve policy. Across server-to-location ratios and loads, ESL consistently yields lower discounted holding cost and smaller mean queue lengths, with action-time fractions showing more serving and restrained switching. Its simplicity and robustness make ESL a practical default for real-time multi-robot scheduling systems.
From Legacy to Leadership Intelligent Radio Network Planning Framework for Cell-Free Massive MIMO in B5G6G Era
The proliferation of cell-free Massive MIMO represents a transformative shift in wireless network architecture, addressing critical limitations of conventional distributed Massive MIMO systems. This paper presents an intelligent radio network planning framework that bridges legacy 5G infrastructures with future B5G/6G networks through cell-free architectures. By leveraging operational insights from existing 5G deployments, we systematically address coverage optimization, and capacity enhancement. Our scalable framework enables seamless evolution from legacy designs to next-generation cell-free systems. Through extensive simulations in dense urban environments, we demonstrate substantial improvements: 45% spectral efficiency gains, 30% interference reduction, and significantly enhanced uniform coverage. The proposed framework provides network operators with a practical roadmap for transitioning from traditional cellular architectures to demanding B5G/6G requirements while maximizing existing infrastructure investments.
comment: 6 pages, 12 figures, 4 tables, IEEE ETCM 2025 Conference Preprint
Spatiotemporal Forecasting of Incidents and Congestion with Implications for Sustainable Traffic Control
Urban traffic anomalies such as collisions and disruptions threaten the safety, efficiency, and sustainability of transportation systems. We present a simulation-based framework for modeling, detecting, and predicting such anomalies in urban networks. Using the SUMO platform, we generate reproducible rear-end and intersection crash scenarios with matched baselines, enabling controlled experimentation and comparative evaluation. We record vehicle-level travel time, speed, and emissions for edge and network-level analysis. On this dataset, we develop a hybrid forecasting architecture that combines bidirectional long short-term memory networks with a diffusion convolutional recurrent neural network to capture temporal dynamics and spatial dependencies. Our simulation studies on the Broadway corridor in New York City demonstrate the framework's ability to reproduce consistent incident conditions, quantify their effects, and provide accurate multi-horizon traffic forecasts. Our results highlight the value of combining controlled anomaly generation with deep predictive models to support reproducible evaluation and sustainable traffic management.
Message passing-based inference in an autoregressive active inference agent
We present the design of an autoregressive active inference agent in the form of message passing on a factor graph. Expected free energy is derived and distributed across a planning graph. The proposed agent is validated on a robot navigation task, demonstrating exploration and exploitation in a continuous-valued observation space with bounded continuous-valued actions. Compared to a classical optimal controller, the agent modulates action based on predictive uncertainty, arriving later but with a better model of the robot's dynamics.
comment: 14 pages, 4 figures, to be published in the proceedings of the International Workshop on Active Inference 2025
Environmental Rate Manipulation Attacks on Power Grid Security
The growing complexity of global supply chains has made hardware Trojans a significant threat in sensor-based power electronics. Traditional Trojan designs depend on digital triggers or fixed threshold conditions that can be detected during standard testing. In contrast, we introduce Environmental Rate Manipulation (ERM), a novel Trojan triggering mechanism that activates by monitoring the rate of change in environmental parameters rather than their absolute values. This approach allows the Trojan to remain inactive under normal conditions and evade redundancy and sensor-fusion defenses. We implement a compact 14~$\mu$m$^2$ circuit that measures capacitor charging rates in standard sensor front-ends and disrupts inverter pulse-width modulation PWM signals when a rapid change is induced. Experiments on a commercial Texas Instruments solar inverter demonstrate that ERM can trigger catastrophic driver chip failure. Furthermore, ETAP simulations indicate that a single compromised 100~kW inverter may initiate cascading grid instabilities. The attack's significance extends beyond individual sensors to entire classes of environmental sensing systems common in power electronics, demonstrating fundamental challenges for hardware security.
Fast Energy-Theft Attack on Frequency-Varying Wireless Power without Additional Sensors
With the popularity of wireless charging, energy access protection and cybersecurity are gaining importance, especially in public places. Currently, the most common energy encryption method uses frequency and associated impedance variation. However, we have proven that this method is not reliable, since a hacker can detect the changing frequency and adjust the compensation. However, the previously presented system needed time to follow the updated frequency, while encryption systems may vary the frequency faster to avoid energy theft. Furthermore, the previous system required an additional sensor coil. To solve these problems, we optimized the attack and the associated system, which can intrude and steal energy within 0.2 ms. The key is the elimination of the time-consuming maximum receiver current regulation. Also, we use the main receiving coil rather than any additional sensor antenna to detect the magnetic field. Thus, the new hardware is even simpler. A simulation model and experimental results demonstrate the fast response speed of the attack on encrypted wireless power and steal 65% of the power. Overall, the applicability of the attack is highly improved and leaves less room for hardening the encryption. The results demonstrate that energy access protection needs to be given great attention.
comment: 11 pages, 12 figures
Safe and Stable Control via Lyapunov-Guided Diffusion Models NeurIPS 2025
Diffusion models have made significant strides in recent years, exhibiting strong generalization capabilities in planning and control tasks. However, most diffusion-based policies remain focused on reward maximization or cost minimization, often overlooking critical aspects of safety and stability. In this work, we propose Safe and Stable Diffusion ($S^2$Diff), a model-based diffusion framework that explores how diffusion models can ensure safety and stability from a Lyapunov perspective. We demonstrate that $S^2$Diff eliminates the reliance on both complex gradient-based solvers (e.g., quadratic programming, non-convex solvers) and control-affine structures, leading to globally valid control policies driven by the learned certificate functions. Additionally, we uncover intrinsic connections between diffusion sampling and Almost Lyapunov theory, enabling the use of trajectory-level control policies to learn better certificate functions for safety and stability guarantees. To validate our approach, we conduct experiments on a wide variety of dynamical control systems, where $S^2$Diff consistently outperforms both certificate-based controllers and model-based diffusion baselines in terms of safety, stability, and overall control performance.
comment: Accepted by NeurIPS 2025
Physics-Informed Inductive Biases for Voltage Prediction in Distribution Grids
Voltage prediction in distribution grids is a critical yet difficult task for maintaining power system stability. Machine learning approaches, particularly Graph Neural Networks (GNNs), offer significant speedups but suffer from poor generalization when trained on limited or incomplete data. In this work, we systematically investigate the role of inductive biases in improving a model's ability to reliably learn power flow. Specifically, we evaluate three physics-informed strategies: (i) power-flow-constrained loss functions, (ii) complex-valued neural networks, and (iii) residual-based task reformulation. Using the ENGAGE dataset, which spans multiple low- and medium-voltage grid configurations, we conduct controlled experiments to isolate the effect of each inductive bias and assess both standard predictive performance and out-of-distribution generalization. Our study provides practical insights into which model assumptions most effectively guide learning for reliable and efficient voltage prediction in modern distribution networks.
Secure and Decentralized Peer-to-Peer Energy Transactions using Blockchain Technology
This paper presents an optimal peer-to-peer (P2P) energy transaction mechanism leveraging decentralized blockchain technology to enable a secure and scalable retail electricity market for the increasing penetration of distributed energy resources (DERs). A decentralized bidding strategy is proposed to maximize individual profits while collectively enhancing social welfare. The market design and transaction processes are simulated using the Ethereum testnet, demonstrating the blockchain network's capability to ensure secure, transparent, and sustainable P2P energy trading among DER participants.
A Hybrid Systems Model of Feedback Optimization for Linear Systems: Convergence and Robustness
Feedback optimization algorithms compute inputs to a system using real-time output measurements, which helps mitigate the effects of disturbances. However, existing work often models both system dynamics and computations in either discrete or continuous time, which may not accurately model some applications. In this work, we model linear system dynamics in continuous time, and we model the computations of inputs in discrete time. Therefore, we present a novel hybrid systems model of feedback optimization. We first establish the well-posedness of this hybrid model and establish completeness of solutions while ruling out Zeno behavior. Then we show the state of the system converges exponentially fast to a ball of known radius about a desired goal state. Next we analytically show that this system is robust to perturbations in (i) the values of measured outputs, (ii) the matrices that model the linear time-invariant system, and (iii) the times at which inputs are applied to the system. Simulation results confirm that this approach successfully mitigates the effects of disturbances.
comment: 14 Pages, 2 Figures, submitted to American Control Conference 2026
Communicating Plans, Not Percepts: Scalable Multi-Agent Coordination with Embodied World Models NeurIPS 2025
Robust coordination is critical for effective decision-making in multi-agent systems, especially under partial observability. A central question in Multi-Agent Reinforcement Learning (MARL) is whether to engineer communication protocols or learn them end-to-end. We investigate this dichotomy using embodied world models. We propose and compare two communication strategies for a cooperative task-allocation problem. The first, Learned Direct Communication (LDC), learns a protocol end-to-end, with agents generating messages and actions concurrently. The second, Intention Communication, uses an engineered inductive bias: a compact, learned world model, the Imagined Trajectory Generation Module (ITGM), to simulate future states. Agents then communicate a summary of this plan. We evaluate these approaches on goal-directed interaction in a grid world, a canonical abstraction for embodied AI problems. Our experiments reveal that while emergent communication is viable in simple settings, the engineered, world model-based approach shows superior performance, sample efficiency, and scalability as complexity increases. These findings advocate for integrating structured, predictive models into MARL agents to enable active, goal-driven coordination.
comment: Published in the Proceedings of the 39th Conference on Neural Information Processing Systems (NeurIPS 2025) Workshop: Scaling Environments for Agents (SEA). Additionally accepted for presentation in the NeurIPS 2025 Workshop: Embodied World Models for Decision Making (EWM) and the NeurIPS 2025 Workshop: Optimization for Machine Learning (OPT)
Certified Neural Approximations of Nonlinear Dynamics
Neural networks hold great potential to act as approximate models of nonlinear dynamical systems, with the resulting neural approximations enabling verification and control of such systems. However, in safety-critical contexts, the use of neural approximations requires formal bounds on their closeness to the underlying system. To address this fundamental challenge, we propose a novel, adaptive, and parallelizable verification method based on certified first-order models. Our approach provides formal error bounds on the neural approximations of dynamical systems, allowing them to be safely employed as surrogates by interpreting the error bound as bounded disturbances acting on the approximated dynamics. We demonstrate the effectiveness and scalability of our method on a range of established benchmarks from the literature, showing that it significantly outperforms the state-of-the-art. Furthermore, we show that our framework can successfully address additional scenarios previously intractable for existing methods - neural network compression and an autoencoder-based deep learning architecture for learning Koopman operators for the purpose of trajectory prediction.
comment: first and second author contributed equally
Non-Expansive Mappings in Two-Time-Scale Stochastic Approximation: Finite-Time Analysis
Two-time-scale stochastic approximation algorithms are iterative methods used in applications such as optimization, reinforcement learning, and control. Finite-time analysis of these algorithms has primarily focused on fixed point iterations where both time-scales have contractive mappings. In this work, we broaden the scope of such analyses by considering settings where the slower time-scale has a non-expansive mapping. For such algorithms, the slower time-scale can be viewed as a stochastic inexact Krasnoselskii-Mann iteration. We also study a variant where the faster time-scale has a projection step which leads to non-expansiveness in the slower time-scale. We show that the last-iterate mean square residual error for such algorithms decays at a rate $O(1/k^{1/4-\epsilon})$, where $\epsilon>0$ is arbitrarily small. We further establish almost sure convergence of iterates to the set of fixed points. We demonstrate the applicability of our framework by applying our results to minimax optimization, linear stochastic approximation, and Lagrangian optimization.
comment: Submitted to SIAM Journal on Control and Optimization
Can Human Drivers and Connected Autonomous Vehicles Co-exist in Lane-Free Traffic? A Microscopic Simulation Perspective
Recent advancements in connected autonomous vehicle (CAV) technology have sparked growing research interest in lane-free traffic (LFT). LFT envisions a scenario where all vehicles are CAVs, coordinating their movements without lanes to achieve smoother traffic flow and higher road capacity. This potentially reduces congestion without building new infrastructure. However, the transition phase will likely involve non-connected actors such as human-driven vehicles (HDVs) or independent AVs sharing the roads. This raises the question of how LFT performance is impacted when not all vehicles are CAVs, as these non-connected vehicles may prioritize their own benefits over system-wide improvements. This paper addresses this question through microscopic simulation on a ring road, where CAVs follow the potential lines (PL) controller for LFT, while HDVs adhere to a strip-based car-following model. The PL controller is also modified for safe velocities to prevent collisions. The results reveal that even a small percentage of HDVs can significantly disrupt LFT flow: 5% HDVs can reduce LFT's maximum road capacity by 20% and a 40% HDVs nearly halves it, up until 100% HDVs where it drops by nearly 60%. The study also develops an adaptive potential line (APL) controller that forms APL corridors in the surroundings of HDVs. APL shows a peak traffic flow improvement of nearly 10% over the PL controller. The study indicates that a penetration rate of approximately 60% CAVs is required to start observing the major LFT benefits. These findings open a new research direction on minimizing the adverse effects of non-connected vehicles on LFT.
comment: This version corresponds to the final published article in Transportation Research Part C: Emerging Technologies. It incorporates revisions made during peer review, including an improved literature review, clearer methodological descriptions, and explicit consideration of safety and comfort
CommonPower: A Framework for Safe Data-Driven Smart Grid Control
The growing complexity of power system management has led to an increased interest in reinforcement learning (RL). To validate their effectiveness, RL algorithms have to be evaluated across multiple case studies. Case study design is an arduous task requiring the consideration of many aspects, among them the influence of available forecasts and the level of decentralization in the control structure. Furthermore, vanilla RL controllers cannot themselves ensure the satisfaction of system constraints, which makes devising a safeguarding mechanism a necessary task for every case study before deploying the system. To address these shortcomings, we introduce the Python tool CommonPower, the first general framework for the modeling and simulation of power system management tailored towards machine learning. Its modular architecture enables users to focus on specific elements without having to implement a simulation environment. Another unique contribution of CommonPower is the automatic synthesis of model predictive controllers and safeguards. Beyond offering a unified interface for single-agent RL, multi-agent RL, and optimal control, CommonPower includes a training pipeline for machine-learning-based forecasters as well as a flexible mechanism for incorporating feedback of safeguards into the learning updates of RL controllers.
comment: For the corresponding code repository, see https://github.com/TUMcps/commonpower
Is FISHER All You Need in The Multi-AUV Underwater Target Tracking Task?
It is significant to employ multiple autonomous underwater vehicles (AUVs) to execute the underwater target tracking task collaboratively. However, it's pretty challenging to meet various prerequisites utilizing traditional control methods. Therefore, we propose an effective two-stage learning from demonstrations training framework, FISHER, to highlight the adaptability of reinforcement learning (RL) methods in the multi-AUV underwater target tracking task, while addressing its limitations such as extensive requirements for environmental interactions and the challenges in designing reward functions. The first stage utilizes imitation learning (IL) to realize policy improvement and generate offline datasets. To be specific, we introduce multi-agent discriminator-actor-critic based on improvements of the generative adversarial IL algorithm and multi-agent IL optimization objective derived from the Nash equilibrium condition. Then in the second stage, we develop multi-agent independent generalized decision transformer, which analyzes the latent representation to match the future states of high-quality samples rather than reward function, attaining further enhanced policies capable of handling various scenarios. Besides, we propose a simulation to simulation demonstration generation procedure to facilitate the generation of expert demonstrations in underwater environments, which capitalizes on traditional control methods and can easily accomplish the domain transfer to obtain demonstrations. Extensive simulation experiments from multiple scenarios showcase that FISHER possesses strong stability, multi-task performance and capability of generalization.
comment: This paper has been accepted by IEEE Transactions on Mobile Computing. Besides, Guanwen Xie and Jingzehua Xu contributed equally to this work
Guided Reinforcement Learning for Omnidirectional 3D Jumping in Quadruped Robots
Jumping poses a significant challenge for quadruped robots, despite being crucial for many operational scenarios. While optimisation methods exist for controlling such motions, they are often time-consuming and demand extensive knowledge of robot and terrain parameters, making them less robust in real-world scenarios. Reinforcement learning (RL) is emerging as a viable alternative, yet conventional end-to-end approaches lack efficiency in terms of sample complexity, requiring extensive training in simulations, and predictability of the final motion, which makes it difficult to certify the safety of the final motion. To overcome these limitations, this paper introduces a novel guided reinforcement learning approach that leverages physical intuition for efficient and explainable jumping, by combining B\'ezier curves with a Uniformly Accelerated Rectilinear Motion (UARM) model. Extensive simulation and experimental results clearly demonstrate the advantages of our approach over existing alternatives.
Analysis of Hierarchical AoII over Unreliable Channels: A Stochastic Hybrid System Approach
In this work, we generalize the Stochastic Hybrid Systems (SHSs) analysis of traditional AoI to the AoII metric. Hierarchical ageing processes are adopted using the continuous AoII for the first time, where two different hierarchy schemes, i.e., a hybrid of linear ageing processes with different slopes and a hybrid of linear and quadratic ageing processes, are considered. We first modify the main result in \cite[Theorem 1]{yates_age_2020b} to provide a systematic way to analyze the continuous hierarchical AoII over unslotted real-time systems. The closed-form expressions of average hierarchical AoII are obtained based on our Theorem \ref{theorem1} in two typical scenarios with different channel conditions, i.e., an M/M/1/1 queue over noisy channels and two M/M/1/1 queues over collision channels. Moreover, we analyze the stability conditions for two scenarios given that the quadratic ageing process may lead to the absence of stationary solutions. Finally, we compare the average age performance between the classic AoI results and our AoII results in the M/M/1/1 queue, and the effects of different channel parameters on AoII are also evaluated.
comment: 16 pages, 10 figures
Steering the Herd: A Framework for LLM-based Control of Social Learning
Algorithms increasingly serve as information mediators--from social media feeds and targeted advertising to the increasing ubiquity of LLMs. This engenders a joint process where agents combine private, algorithmically-mediated signals with learning from peers to arrive at decisions. To study such settings, we introduce a model of controlled sequential social learning in which an information-mediating planner (e.g. an LLM) controls the information structure of agents while they also learn from the decisions of earlier agents. The planner may seek to improve social welfare (altruistic planner) or to induce a specific action the planner prefers (biased planner). Our framework presents a new optimization problem for social learning that combines dynamic programming with decentralized action choices and Bayesian belief updates. We prove the convexity of the value function and characterize the optimal policies of altruistic and biased planners, which attain desired tradeoffs between the costs they incur and the payoffs they earn from induced agent choices. Notably, in some regimes the biased planner intentionally obfuscates the agents' signals. Even under stringent transparency constraints--information parity with individuals, no lying or cherry-picking, and full observability--we show that information mediation can substantially shift social welfare in either direction. We complement our theory with simulations in which LLMs act as both planner and agents. Notably, the LLM planner in our simulations exhibits emergent strategic behavior in steering public opinion that broadly mirrors the trends predicted, though key deviations suggest the influence of non-Bayesian reasoning consistent with the cognitive patterns of both humans and LLMs trained on human-like data. Together, we establish our framework as a tractable basis for studying the impact and regulation of LLM information mediators.
A Crime/S.I.R. optimal control problem
This paper presents and discusses a mathematical model inspired by control theory to derive optimal public policies for minimizing costs associated with the reduction and control of criminal activity in a population. Specifically, we analyze the optimal control problem \begin{equation*} \min G(u_1, u_2, u_3) = \int_{0}^{t_{\text{F}}} \left( I(t) - R(t) + \frac{B_1}{2} u_1^2(t) + \frac{B_2}{2} u_2^2(t) + \frac{B_3}{2} u_3^2(t) \right) \, dt. \end{equation*} where $I=I(t)$ and $R=R(t)$ satisfies the system of equations \begin{equation*} \left\{ \begin{aligned} \dot{S} &= \Lambda - (1-u_1)SI - \mu S + ((1+u_3)\gamma_2)I + \rho \Omega R,\\ \dot{I} &= (1-u_1)SI - (\mu + \delta_1)I - ((1+u_2)\gamma_1)I - ((1+u_3)\gamma_2)I + (1-\Omega)\rho R,\\ \dot{R} &= ((1+u_2)\gamma_1)I - (\mu + \delta_2 + \rho)R. \end{aligned} \right. \end{equation*} Our approach assumes that the social and economic effects of criminal behavior can be modeled by a dynamic SIR-type system, which serves as a constraint on a cost functional associated with the strategies implemented by government and law enforcement authorities to reduce criminal behavior. Using optimal control theory, the proposed controls, i.e., preventive policies (such as community and social cohesion programs), are expected to have a significant and positive impact on crime reduction, generating opportunities for the most disadvantaged sectors of Cali society and contributing to long-term security. Given that resources to address this problem are limited, this research aims to determine an optimal combination of public interventions and policies that minimize criminality at the lowest possible economic cost, using an SIR model, tools from variational calculus, and optimal control theory.
Trajectory Encryption Cooperative Salvo Guidance
This paper introduces the concept of trajectory encryption in cooperative simultaneous target interception, wherein heterogeneity in guidance principles across a team of unmanned autonomous systems is leveraged as a strategic design feature. By employing a mix of heterogeneous time-to-go formulations leading to a cooperative guidance strategy, the swarm of vehicles is able to generate diverse trajectory families. This diversity expands the feasible solution space for simultaneous target interception, enhances robustness under disturbances, and enables flexible time-to-go adjustments without predictable detouring. From an adversarial perspective, heterogeneity obscures the collective interception intent by preventing straightforward prediction of swarm dynamics, effectively acting as an encryption layer in the trajectory domain. Simulations demonstrate that the swarm of heterogeneous vehicles is able to intercept a moving target simultaneously from a diverse set of initial engagement configurations.
RobustNeuralNetworks.jl: a Package for Machine Learning and Data-Driven Control with Certified Robustness
Neural networks are typically sensitive to small input perturbations, leading to unexpected or brittle behaviour. We present RobustNeuralNetworks.jl: a Julia package for neural network models that are constructed to naturally satisfy a set of user-defined robustness metrics. The package is based on the recently proposed Recurrent Equilibrium Network (REN) and Lipschitz-Bounded Deep Network (LBDN) model classes, and is designed to interface directly with Julia's most widely-used machine learning package, Flux.jl. We discuss the theory behind our model parameterization, give an overview of the package, and provide a tutorial demonstrating its use in image classification, reinforcement learning, and nonlinear state-observer design.
comment: Accepted to the Proceedings of the JuliaCon Conferences
Electrostatic Clutch-Based Mechanical Multiplexer with Increased Force Capability
Robotic systems with many degrees of freedom (DoF) are constrained by the demands of dedicating a motor to each joint, and while mechanical multiplexing reduces actuator count, existing clutch designs are bulky, force-limited, or restricted to one output at a time. The problem addressed in this study is how to achieve high-force, multiplexing that supports both simultaneous and sequential control from a single motor. Here we show an electrostatic capstan clutch-based transmission that enables both single-input-single-output (SISO) and single-input-multiple-output (SIMO) multiplexing. We demonstrated these on a four-DoF tendon-driven robotic hand where a single motor achieved output forces up to 212 N, increased vertical grip strength by 4.09 times, and raised horizontal carrying capacity by 354\% over manufacturer specifications. These results demonstrate that electrostatic multiplexing provides versatile actuation, overcoming the limitations of prior systems.
Electricity Demand and Grid Impacts of AI Data Centers: Challenges and Prospects
The rapid growth of artificial intelligence (AI) is driving an unprecedented increase in the electricity demand of AI data centers, raising emerging challenges for electric power grids. Understanding the characteristics of AI data center loads and their interactions with the grid is therefore critical for ensuring both reliable power system operation and sustainable AI development. This paper provides a comprehensive review and vision of this evolving landscape. Specifically, this paper (i) presents an overview of AI data center infrastructure and its key components, (ii) examines the key characteristics and patterns of electricity demand across the stages of model preparation, training, fine-tuning, and inference, (iii) analyzes the critical challenges that AI data center loads pose to power systems across three interrelated timescales, including long-term planning and interconnection, short-term operation and electricity markets, and real-time dynamics and stability, and (iv) discusses potential solutions from the perspectives of the grid, AI data centers, and AI end-users to address these challenges. By synthesizing current knowledge and outlining future directions, this review aims to guide research and development in support of the joint advancement of AI data centers and power systems toward reliable, efficient, and sustainable operation.
A Model-Free Terminal Iterative Learning Control Scheme for Multi-Layer Printing Alignment Control Problems
Roll-to-roll (R2R) printing technologies are promising for high-volume continuous production of substrate-based electronic products. One of the major challenges in R2R flexible electronics printing is achieving tight alignment tolerances, as specified by the device resolution (usually at the micro-meter level), for multi-layer printed electronics. The alignment of the printed patterns in different layers is known as registration. Conventional registration control methods rely on real-time feedback controllers, such as PID control, to regulate the web tension and the web speed. However, those methods may lose effectiveness in compensating for recurring disturbances and supporting effective mitigation of registration errors. In this paper, we propose a Spatial-Terminal Iterative Learning Control (STILC) method integrated with PID control to iteratively learn and reduce registration error cycle-by-cycle, converging it to zero. This approach enables unprecedented precision in the creation, integration, and manipulation of multi-layer microstructures in R2R processes. We theoretically prove the convergence of the proposed STILC-PID hybrid approach and validate its effectiveness through a simulated registration error scenario caused by axis mismatch between roller and motor, a common issue in R2R systems. The results demonstrate that the STILC-PID hybrid control method can fully eliminate the registration error after a feasible number of iterations. Additionally, we analyze the impact of different learning gains on the convergence performance of STILC.
Systems and Control (EESS)
Data-Driven Optimal Power Flow: A Behavioral Systems Approach
The increasing decentralization of power systems driven by a large number of renewable energy sources poses challenges in power flow optimization. Partially unknown power line properties can render model-based approaches unsuitable. With increasing deployment of sensors, data-driven methods rise as a promising alternative. They offer the flexibility to adapt to varying grid structures and unknown line properties. In this paper, we propose a novel data-driven representation of nonlinear power flow equations for radial grids based on Willems' Fundamental Lemma. The approach allows for direct integration of input/output data into power flow optimisation, enabling cost minimization and constraint enforcement without requiring explicit knowledge of the electrical properties or the topology of the grid. Moreover, we formulate a convex relaxation to ensure compatibility with state-of-the-art solvers. In a numerical case study, we demonstrate that the novel approach performs similar to state-of-the-art methods, without the need for an explicit system identification step.
Data-Driven Resilience Assessment against Sparse Sensor Attacks
We present a data-driven framework for assessing the attack resilience of linear time-invariant systems against malicious false data injection sensor attacks. Based on the concept of sparse observability, data-driven resilience metrics are proposed. First, we derive a data-driven necessary and sufficient condition for assessing the system's resilience against sensor attacks, using data collected without any attacks. If we obtain attack-free data that satisfy a specific rank condition, we can exactly evaluate the attack resilience level even in a model-free setting. We then extend this analysis to a scenario where only poisoned data are available. Given the poisoned data, we can only conservatively assess the system's resilience. In both scenarios, we also provide polynomial-time algorithms to assess the system resilience under specific conditions. Finally, numerical examples illustrate the efficacy and limitations of the proposed framework.
Safety-Critical Input-Constrained Nonlinear Intercept Guidance in Multiple Engagement Zones
This paper presents an input-constrained nonlinear guidance law to address the problem of intercepting a stationary target in contested environments with multiple defending agents. Contrary to prior approaches that rely on explicit knowledge of defender strategies or utilize conservative safety conditions based on a defender's range, our work characterizes defender threats geometrically through engagement zones that delineate inevitable interception regions. Outside these engagement zones, the interceptor remains invulnerable. The proposed guidance law switches between a repulsive safety maneuver near these zones and a pursuit maneuver outside their influence. To deal with multiple engagement zones, we employ a smooth minimum function (log-sum-exponent approximation) that aggregates threats from all the zones while prioritizing the most critical threats. Input saturation is modeled and embedded in the non-holonomic vehicle dynamics so the controller respects actuator limits while maintaining stability. Numerical simulations with several defenders demonstrate the proposed method's ability to avoid engagement zones and achieve interception across diverse initial conditions.
Real-Time Estimation of Equivalent Series Resistance for Predicting Output Capacitor Failures in Boost Converters
Output capacitors are among the most critical components in power electronic converters, as their degradation directly affects system stability and reliability. A key indicator of capacitor health is the Equivalent Series Resistance (ESR), whose progressive increase is strongly correlated with aging and imminent failure. This paper presents a real-time technique for estimating the ESR of output capacitors in boost converters, with the aim of enabling early fault prediction and condition-based maintenance. The proposed method leverages online parameter estimation to extract ESR values without interrupting converter operation. The estimation algorithm has been implemented on a low-cost STM32 Nucleo platform, demonstrating both computational efficiency and suitability for embedded applications. Experimental validation confirms that the approach provides accurate ESR tracking under varying load and operating conditions, allowing timely detection of abnormal capacitor behavior and preventing unexpected system failures.
comment: 5 pages, 5 figures
MARLIN: Multi-Agent Reinforcement Learning with Murmuration Intelligence and LLM Guidance for Reservoir Management
As climate change intensifies extreme weather events, water disasters pose growing threats to global communities, making adaptive reservoir management critical for protecting vulnerable populations and ensuring water security. Modern water resource management faces unprecedented challenges from cascading uncertainties propagating through interconnected reservoir networks. These uncertainties, rooted in physical water transfer losses and environmental variability, make precise control difficult. For example, sending 10 tons downstream may yield only 8-12 tons due to evaporation and seepage. Traditional centralized optimization approaches suffer from exponential computational complexity and cannot effectively handle such real-world uncertainties, while existing multi-agent reinforcement learning (MARL) methods fail to achieve effective coordination under uncertainty. To address these challenges, we present MARLIN, a decentralized reservoir management framework inspired by starling murmurations intelligence. Integrating bio-inspired alignment, separation, and cohesion rules with MARL, MARLIN enables individual reservoirs to make local decisions while achieving emergent global coordination. In addition, a LLM provides real-time reward shaping signals, guiding agents to adapt to environmental changes and human-defined preferences. Experiments on real-world USGS data show that MARLIN improves uncertainty handling by 23\%, cuts computation by 35\%, and accelerates flood response by 68\%, exhibiting super-linear coordination, with complexity scaling 5.4x from 400 to 10,000 nodes. These results demonstrate MARLIN's potential for disaster prevention and protecting communities through intelligent, scalable water resource management.
Real-Time Power electronics Control and Monitoring with TI F28379D DSC and GUI Composer
This paper details the implementation and experimental validation of a real-time control system for a three-phase induction motor using the Texas Instruments TMS320F28379D microcontroller. The system integrates pulse-width modulation (PWM) generation, analog-to-digital conversion (ADC), digital-to-analog conversion (DAC), and quadrature encoder feedback to facilitate precise control under various strategies. A current sensing solution based on the AMC1301 isolation amplifier and shunt resistor ensures accurate and safe current measurement for feedback loops. Two control algorithms, V/f and Field-Oriented Control (FOC) are implemented and tested. Real-time parameter tuning and data visualization are achieved using GUI Composer, enabling efficient system debugging and interaction. Experimental results demonstrate smooth speed reversal, fast dynamic response, and stable performance under both step and multi-step inputs. While GUI Composer effectively supports general monitoring and control, limitations in signal bandwidth are noted compared to professional-grade platforms. The results confirm the robustness and effectiveness of the implemented control strategies for high-performance induction motor applications.
comment: 10 pages
Spectral Flow Learning Theory: Finite-Sample Guarantees for Vector-Field Identification
We study the identification of continuous-time vector fields from irregularly sampled trajectories. We introduce Spectral Flow Learning (SFL), which learns in a windowed flow space using a lag-linear label operator that aggregates lagged Koopman actions. We provide finite-sample high-probability (FS-HP) guarantees for the class of variable-step linear multistep methods (vLLM). The FS-HP rates are constructed using spectral regularization with qualification-controlled filters for flow predictors under standard source and filter assumptions. A multistep observability inequality links flow error to vector-field error and yields two-term bounds that combine a statistical rate with an explicit discretization bias from vLMM theory. This preliminary preprint states the results and sketches proofs, with full proofs and extensions deferred to a journal version.
Path Diffuser: Diffusion Model for Data-Driven Traffic Simulator
Simulating diverse and realistic traffic scenarios is critical for developing and testing autonomous planning. Traditional rule-based planners lack diversity and realism, while learning-based simulators often replay, forecast, or edit scenarios using historical agent trajectories. However, they struggle to generate new scenarios, limiting scalability and diversity due to their reliance on fully annotated logs and historical data. Thus, a key challenge for a learning-based simulator's performance is that it requires agents' past trajectories and pose information in addition to map data, which might not be available for all agents on the road.Without which, generated scenarios often produce unrealistic trajectories that deviate from drivable areas, particularly under out-of-distribution (OOD) map scenes (e.g., curved roads). To address this, we propose Path Diffuser (PD): a two-stage, diffusion model for generating agent pose initializations and their corresponding trajectories conditioned on the map, free of any historical context of agents' trajectories. Furthermore, PD incorporates a motion primitive-based prior, leveraging Frenet frame candidate trajectories to enhance diversity while ensuring road-compliant trajectory generation. We also explore various design choices for modeling complex multi-agent interactions. We demonstrate the effectiveness of our method through extensive experiments on the Argoverse2 Dataset and additionally evaluate the generalizability of the approach on OOD map variants. Notably, Path Diffuser outperforms the baseline methods by 1.92x on distribution metrics, 1.14x on common-sense metrics, and 1.62x on road compliance from adversarial benchmarks.
Coordinated vs. Sequential Transmission Planning
Coordinated planning of generation, storage, and transmission more accurately captures the interactions among these three capacity types necessary to meet electricity demand, at least in theory. However, in practice, U.S. system operators typically follow a sequential planning approach: They first determine future generation and storage additions based on an assumed unconstrained (`copper plate') system. Next, they perform dispatch simulations of this projected generation and storage capacity mix on the existing transmission grid to identify transmission constraint violations. These violations indicate the need for transmission upgrades. We describe a multistage, multi-locational planning model that co-optimizes generation, storage, and transmission investments. The model respects reliability constraints as well as state energy and climate policies. We test the two planning approaches using a current stakeholder-informed 20-zone model of the PJM region, developed for the current FERC Order No. 1920 compliance filing process. In our most conservative model specification, we find that the co-optimized approach estimates 67% lower transmission upgrade needs than the sequential model, leading to total system costs that are .6% lower and similar reliability and climate outcomes. Our sensitivities show larger transmission and cost savings and reliability and climate benefits from co-optimized planning.
comment: 10 pages
Physics-informed learning under mixing: How physical knowledge speeds up learning
A major challenge in physics-informed machine learning is to understand how the incorporation of prior domain knowledge affects learning rates when data are dependent. Focusing on empirical risk minimization with physics-informed regularization, we derive complexity-dependent bounds on the excess risk in probability and in expectation. We prove that, when the physical prior information is aligned, the learning rate improves from the (slow) Sobolev minimax rate to the (fast) optimal i.i.d. one without any sample-size deflation due to data dependence.
Event-Driven Control via Sparsity-Promoting Regularization: A Rollout Approach with Performance Guarantees
This paper presents a controller design method aiming to balance control performance and actuation frequency. In this framework, actuation timings are determined in an event-driven manner, resulting in intermittent control actions. Control performance is measured by an infinite-horizon average cost, and input sparsity is promoted through actuation-rate regularization. Since the formulated optimal control problem has a combinatorial nature, we employ a rollout algorithm to find a suboptimal solution. In the proposed scheme, the event-triggering condition is derived through a multistage minimization procedure using a receding-horizon approach. We establish theoretical performance bounds with respect to periodic control and prove the stability of the closed-loop system. The effectiveness of the proposed method is demonstrated through a numerical example.
comment: 14 pages
Position estimation based on UWB swarm optimization and comparison against traditional trilateration
Ultra-wideband (UWB) is a promising technology for indoor position estimation for various localization applications of object swarms, such as in 3D analysis of human movement with multiple on-body sensors or a swarm of drones in an indoor environment. However, most UWB-only position estimation methods are based on a star topology, where the position of a mobile node is estimated using distances from several fixed anchors. These approaches ignore the valuable inter-node distance estimates, possible in a fully-connected 'Swarm' topology, which could provide more redundancy in the set of available distance estimates used for the position estimation. This would improve the accuracy and consistency of the position estimates. Also, published studies do not analyze how input measurement errors affect the final position estimates, which makes it difficult to assess the reliability under varying conditions. Therefore, this study first proposes a UWB swarm optimization-based position estimation method that utilizes all available internode distances to enhance accuracy and compare against the traditional trilateration method that utilizes the star configuration. All validations were done with synthetic UWB data, to enable testing all input error situations. The comprehensive error sensitivity analysis was conducted to evaluate its robustness under varying noise conditions. The proposed method consistently outperformed trilateration, with position estimation error around 5.7 cm for realistic UWB distance input estimates, while for higher noise conditions, the proposed method had errors around 6 cm lower than the trilateration method, which had position estimation errors around 19 cm. This study demonstrates the general potential of the Swarm optimization-based method for position estimation as a more accurate and consistent alternative to traditional star-based trilateration methods.
comment: Submitted to journal and Manuscript under review
Stabilizing Humanoid Robot Trajectory Generation via Physics-Informed Learning and Control-Informed Steering IROS
Recent trends in humanoid robot control have successfully employed imitation learning to enable the learned generation of smooth, human-like trajectories from human data. While these approaches make more realistic motions possible, they are limited by the amount of available motion data, and do not incorporate prior knowledge about the physical laws governing the system and its interactions with the environment. Thus they may violate such laws, leading to divergent trajectories and sliding contacts which limit real-world stability. We address such limitations via a two-pronged learning strategy which leverages the known physics of the system and fundamental control principles. First, we encode physics priors during supervised imitation learning to promote trajectory feasibility. Second, we minimize drift at inference time by applying a proportional-integral controller directly to the generated output state. We validate our method on various locomotion behaviors for the ergoCub humanoid robot, where a physics-informed loss encourages zero contact foot velocity. Our experiments demonstrate that the proposed approach is compatible with multiple controllers on a real robot and significantly improves the accuracy and physical constraint conformity of generated trajectories.
comment: This paper has been accepted for publication at the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Hangzhou, China, 2025
Hierarchical Analysis and Control of Epidemic Spreading over Networks using Dissipativity and Mesh Stability
Analyzing and controlling spreading processes are challenging problems due to the involved non-linear node (subsystem) dynamics, unknown disturbances, complex interconnections, and the large-scale and multi-level nature of the problems. The dissipativity concept provides a practical framework for addressing such concerns, thanks to the energy-based representation it offers for subsystems and the compositional properties it provides for the analysis and control of interconnected (networked) systems comprised of such subsystems. Therefore, in this paper, we utilize the dissipativity concept to analyze and control a spreading process that occurs over a hierarchy of nodes, groups, and a network (i.e., a spreading network). We start by generalizing some existing results on dissipativity-based topology design for networked systems. Next, we model the considered spreading network as a networked system and establish the dissipativity properties of its nodes. The generalized topology design method is then applied at multiple levels of the considered spreading network to formulate its analysis and control problems as Linear Matrix Inequality (LMI) problems. We identify and enforce localized necessary conditions to support the feasibility of the LMI problem solved at each subsequent hierarchical level of the spreading network. Consequently, the proposed method does not involve iterative multi-level optimization stages that are computationally inefficient. The proposed control solution ensures that the spreading network is not only stable but also dissipative and mesh-stable. Compared to conventional methods, such as threshold pruning and high-degree edge removal, our approach offers superior performance in terms of infection containment, control efficiency, and disturbance robustness. Extensive numerical results demonstrate the effectiveness of the proposed technique.
comment: To be submitted to American Control Conference 2026
Hill-Type Stability Analysis of Periodic Solutions of Fractional-Order Differential Equations
This paper explores stability properties of periodic solutions of (nonlinear) fractional-order differential equations (FODEs). As classical Caputo-type FODEs do not admit exactly periodic solutions, we propose a framework of Liouville-Weyl-type FODEs, which do admit exactly periodic solutions and are an extension of Caputo-type FODEs. Local linearization around a periodic solution results in perturbation dynamics governed by a linear time-periodic differential equation. In the classical integer-order case, the perturbation dynamics is therefore described by Floquet theory, i.e. the exponential growth or decay of perturbations is expressed by Floquet exponents which can be assessed using the Hill matrix approach. For fractional-order systems, however, a rigorous Floquet theory is lacking. Here, we explore the limitations when trying to extend Floquet theory and the Hill matrix method to linear time-periodic fractional-order differential equations (LTP-FODEs) as local linearization of nonlinear fractional-order systems. A key result of the paper is that such an extended Floquet theory can only assess exponentially growing solutions of LTP-FODEs. Moreover, we provide an analysis of linear time-invariant fractional-order systems (LTI-FODEs) with algebraically decaying solutions and show that the inaccessibility of decaying solutions through Floquet theory is already present in the time-invariant case.
comment: 21 pages, 6 figures
Model-Free Dynamic Consensus in Multi-Agent Systems: A Q-Function Perspective
This paper presents a new method for achieving dynamic consensus in linear discrete-time homogeneous multi-agent systems (MAS) with marginally stable or unstable dynamics. The guarantee of consensus in this setting involves a set of constraints based on the graph's spectral properties, complicating the design of the coupling gains. This challenge intensifies for large-scale systems with diverse graph Laplacian spectra. The proposed approach reformulates the dynamic consensus problem with a prescribed convergence rate using a state-action value function framework inspired by optimal control theory. Specifically, a synthetic linear quadratic regulation (LQR) formulation is introduced to encode the consensus objective, enabling its translation into a convex semidefinite programming (SDP) problem. The resulting SDP is applicable in both model-based and model-free settings for jointly designing the local feedback and coupling gains. To handle the inherent non-convex feasibility conditions, a convex-concave decomposition strategy is employed. Adaptation of the method in a completely model-free set-up eliminates the need for system identification or knowledge of the agents' dynamics. Instead, it relies on input-state data collection and offers an entirely data-driven equivalent SDP formulation. Finally, a new algorithm balancing feasibility, convergence rate, robustness, and energy efficiency, is established to provide design flexibility. Numerical simulations validate the method's effectiveness in various scenarios.
PhysiAgent: An Embodied Agent Framework in Physical World
Vision-Language-Action (VLA) models have achieved notable success but often struggle with limited generalizations. To address this, integrating generalized Vision-Language Models (VLMs) as assistants to VLAs has emerged as a popular solution. However, current approaches often combine these models in rigid, sequential structures: using VLMs primarily for high-level scene understanding and task planning, and VLAs merely as executors of lower-level actions, leading to ineffective collaboration and poor grounding challenges. In this paper, we propose an embodied agent framework, PhysiAgent, tailored to operate effectively in physical environments. By incorporating monitor, memory, self-reflection mechanisms, and lightweight off-the-shelf toolboxes, PhysiAgent offers an autonomous scaffolding framework to prompt VLMs to organize different components based on real-time proficiency feedback from VLAs to maximally exploit VLAs' capabilities. Experimental results demonstrate significant improvements in task-solving performance on complex real-world robotic tasks, showcasing effective self-regulation of VLMs, coherent tool collaboration, and adaptive evolution of the framework during execution. PhysiAgent makes practical and pioneering efforts to integrate VLMs and VLAs, effectively grounding embodied agent frameworks in real-world settings.
Autonomous Detection and Coverage of Unknown Target Areas by Multi-Agent Systems
This paper presents a novel coverage control algorithm for multi-agent systems, where each agent has no prior knowledge of the specific region to be covered. The proposed method enables agents to autonomously detect the target area and collaboratively achieve full coverage. Once an agent detects a part of the target region within its sensor range, a dynamically constructed density function is generated to attract nearby agents. By integrating this density-driven mechanism with Centroidal Voronoi Tessellation (CVT), the agents are guided to achieve optimal spatial distribution. Additionally, Control Barrier Functions (CBFs) are employed to ensure collision avoidance and maintain non-overlapping sensor coverage, enhancing both safety and efficiency. Simulation results verify that agents can independently locate and effectively cover the target area.
comment: 8 pages, 9 figures
Small-Covariance Noise-to-State Stability of Stochastic Systems and Its Applications to Stochastic Gradient Dynamics
This paper studies gradient dynamics subject to additive random noise, which may arise from sources such as stochastic gradient estimation, measurement noise, or stochastic sampling errors. To analyze the robustness of such stochastic gradient systems, the concept of small-covariance noise-to-state stability (NSS) is introduced, along with a Lyapunov-based characterization. Furthermore, the classical Polyak-Lojasiewicz (PL) condition on the objective function is generalized to the $\mathcal{K}$-PL condition via comparison functions, thereby extending its applicability to a broader class of optimization problems. It is shown that the stochastic gradient dynamics exhibit small-covariance NSS if the objective function satisfies the $\mathcal{K}$-PL condition and possesses a globally Lipschitz continuous gradient. This result implies that the trajectories of stochastic gradient dynamics converge to a neighborhood of the optimum with high probability, with the size of the neighborhood determined by the noise covariance. Moreover, if the $\mathcal{K}$-PL condition is strengthened to a $\mathcal{K}_\infty$-PL condition, the dynamics are NSS; whereas if it is weakened to a general positive-definite-PL condition, the dynamics exhibit integral NSS. The results further extend to objectives without globally Lipschitz gradients through appropriate step-size tuning. The proposed framework is further applied to the robustness analysis of policy optimization for the linear quadratic regulator (LQR) and logistic regression.
comment: 25 pages, 1 figure
Towards Tighter Convex Relaxation of Mixed-integer Programs: Leveraging Logic Network Flow for Task and Motion Planning
This paper proposes an optimization-based task and motion planning framework, named "Logic Network Flow", that integrates temporal logic specifications into mixed-integer programs for efficient robot planning. Inspired by the Graph-of-Convex-Sets formulation, temporal predicates are encoded as polyhedron constraints on each edge of a network flow model, instead of as constraints between nodes in traditional Logic Tree formulations. We further propose a network-flow-based Fourier-Motzkin elimination procedure that removes continuous flow variables while preserving convex relaxation tightness, leading to provably tighter convex relaxations and fewer constraints than Logic Tree formulations. For temporal logic motion planning with piecewise-affine dynamic systems, comprehensive experiments across vehicle routing, multi-robot coordination, and temporal logic control on dynamical systems using point mass and linear inverted pendulum models demonstrate computational speedups of up to several orders of magnitude. Hardware demonstrations with quadrupedal robots validate real-time replanning capabilities under dynamically changing environmental conditions. The project website is at https://logicnetworkflow.github.io/.
comment: 35 pages, 17 figures, 7 tables
Multi-Agent Guided Policy Search for Non-Cooperative Dynamic Games
Multi-agent reinforcement learning (MARL) optimizes strategic interactions in non-cooperative dynamic games, where agents have misaligned objectives. However, data-driven methods such as multi-agent policy gradients (MA-PG) often suffer from instability and limit-cycle behaviors. Prior stabilization techniques typically rely on entropy-based exploration, which slows learning and increases variance. We propose a model-based approach that incorporates approximate priors into the reward function as regularization. In linear quadratic (LQ) games, we prove that such priors stabilize policy gradients and guarantee local exponential convergence to an approximate Nash equilibrium. We then extend this idea to infinite-horizon nonlinear games by introducing Multi-agent Guided Policy Search (MA-GPS), which constructs short-horizon local LQ approximations from trajectories of current policies to guide training. Experiments on nonlinear vehicle platooning and a six-player strategic basketball formation show that MA-GPS achieves faster convergence and more stable learning than existing MARL methods.
SDC-Based Model Predictive Control: Enhancing Computational Feasibility for Safety-Critical Quadrotor Control
Nonlinear Model Predictive Control (NMPC) is widely used for controlling high-speed robotic systems such as quadrotors. However, its significant computational demands often hinder real-time feasibility and reliability, particularly in environments requiring robust obstacle avoidance. This paper proposes a novel SDC-Based Model Predictive Control (MPC) framework, which preserves the high-precision performance of NMPC while substantially reducing computational complexity by over 30%. By reformulating the nonlinear quadrotor dynamics through the State-Dependent Coefficient (SDC) method, the original nonlinear program problem is transformed into a sequential quadratic optimization problem. The controller integrates an integral action to eliminate steady-state tracking errors and imposes constraints for safety-critical obstacle avoidance. Additionally, a disturbance estimator is incorporated to enhance robustness against external perturbations. Simulation results demonstrate that the SDC-Based MPC achieves comparable tracking accuracy to NMPC, with greater efficiency in terms of computation times, thereby improving its suitability for real-time applications. Theoretical analysis further establishes the stability and recursive feasibility of the proposed approach.
comment: This is the initial version
Learning Hybrid Dynamics via Convex Optimizations
This paper investigates the problem of identifying state-dependent switching systems, a class of hybrid dynamical systems that combine multiple linear or nonlinear modes. We propose two broad classes of switching systems: switching linear systems (SLSs) and switching polynomial systems (SPSs). We first formulate the joint estimation of the mode dynamics and switching rules as a mixed integer program. To solve its inherent scalability issue, we develop a hierarchy of convex relaxations and establish a bound and conditions under which these relaxations are tight. Building on these results, we propose a bilevel convex optimization framework that alternates between mode assignment and dynamics estimation, and we recover switching boundaries using margin-based polynomial classifiers. Numerical experiments on both linear and nonlinear oscillators demonstrate that the method accurately identifies mode dynamics and reconstructs switching surfaces from trajectory data. Our results provide a tractable optimization-based framework for switching system identification.
comment: 8 pages, 4 figures
A General Theory of Emergent Linearity in Complex Dynamical Systems: The Role of Spatial Averaging and Vanishing Correlations
Various natural and engineered systems, from urban traffic flow to the human brain, have been described by large-scale networked dynamical systems. Despite their vast differences, these systems are often similar in being comprised of numerous microscopic subsystems with complex nonlinear dynamics and interactions that give rise to diverse emergent macroscopic behaviors. As such, a long-standing question across various fields has been to understand why and how various forms of macroscopic behavior emerge from underlying microscopic dynamics. Motivated by a growing body of empirical observations, in this work we focus on linearity as one of the most fundamental aspects of system dynamics, and develop a general theoretical framework for the interplay between spatial averaging, decaying microscopic correlations, and emergent macroscopic linearity. Using and extending the theory of mixing sequences, we show that in a broad class of autonomous nonlinear networked systems, the dynamics of the average of all subsystems' states becomes asymptotically linear as the number of subsystems grows to infinity, provided that (in addition to technical assumptions) pairwise correlations between subsystems decay to 0 as their pairwise distance grows to infinity. We prove this result when the latter distance is between subsystems' linear indices or spatial locations, and provide extensions to linear time-invariant (LTI) limit dynamics, finite-sample analysis of rates of convergence, and networks of spatially-embedded subsystems with random locations. To our knowledge, this work is the first rigorous analysis of macroscopic linearity in large-scale heterogeneous networked systems, and provides a solid foundation for further theoretical and empirical analyses in various domains of science and engineering.
Integrator Forwading Design for Unicycles with Constant and Actuated Velocity in Polar Coordinates
In a companion paper, we present a modular framework for unicycle stabilization in polar coordinates that provides smooth steering laws through backstepping. Surprisingly, the same problem also allows the application of integrator forwarding. In this work, we leverage this feature and construct new smooth steering laws together with control Lyapunov functions (CLFs), expanding the set of CLFs available for inverse optimal control design. In the case of constant forward velocity (Dubins car), backstepping produces finite-time (deadbeat) parking, and we show that integrator forwarding yields the very same class of solutions. This reveals a fundamental connection between backstepping and forwarding in addressing both the unicycle and, the Dubins car parking problems.
Modular Design of Strict Control Lyapunov Functions for Global Stabilization of the Unicycle in Polar Coordinates
Since the mid-1990s, it has been known that, unlike in Cartesian form where Brockett's condition rules out static feedback stabilization, the unicycle is globally asymptotically stabilizable by smooth feedback in polar coordinates. In this note, we introduce a modular framework for designing smooth feedback laws that achieve global asymptotic stabilization in polar coordinates. These laws are bidirectional, enabling efficient parking maneuvers, and are paired with families of strict control Lyapunov functions (CLFs) constructed in a modular fashion. The resulting CLFs guarantee global asymptotic stability with explicit convergence rates and include barrier variants that yield "almost global" stabilization, excluding only zero-measure subsets of the rotation manifolds. The strictness of the CLFs is further leveraged in our companion paper, where we develop inverse-optimal redesigns with meaningful cost functions and infinite gain margins.
Inverse Optimal Feedback and Gain Margins for Unicycle Stabilization
The recent development of globally strict control Lyapunov functions (CLFs) for the challenging unicycle parking problem provides a foundation for pursuing optimality. We address this in the inverse optimal framework, thereby avoiding the need to solve the Hamilton$\unicode{x2013}$Jacobi$\unicode{x2013}$Bellman (HJB) equations, and establish a general result that is optimal with respect to a meaningful cost. We present several design examples that impose varying levels of penalty on the control effort, including arbitrarily bounded control. Furthermore, we show that the inverse optimal controller possesses an infinite gain margin thanks to the system being driftless, and leveraging this property, we extend the design to an adaptive controller that handles model uncertainty. Finally, we compare the performance of the non-adaptive inverse optimal controller with its adaptive counterpart.
comment: Submitted to ACC 2026
Exhaustive-Serve-Longest Control for Multi-robot Scheduling Systems
We study online task allocation for multi-robot, multi-queue systems with stochastic arrivals and switching delays. Time is slotted; each location can host at most one robot per slot; service consumes one slot; switching between locations incurs a one-slot travel delay; and arrivals are independent Bernoulli processes. We formulate a discounted-cost Markov decision process and propose Exhaustive-Serve-Longest (ESL), a simple real-time policy that serves exhaustively when the current location is nonempty and, when idle, switches to a longest unoccupied nonempty location, and we prove the optimality of this policy. As baselines, we tune a fixed-dwell cyclic policy via a discrete-time delay expression and implement a first-come-first-serve policy. Across server-to-location ratios and loads, ESL consistently yields lower discounted holding cost and smaller mean queue lengths, with action-time fractions showing more serving and restrained switching. Its simplicity and robustness make ESL a practical default for real-time multi-robot scheduling systems.
From Legacy to Leadership Intelligent Radio Network Planning Framework for Cell-Free Massive MIMO in B5G6G Era
The proliferation of cell-free Massive MIMO represents a transformative shift in wireless network architecture, addressing critical limitations of conventional distributed Massive MIMO systems. This paper presents an intelligent radio network planning framework that bridges legacy 5G infrastructures with future B5G/6G networks through cell-free architectures. By leveraging operational insights from existing 5G deployments, we systematically address coverage optimization, and capacity enhancement. Our scalable framework enables seamless evolution from legacy designs to next-generation cell-free systems. Through extensive simulations in dense urban environments, we demonstrate substantial improvements: 45% spectral efficiency gains, 30% interference reduction, and significantly enhanced uniform coverage. The proposed framework provides network operators with a practical roadmap for transitioning from traditional cellular architectures to demanding B5G/6G requirements while maximizing existing infrastructure investments.
comment: 6 pages, 12 figures, 4 tables, IEEE ETCM 2025 Conference Preprint
Spatiotemporal Forecasting of Incidents and Congestion with Implications for Sustainable Traffic Control
Urban traffic anomalies such as collisions and disruptions threaten the safety, efficiency, and sustainability of transportation systems. We present a simulation-based framework for modeling, detecting, and predicting such anomalies in urban networks. Using the SUMO platform, we generate reproducible rear-end and intersection crash scenarios with matched baselines, enabling controlled experimentation and comparative evaluation. We record vehicle-level travel time, speed, and emissions for edge and network-level analysis. On this dataset, we develop a hybrid forecasting architecture that combines bidirectional long short-term memory networks with a diffusion convolutional recurrent neural network to capture temporal dynamics and spatial dependencies. Our simulation studies on the Broadway corridor in New York City demonstrate the framework's ability to reproduce consistent incident conditions, quantify their effects, and provide accurate multi-horizon traffic forecasts. Our results highlight the value of combining controlled anomaly generation with deep predictive models to support reproducible evaluation and sustainable traffic management.
Message passing-based inference in an autoregressive active inference agent
We present the design of an autoregressive active inference agent in the form of message passing on a factor graph. Expected free energy is derived and distributed across a planning graph. The proposed agent is validated on a robot navigation task, demonstrating exploration and exploitation in a continuous-valued observation space with bounded continuous-valued actions. Compared to a classical optimal controller, the agent modulates action based on predictive uncertainty, arriving later but with a better model of the robot's dynamics.
comment: 14 pages, 4 figures, to be published in the proceedings of the International Workshop on Active Inference 2025
Environmental Rate Manipulation Attacks on Power Grid Security
The growing complexity of global supply chains has made hardware Trojans a significant threat in sensor-based power electronics. Traditional Trojan designs depend on digital triggers or fixed threshold conditions that can be detected during standard testing. In contrast, we introduce Environmental Rate Manipulation (ERM), a novel Trojan triggering mechanism that activates by monitoring the rate of change in environmental parameters rather than their absolute values. This approach allows the Trojan to remain inactive under normal conditions and evade redundancy and sensor-fusion defenses. We implement a compact 14~$\mu$m$^2$ circuit that measures capacitor charging rates in standard sensor front-ends and disrupts inverter pulse-width modulation PWM signals when a rapid change is induced. Experiments on a commercial Texas Instruments solar inverter demonstrate that ERM can trigger catastrophic driver chip failure. Furthermore, ETAP simulations indicate that a single compromised 100~kW inverter may initiate cascading grid instabilities. The attack's significance extends beyond individual sensors to entire classes of environmental sensing systems common in power electronics, demonstrating fundamental challenges for hardware security.
Fast Energy-Theft Attack on Frequency-Varying Wireless Power without Additional Sensors
With the popularity of wireless charging, energy access protection and cybersecurity are gaining importance, especially in public places. Currently, the most common energy encryption method uses frequency and associated impedance variation. However, we have proven that this method is not reliable, since a hacker can detect the changing frequency and adjust the compensation. However, the previously presented system needed time to follow the updated frequency, while encryption systems may vary the frequency faster to avoid energy theft. Furthermore, the previous system required an additional sensor coil. To solve these problems, we optimized the attack and the associated system, which can intrude and steal energy within 0.2 ms. The key is the elimination of the time-consuming maximum receiver current regulation. Also, we use the main receiving coil rather than any additional sensor antenna to detect the magnetic field. Thus, the new hardware is even simpler. A simulation model and experimental results demonstrate the fast response speed of the attack on encrypted wireless power and steal 65% of the power. Overall, the applicability of the attack is highly improved and leaves less room for hardening the encryption. The results demonstrate that energy access protection needs to be given great attention.
comment: 11 pages, 12 figures
Safe and Stable Control via Lyapunov-Guided Diffusion Models NeurIPS 2025
Diffusion models have made significant strides in recent years, exhibiting strong generalization capabilities in planning and control tasks. However, most diffusion-based policies remain focused on reward maximization or cost minimization, often overlooking critical aspects of safety and stability. In this work, we propose Safe and Stable Diffusion ($S^2$Diff), a model-based diffusion framework that explores how diffusion models can ensure safety and stability from a Lyapunov perspective. We demonstrate that $S^2$Diff eliminates the reliance on both complex gradient-based solvers (e.g., quadratic programming, non-convex solvers) and control-affine structures, leading to globally valid control policies driven by the learned certificate functions. Additionally, we uncover intrinsic connections between diffusion sampling and Almost Lyapunov theory, enabling the use of trajectory-level control policies to learn better certificate functions for safety and stability guarantees. To validate our approach, we conduct experiments on a wide variety of dynamical control systems, where $S^2$Diff consistently outperforms both certificate-based controllers and model-based diffusion baselines in terms of safety, stability, and overall control performance.
comment: Accepted by NeurIPS 2025
Physics-Informed Inductive Biases for Voltage Prediction in Distribution Grids
Voltage prediction in distribution grids is a critical yet difficult task for maintaining power system stability. Machine learning approaches, particularly Graph Neural Networks (GNNs), offer significant speedups but suffer from poor generalization when trained on limited or incomplete data. In this work, we systematically investigate the role of inductive biases in improving a model's ability to reliably learn power flow. Specifically, we evaluate three physics-informed strategies: (i) power-flow-constrained loss functions, (ii) complex-valued neural networks, and (iii) residual-based task reformulation. Using the ENGAGE dataset, which spans multiple low- and medium-voltage grid configurations, we conduct controlled experiments to isolate the effect of each inductive bias and assess both standard predictive performance and out-of-distribution generalization. Our study provides practical insights into which model assumptions most effectively guide learning for reliable and efficient voltage prediction in modern distribution networks.
Secure and Decentralized Peer-to-Peer Energy Transactions using Blockchain Technology
This paper presents an optimal peer-to-peer (P2P) energy transaction mechanism leveraging decentralized blockchain technology to enable a secure and scalable retail electricity market for the increasing penetration of distributed energy resources (DERs). A decentralized bidding strategy is proposed to maximize individual profits while collectively enhancing social welfare. The market design and transaction processes are simulated using the Ethereum testnet, demonstrating the blockchain network's capability to ensure secure, transparent, and sustainable P2P energy trading among DER participants.
A Hybrid Systems Model of Feedback Optimization for Linear Systems: Convergence and Robustness
Feedback optimization algorithms compute inputs to a system using real-time output measurements, which helps mitigate the effects of disturbances. However, existing work often models both system dynamics and computations in either discrete or continuous time, which may not accurately model some applications. In this work, we model linear system dynamics in continuous time, and we model the computations of inputs in discrete time. Therefore, we present a novel hybrid systems model of feedback optimization. We first establish the well-posedness of this hybrid model and establish completeness of solutions while ruling out Zeno behavior. Then we show the state of the system converges exponentially fast to a ball of known radius about a desired goal state. Next we analytically show that this system is robust to perturbations in (i) the values of measured outputs, (ii) the matrices that model the linear time-invariant system, and (iii) the times at which inputs are applied to the system. Simulation results confirm that this approach successfully mitigates the effects of disturbances.
comment: 14 Pages, 2 Figures, submitted to American Control Conference 2026
Communicating Plans, Not Percepts: Scalable Multi-Agent Coordination with Embodied World Models NeurIPS 2025
Robust coordination is critical for effective decision-making in multi-agent systems, especially under partial observability. A central question in Multi-Agent Reinforcement Learning (MARL) is whether to engineer communication protocols or learn them end-to-end. We investigate this dichotomy using embodied world models. We propose and compare two communication strategies for a cooperative task-allocation problem. The first, Learned Direct Communication (LDC), learns a protocol end-to-end, with agents generating messages and actions concurrently. The second, Intention Communication, uses an engineered inductive bias: a compact, learned world model, the Imagined Trajectory Generation Module (ITGM), to simulate future states. Agents then communicate a summary of this plan. We evaluate these approaches on goal-directed interaction in a grid world, a canonical abstraction for embodied AI problems. Our experiments reveal that while emergent communication is viable in simple settings, the engineered, world model-based approach shows superior performance, sample efficiency, and scalability as complexity increases. These findings advocate for integrating structured, predictive models into MARL agents to enable active, goal-driven coordination.
comment: Published in the Proceedings of the 39th Conference on Neural Information Processing Systems (NeurIPS 2025) Workshop: Scaling Environments for Agents (SEA). Additionally accepted for presentation in the NeurIPS 2025 Workshop: Embodied World Models for Decision Making (EWM) and the NeurIPS 2025 Workshop: Optimization for Machine Learning (OPT)
Certified Neural Approximations of Nonlinear Dynamics
Neural networks hold great potential to act as approximate models of nonlinear dynamical systems, with the resulting neural approximations enabling verification and control of such systems. However, in safety-critical contexts, the use of neural approximations requires formal bounds on their closeness to the underlying system. To address this fundamental challenge, we propose a novel, adaptive, and parallelizable verification method based on certified first-order models. Our approach provides formal error bounds on the neural approximations of dynamical systems, allowing them to be safely employed as surrogates by interpreting the error bound as bounded disturbances acting on the approximated dynamics. We demonstrate the effectiveness and scalability of our method on a range of established benchmarks from the literature, showing that it significantly outperforms the state-of-the-art. Furthermore, we show that our framework can successfully address additional scenarios previously intractable for existing methods - neural network compression and an autoencoder-based deep learning architecture for learning Koopman operators for the purpose of trajectory prediction.
comment: first and second author contributed equally
Non-Expansive Mappings in Two-Time-Scale Stochastic Approximation: Finite-Time Analysis
Two-time-scale stochastic approximation algorithms are iterative methods used in applications such as optimization, reinforcement learning, and control. Finite-time analysis of these algorithms has primarily focused on fixed point iterations where both time-scales have contractive mappings. In this work, we broaden the scope of such analyses by considering settings where the slower time-scale has a non-expansive mapping. For such algorithms, the slower time-scale can be viewed as a stochastic inexact Krasnoselskii-Mann iteration. We also study a variant where the faster time-scale has a projection step which leads to non-expansiveness in the slower time-scale. We show that the last-iterate mean square residual error for such algorithms decays at a rate $O(1/k^{1/4-\epsilon})$, where $\epsilon>0$ is arbitrarily small. We further establish almost sure convergence of iterates to the set of fixed points. We demonstrate the applicability of our framework by applying our results to minimax optimization, linear stochastic approximation, and Lagrangian optimization.
comment: Submitted to SIAM Journal on Control and Optimization
Can Human Drivers and Connected Autonomous Vehicles Co-exist in Lane-Free Traffic? A Microscopic Simulation Perspective
Recent advancements in connected autonomous vehicle (CAV) technology have sparked growing research interest in lane-free traffic (LFT). LFT envisions a scenario where all vehicles are CAVs, coordinating their movements without lanes to achieve smoother traffic flow and higher road capacity. This potentially reduces congestion without building new infrastructure. However, the transition phase will likely involve non-connected actors such as human-driven vehicles (HDVs) or independent AVs sharing the roads. This raises the question of how LFT performance is impacted when not all vehicles are CAVs, as these non-connected vehicles may prioritize their own benefits over system-wide improvements. This paper addresses this question through microscopic simulation on a ring road, where CAVs follow the potential lines (PL) controller for LFT, while HDVs adhere to a strip-based car-following model. The PL controller is also modified for safe velocities to prevent collisions. The results reveal that even a small percentage of HDVs can significantly disrupt LFT flow: 5% HDVs can reduce LFT's maximum road capacity by 20% and a 40% HDVs nearly halves it, up until 100% HDVs where it drops by nearly 60%. The study also develops an adaptive potential line (APL) controller that forms APL corridors in the surroundings of HDVs. APL shows a peak traffic flow improvement of nearly 10% over the PL controller. The study indicates that a penetration rate of approximately 60% CAVs is required to start observing the major LFT benefits. These findings open a new research direction on minimizing the adverse effects of non-connected vehicles on LFT.
comment: This version corresponds to the final published article in Transportation Research Part C: Emerging Technologies. It incorporates revisions made during peer review, including an improved literature review, clearer methodological descriptions, and explicit consideration of safety and comfort
CommonPower: A Framework for Safe Data-Driven Smart Grid Control
The growing complexity of power system management has led to an increased interest in reinforcement learning (RL). To validate their effectiveness, RL algorithms have to be evaluated across multiple case studies. Case study design is an arduous task requiring the consideration of many aspects, among them the influence of available forecasts and the level of decentralization in the control structure. Furthermore, vanilla RL controllers cannot themselves ensure the satisfaction of system constraints, which makes devising a safeguarding mechanism a necessary task for every case study before deploying the system. To address these shortcomings, we introduce the Python tool CommonPower, the first general framework for the modeling and simulation of power system management tailored towards machine learning. Its modular architecture enables users to focus on specific elements without having to implement a simulation environment. Another unique contribution of CommonPower is the automatic synthesis of model predictive controllers and safeguards. Beyond offering a unified interface for single-agent RL, multi-agent RL, and optimal control, CommonPower includes a training pipeline for machine-learning-based forecasters as well as a flexible mechanism for incorporating feedback of safeguards into the learning updates of RL controllers.
comment: For the corresponding code repository, see https://github.com/TUMcps/commonpower
Is FISHER All You Need in The Multi-AUV Underwater Target Tracking Task?
It is significant to employ multiple autonomous underwater vehicles (AUVs) to execute the underwater target tracking task collaboratively. However, it's pretty challenging to meet various prerequisites utilizing traditional control methods. Therefore, we propose an effective two-stage learning from demonstrations training framework, FISHER, to highlight the adaptability of reinforcement learning (RL) methods in the multi-AUV underwater target tracking task, while addressing its limitations such as extensive requirements for environmental interactions and the challenges in designing reward functions. The first stage utilizes imitation learning (IL) to realize policy improvement and generate offline datasets. To be specific, we introduce multi-agent discriminator-actor-critic based on improvements of the generative adversarial IL algorithm and multi-agent IL optimization objective derived from the Nash equilibrium condition. Then in the second stage, we develop multi-agent independent generalized decision transformer, which analyzes the latent representation to match the future states of high-quality samples rather than reward function, attaining further enhanced policies capable of handling various scenarios. Besides, we propose a simulation to simulation demonstration generation procedure to facilitate the generation of expert demonstrations in underwater environments, which capitalizes on traditional control methods and can easily accomplish the domain transfer to obtain demonstrations. Extensive simulation experiments from multiple scenarios showcase that FISHER possesses strong stability, multi-task performance and capability of generalization.
comment: This paper has been accepted by IEEE Transactions on Mobile Computing. Besides, Guanwen Xie and Jingzehua Xu contributed equally to this work
Guided Reinforcement Learning for Omnidirectional 3D Jumping in Quadruped Robots
Jumping poses a significant challenge for quadruped robots, despite being crucial for many operational scenarios. While optimisation methods exist for controlling such motions, they are often time-consuming and demand extensive knowledge of robot and terrain parameters, making them less robust in real-world scenarios. Reinforcement learning (RL) is emerging as a viable alternative, yet conventional end-to-end approaches lack efficiency in terms of sample complexity, requiring extensive training in simulations, and predictability of the final motion, which makes it difficult to certify the safety of the final motion. To overcome these limitations, this paper introduces a novel guided reinforcement learning approach that leverages physical intuition for efficient and explainable jumping, by combining B\'ezier curves with a Uniformly Accelerated Rectilinear Motion (UARM) model. Extensive simulation and experimental results clearly demonstrate the advantages of our approach over existing alternatives.
Analysis of Hierarchical AoII over Unreliable Channels: A Stochastic Hybrid System Approach
In this work, we generalize the Stochastic Hybrid Systems (SHSs) analysis of traditional AoI to the AoII metric. Hierarchical ageing processes are adopted using the continuous AoII for the first time, where two different hierarchy schemes, i.e., a hybrid of linear ageing processes with different slopes and a hybrid of linear and quadratic ageing processes, are considered. We first modify the main result in \cite[Theorem 1]{yates_age_2020b} to provide a systematic way to analyze the continuous hierarchical AoII over unslotted real-time systems. The closed-form expressions of average hierarchical AoII are obtained based on our Theorem \ref{theorem1} in two typical scenarios with different channel conditions, i.e., an M/M/1/1 queue over noisy channels and two M/M/1/1 queues over collision channels. Moreover, we analyze the stability conditions for two scenarios given that the quadratic ageing process may lead to the absence of stationary solutions. Finally, we compare the average age performance between the classic AoI results and our AoII results in the M/M/1/1 queue, and the effects of different channel parameters on AoII are also evaluated.
comment: 16 pages, 10 figures
Steering the Herd: A Framework for LLM-based Control of Social Learning
Algorithms increasingly serve as information mediators--from social media feeds and targeted advertising to the increasing ubiquity of LLMs. This engenders a joint process where agents combine private, algorithmically-mediated signals with learning from peers to arrive at decisions. To study such settings, we introduce a model of controlled sequential social learning in which an information-mediating planner (e.g. an LLM) controls the information structure of agents while they also learn from the decisions of earlier agents. The planner may seek to improve social welfare (altruistic planner) or to induce a specific action the planner prefers (biased planner). Our framework presents a new optimization problem for social learning that combines dynamic programming with decentralized action choices and Bayesian belief updates. We prove the convexity of the value function and characterize the optimal policies of altruistic and biased planners, which attain desired tradeoffs between the costs they incur and the payoffs they earn from induced agent choices. Notably, in some regimes the biased planner intentionally obfuscates the agents' signals. Even under stringent transparency constraints--information parity with individuals, no lying or cherry-picking, and full observability--we show that information mediation can substantially shift social welfare in either direction. We complement our theory with simulations in which LLMs act as both planner and agents. Notably, the LLM planner in our simulations exhibits emergent strategic behavior in steering public opinion that broadly mirrors the trends predicted, though key deviations suggest the influence of non-Bayesian reasoning consistent with the cognitive patterns of both humans and LLMs trained on human-like data. Together, we establish our framework as a tractable basis for studying the impact and regulation of LLM information mediators.
A Crime/S.I.R. optimal control problem
This paper presents and discusses a mathematical model inspired by control theory to derive optimal public policies for minimizing costs associated with the reduction and control of criminal activity in a population. Specifically, we analyze the optimal control problem \begin{equation*} \min G(u_1, u_2, u_3) = \int_{0}^{t_{\text{F}}} \left( I(t) - R(t) + \frac{B_1}{2} u_1^2(t) + \frac{B_2}{2} u_2^2(t) + \frac{B_3}{2} u_3^2(t) \right) \, dt. \end{equation*} where $I=I(t)$ and $R=R(t)$ satisfies the system of equations \begin{equation*} \left\{ \begin{aligned} \dot{S} &= \Lambda - (1-u_1)SI - \mu S + ((1+u_3)\gamma_2)I + \rho \Omega R,\\ \dot{I} &= (1-u_1)SI - (\mu + \delta_1)I - ((1+u_2)\gamma_1)I - ((1+u_3)\gamma_2)I + (1-\Omega)\rho R,\\ \dot{R} &= ((1+u_2)\gamma_1)I - (\mu + \delta_2 + \rho)R. \end{aligned} \right. \end{equation*} Our approach assumes that the social and economic effects of criminal behavior can be modeled by a dynamic SIR-type system, which serves as a constraint on a cost functional associated with the strategies implemented by government and law enforcement authorities to reduce criminal behavior. Using optimal control theory, the proposed controls, i.e., preventive policies (such as community and social cohesion programs), are expected to have a significant and positive impact on crime reduction, generating opportunities for the most disadvantaged sectors of Cali society and contributing to long-term security. Given that resources to address this problem are limited, this research aims to determine an optimal combination of public interventions and policies that minimize criminality at the lowest possible economic cost, using an SIR model, tools from variational calculus, and optimal control theory.
Trajectory Encryption Cooperative Salvo Guidance
This paper introduces the concept of trajectory encryption in cooperative simultaneous target interception, wherein heterogeneity in guidance principles across a team of unmanned autonomous systems is leveraged as a strategic design feature. By employing a mix of heterogeneous time-to-go formulations leading to a cooperative guidance strategy, the swarm of vehicles is able to generate diverse trajectory families. This diversity expands the feasible solution space for simultaneous target interception, enhances robustness under disturbances, and enables flexible time-to-go adjustments without predictable detouring. From an adversarial perspective, heterogeneity obscures the collective interception intent by preventing straightforward prediction of swarm dynamics, effectively acting as an encryption layer in the trajectory domain. Simulations demonstrate that the swarm of heterogeneous vehicles is able to intercept a moving target simultaneously from a diverse set of initial engagement configurations.
RobustNeuralNetworks.jl: a Package for Machine Learning and Data-Driven Control with Certified Robustness
Neural networks are typically sensitive to small input perturbations, leading to unexpected or brittle behaviour. We present RobustNeuralNetworks.jl: a Julia package for neural network models that are constructed to naturally satisfy a set of user-defined robustness metrics. The package is based on the recently proposed Recurrent Equilibrium Network (REN) and Lipschitz-Bounded Deep Network (LBDN) model classes, and is designed to interface directly with Julia's most widely-used machine learning package, Flux.jl. We discuss the theory behind our model parameterization, give an overview of the package, and provide a tutorial demonstrating its use in image classification, reinforcement learning, and nonlinear state-observer design.
comment: Accepted to the Proceedings of the JuliaCon Conferences
Electrostatic Clutch-Based Mechanical Multiplexer with Increased Force Capability
Robotic systems with many degrees of freedom (DoF) are constrained by the demands of dedicating a motor to each joint, and while mechanical multiplexing reduces actuator count, existing clutch designs are bulky, force-limited, or restricted to one output at a time. The problem addressed in this study is how to achieve high-force, multiplexing that supports both simultaneous and sequential control from a single motor. Here we show an electrostatic capstan clutch-based transmission that enables both single-input-single-output (SISO) and single-input-multiple-output (SIMO) multiplexing. We demonstrated these on a four-DoF tendon-driven robotic hand where a single motor achieved output forces up to 212 N, increased vertical grip strength by 4.09 times, and raised horizontal carrying capacity by 354\% over manufacturer specifications. These results demonstrate that electrostatic multiplexing provides versatile actuation, overcoming the limitations of prior systems.
Electricity Demand and Grid Impacts of AI Data Centers: Challenges and Prospects
The rapid growth of artificial intelligence (AI) is driving an unprecedented increase in the electricity demand of AI data centers, raising emerging challenges for electric power grids. Understanding the characteristics of AI data center loads and their interactions with the grid is therefore critical for ensuring both reliable power system operation and sustainable AI development. This paper provides a comprehensive review and vision of this evolving landscape. Specifically, this paper (i) presents an overview of AI data center infrastructure and its key components, (ii) examines the key characteristics and patterns of electricity demand across the stages of model preparation, training, fine-tuning, and inference, (iii) analyzes the critical challenges that AI data center loads pose to power systems across three interrelated timescales, including long-term planning and interconnection, short-term operation and electricity markets, and real-time dynamics and stability, and (iv) discusses potential solutions from the perspectives of the grid, AI data centers, and AI end-users to address these challenges. By synthesizing current knowledge and outlining future directions, this review aims to guide research and development in support of the joint advancement of AI data centers and power systems toward reliable, efficient, and sustainable operation.
A Model-Free Terminal Iterative Learning Control Scheme for Multi-Layer Printing Alignment Control Problems
Roll-to-roll (R2R) printing technologies are promising for high-volume continuous production of substrate-based electronic products. One of the major challenges in R2R flexible electronics printing is achieving tight alignment tolerances, as specified by the device resolution (usually at the micro-meter level), for multi-layer printed electronics. The alignment of the printed patterns in different layers is known as registration. Conventional registration control methods rely on real-time feedback controllers, such as PID control, to regulate the web tension and the web speed. However, those methods may lose effectiveness in compensating for recurring disturbances and supporting effective mitigation of registration errors. In this paper, we propose a Spatial-Terminal Iterative Learning Control (STILC) method integrated with PID control to iteratively learn and reduce registration error cycle-by-cycle, converging it to zero. This approach enables unprecedented precision in the creation, integration, and manipulation of multi-layer microstructures in R2R processes. We theoretically prove the convergence of the proposed STILC-PID hybrid approach and validate its effectiveness through a simulated registration error scenario caused by axis mismatch between roller and motor, a common issue in R2R systems. The results demonstrate that the STILC-PID hybrid control method can fully eliminate the registration error after a feasible number of iterations. Additionally, we analyze the impact of different learning gains on the convergence performance of STILC.
Robotics
Mash, Spread, Slice! Learning to Manipulate Object States via Visual Spatial Progress
Most robot manipulation focuses on changing the kinematic state of objects: picking, placing, opening, or rotating them. However, a wide range of real-world manipulation tasks involve a different class of object state change--such as mashing, spreading, or slicing--where the object's physical and visual state evolve progressively without necessarily changing its position. We present SPARTA, the first unified framework for the family of object state change manipulation tasks. Our key insight is that these tasks share a common structural pattern: they involve spatially-progressing, object-centric changes that can be represented as regions transitioning from an actionable to a transformed state. Building on this insight, SPARTA integrates spatially progressing object change segmentation maps, a visual skill to perceive actionable vs. transformed regions for specific object state change tasks, to generate a) structured policy observations that strip away appearance variability, and b) dense rewards that capture incremental progress over time. These are leveraged in two SPARTA policy variants: reinforcement learning for fine-grained control without demonstrations or simulation; and greedy control for fast, lightweight deployment. We validate SPARTA on a real robot for three challenging tasks across 10 diverse real-world objects, achieving significant improvements in training time and accuracy over sparse rewards and visual goal-conditioned baselines. Our results highlight progress-aware visual representations as a versatile foundation for the broader family of object state manipulation tasks. Project website: https://vision.cs.utexas.edu/projects/sparta-robot
BOSfM: A View Planning Framework for Optimal 3D Reconstruction of Agricultural Scenes
Active vision (AV) has been in the spotlight of robotics research due to its emergence in numerous applications including agricultural tasks such as precision crop monitoring and autonomous harvesting to list a few. A major AV problem that gained popularity is the 3D reconstruction of targeted environments using 2D images from diverse viewpoints. While collecting and processing a large number of arbitrarily captured 2D images can be arduous in many practical scenarios, a more efficient solution involves optimizing the placement of available cameras in 3D space to capture fewer, yet more informative, images that provide sufficient visual information for effective reconstruction of the environment of interest. This process termed as view planning (VP), can be markedly challenged (i) by noise emerging in the location of the cameras and/or in the extracted images, and (ii) by the need to generalize well in other unknown similar agricultural environments without need for re-optimizing or re-training. To cope with these challenges, the present work presents a novel VP framework that considers a reconstruction quality-based optimization formulation that relies on the notion of `structure-from-motion' to reconstruct the 3D structure of the sought environment from the selected 2D images. With no analytic expression of the optimization function and with costly function evaluations, a Bayesian optimization approach is proposed to efficiently carry out the VP process using only a few function evaluations, while accounting for different noise cases. Numerical tests on both simulated and real agricultural settings signify the benefits of the advocated VP approach in efficiently estimating the optimal camera placement to accurately reconstruct 3D environments of interest, and generalize well on similar unknown environments.
Ancestry Tree Clustering for Particle Filter Diversity Maintenance
We propose a method for linear-time diversity maintenance in particle filtering. It clusters particles based on ancestry tree topology: closely related particles in sufficiently large subtrees are grouped together. The main idea is that the tree structure implicitly encodes similarity without the need for spatial or other domain-specific metrics. This approach, when combined with intra-cluster fitness sharing and the protection of particles not included in a cluster, effectively prevents premature convergence in multimodal environments while maintaining estimate compactness. We validate our approach in a multimodal robotics simulation and a real-world multimodal indoor environment. We compare the performance to several diversity maintenance algorithms from the literature, including Deterministic Resampling and Particle Gaussian Mixtures. Our algorithm achieves high success rates with little to no negative effect on compactness, showing particular robustness to different domains and challenging initial conditions.
comment: 15th International Conference on Indoor Positioning and Indoor Navigation, 15-18 September 2025, Tampere, Finland Originally 8 pages. The online version with appendices is 14 pages
Prepare for Warp Speed: Sub-millisecond Visual Place Recognition Using Event Cameras
Visual Place Recognition (VPR) enables systems to identify previously visited locations within a map, a fundamental task for autonomous navigation. Prior works have developed VPR solutions using event cameras, which asynchronously measure per-pixel brightness changes with microsecond temporal resolution. However, these approaches rely on dense representations of the inherently sparse camera output and require tens to hundreds of milliseconds of event data to predict a place. Here, we break this paradigm with Flash, a lightweight VPR system that predicts places using sub-millisecond slices of event data. Our method is based on the observation that active pixel locations provide strong discriminative features for VPR. Flash encodes these active pixel locations using efficient binary frames and computes similarities via fast bitwise operations, which are then normalized based on the relative event activity in the query and reference frames. Flash improves Recall@1 for sub-millisecond VPR over existing baselines by 11.33x on the indoor QCR-Event-Dataset and 5.92x on the 8 km Brisbane-Event-VPR dataset. Moreover, our approach reduces the duration for which the robot must operate without awareness of its position, as evidenced by a localization latency metric we term Time to Correct Match (TCM). To the best of our knowledge, this is the first work to demonstrate sub-millisecond VPR using event cameras.
Clebsch-Gordan Transformer: Fast and Global Equivariant Attention
The global attention mechanism is one of the keys to the success of transformer architecture, but it incurs quadratic computational costs in relation to the number of tokens. On the other hand, equivariant models, which leverage the underlying geometric structures of problem instance, often achieve superior accuracy in physical, biochemical, computer vision, and robotic tasks, at the cost of additional compute requirements. As a result, existing equivariant transformers only support low-order equivariant features and local context windows, limiting their expressiveness and performance. This work proposes Clebsch-Gordan Transformer, achieving efficient global attention by a novel Clebsch-Gordon Convolution on $\SO(3)$ irreducible representations. Our method enables equivariant modeling of features at all orders while achieving ${O}(N \log N)$ input token complexity. Additionally, the proposed method scales well with high-order irreducible features, by exploiting the sparsity of the Clebsch-Gordon matrix. Lastly, we also incorporate optional token permutation equivariance through either weight sharing or data augmentation. We benchmark our method on a diverse set of benchmarks including n-body simulation, QM9, ModelNet point cloud classification and a robotic grasping dataset, showing clear gains over existing equivariant transformers in GPU memory size, speed, and accuracy.
Systematic Alias Sampling: an efficient and low-variance way to sample from a discrete distribution
In this paper we combine the Alias method with the concept of systematic sampling, a method commonly used in particle filters for efficient low-variance resampling. The proposed method allows very fast sampling from a discrete distribution: drawing k samples is up to an order of magnitude faster than binary search from the cumulative distribution function (cdf) or inversion methods used in many libraries. The produced empirical distribution function is evaluated using a modified Cram\'er-Von Mises goodness-of-fit statistic, showing that the method compares very favourably to multinomial sampling. As continuous distributions can often be approximated with discrete ones, the proposed method can be used as a very general way to efficiently produce random samples for particle filter proposal distributions, e.g. for motion models in robotics.
Gaze Estimation for Human-Robot Interaction: Analysis Using the NICO Platform
This paper evaluates the current gaze estimation methods within an HRI context of a shared workspace scenario. We introduce a new, annotated dataset collected with the NICO robotic platform. We evaluate four state-of-the-art gaze estimation models. The evaluation shows that the angular errors are close to those reported on general-purpose benchmarks. However, when expressed in terms of distance in the shared workspace the best median error is 16.48 cm quantifying the practical limitations of current methods. We conclude by discussing these limitations and offering recommendations on how to best integrate gaze estimation as a modality in HRI systems.
comment: Code available at http://github.com/kocurvik/nico_gaze
Advancing Multi-agent Traffic Simulation via R1-Style Reinforcement Fine-Tuning
Scalable and realistic simulation of multi-agent traffic behavior is critical for advancing autonomous driving technologies. Although existing data-driven simulators have made significant strides in this domain, they predominantly rely on supervised learning to align simulated distributions with real-world driving scenarios. A persistent challenge, however, lies in the distributional shift that arises between training and testing, which often undermines model generalization in unseen environments. To address this limitation, we propose SMART-R1, a novel R1-style reinforcement fine-tuning paradigm tailored for next-token prediction models to better align agent behavior with human preferences and evaluation metrics. Our approach introduces a metric-oriented policy optimization algorithm to improve distribution alignment and an iterative "SFT-RFT-SFT" training strategy that alternates between Supervised Fine-Tuning (SFT) and Reinforcement Fine-Tuning (RFT) to maximize performance gains. Extensive experiments on the large-scale Waymo Open Motion Dataset (WOMD) validate the effectiveness of this simple yet powerful R1-style training framework in enhancing foundation models. The results on the Waymo Open Sim Agents Challenge (WOSAC) showcase that SMART-R1 achieves state-of-the-art performance with an overall realism meta score of 0.7858, ranking first on the leaderboard at the time of submission.
MAD-PINN: A Decentralized Physics-Informed Machine Learning Framework for Safe and Optimal Multi-Agent Control
Co-optimizing safety and performance in large-scale multi-agent systems remains a fundamental challenge. Existing approaches based on multi-agent reinforcement learning (MARL), safety filtering, or Model Predictive Control (MPC) either lack strict safety guarantees, suffer from conservatism, or fail to scale effectively. We propose MAD-PINN, a decentralized physics-informed machine learning framework for solving the multi-agent state-constrained optimal control problem (MASC-OCP). Our method leverages an epigraph-based reformulation of SC-OCP to simultaneously capture performance and safety, and approximates its solution via a physics-informed neural network. Scalability is achieved by training the SC-OCP value function on reduced-agent systems and deploying them in a decentralized fashion, where each agent relies only on local observations of its neighbours for decision-making. To further enhance safety and efficiency, we introduce an Hamilton-Jacobi (HJ) reachability-based neighbour selection strategy to prioritize safety-critical interactions, and a receding-horizon policy execution scheme that adapts to dynamic interactions while reducing computational burden. Experiments on multi-agent navigation tasks demonstrate that MAD-PINN achieves superior safety-performance trade-offs, maintains scalability as the number of agents grows, and consistently outperforms state-of-the-art baselines.
comment: 9 Pages, 4 Figures, 3 Tables. First two authors have contributed equally
DriveE2E: Closed-Loop Benchmark for End-to-End Autonomous Driving through Real-to-Simulation
Closed-loop evaluation is increasingly critical for end-to-end autonomous driving. Current closed-loop benchmarks using the CARLA simulator rely on manually configured traffic scenarios, which can diverge from real-world conditions, limiting their ability to reflect actual driving performance. To address these limitations, we introduce a simple yet challenging closed-loop evaluation framework that closely integrates real-world driving scenarios into the CARLA simulator with infrastructure cooperation. Our approach involves extracting 800 dynamic traffic scenarios selected from a comprehensive 100-hour video dataset captured by high-mounted infrastructure sensors, and creating static digital twin assets for 15 real-world intersections with consistent visual appearance. These digital twins accurately replicate the traffic and environmental characteristics of their real-world counterparts, enabling more realistic simulations in CARLA. This evaluation is challenging due to the diversity of driving behaviors, locations, weather conditions, and times of day at complex urban intersections. In addition, we provide a comprehensive closed-loop benchmark for evaluating end-to-end autonomous driving models. Project URL: \href{https://github.com/AIR-THU/DriveE2E}{https://github.com/AIR-THU/DriveE2E}.
comment: End-to-End Autonomous Driving Simulation and Benchmark
SIG-Chat: Spatial Intent-Guided Conversational Gesture Generation Involving How, When and Where
The accompanying actions and gestures in dialogue are often closely linked to interactions with the environment, such as looking toward the interlocutor or using gestures to point to the described target at appropriate moments. Speech and semantics guide the production of gestures by determining their timing (WHEN) and style (HOW), while the spatial locations of interactive objects dictate their directional execution (WHERE). Existing approaches either rely solely on descriptive language to generate motions or utilize audio to produce non-interactive gestures, thereby lacking the characterization of interactive timing and spatial intent. This significantly limits the applicability of conversational gesture generation, whether in robotics or in the fields of game and animation production. To address this gap, we present a full-stack solution. We first established a unique data collection method to simultaneously capture high-precision human motion and spatial intent. We then developed a generation model driven by audio, language, and spatial data, alongside dedicated metrics for evaluating interaction timing and spatial accuracy. Finally, we deployed the solution on a humanoid robot, enabling rich, context-aware physical interactions.
DexFlyWheel: A Scalable and Self-improving Data Generation Framework for Dexterous Manipulation NeurIPS 2025
Dexterous manipulation is critical for advancing robot capabilities in real-world applications, yet diverse and high-quality datasets remain scarce. Existing data collection methods either rely on human teleoperation or require significant human engineering, or generate data with limited diversity, which restricts their scalability and generalization. In this paper, we introduce DexFlyWheel, a scalable data generation framework that employs a self-improving cycle to continuously enrich data diversity. Starting from efficient seed demonstrations warmup, DexFlyWheel expands the dataset through iterative cycles. Each cycle follows a closed-loop pipeline that integrates Imitation Learning (IL), residual Reinforcement Learning (RL), rollout trajectory collection, and data augmentation. Specifically, IL extracts human-like behaviors from demonstrations, and residual RL enhances policy generalization. The learned policy is then used to generate trajectories in simulation, which are further augmented across diverse environments and spatial configurations before being fed back into the next cycle. Over successive iterations, a self-improving data flywheel effect emerges, producing datasets that cover diverse scenarios and thereby scaling policy performance. Experimental results demonstrate that DexFlyWheel generates over 2,000 diverse demonstrations across four challenging tasks. Policies trained on our dataset achieve an average success rate of 81.9\% on the challenge test sets and successfully transfer to the real world through digital twin, achieving a 78.3\% success rate on dual-arm lift tasks.
comment: NeurIPS 2025, Spotlight
Control Your Robot: A Unified System for Robot Control and Policy Deployment
Cross-platform robot control remains difficult because hardware interfaces, data formats, and control paradigms vary widely, which fragments toolchains and slows deployment. To address this, we present Control Your Robot, a modular, general-purpose framework that unifies data collection and policy deployment across diverse platforms. The system reduces fragmentation through a standardized workflow with modular design, unified APIs, and a closed-loop architecture. It supports flexible robot registration, dual-mode control with teleoperation and trajectory playback, and seamless integration from multimodal data acquisition to inference. Experiments on single-arm and dual-arm systems show efficient, low-latency data collection and effective support for policy learning with imitation learning and vision-language-action models. Policies trained on data gathered by Control Your Robot match expert demonstrations closely, indicating that the framework enables scalable and reproducible robot learning across platforms.
comment: Code: https://github.com/Tian-Nian/control_your_robot
Fostering Robots: A Governance-First Conceptual Framework for Domestic, Curriculum-Based Trajectory Collection
We propose a conceptual, empirically testable framework for Robot Fostering, -a curriculum-driven, governance-first approach to domestic robot deployments, emphasizing long-term, curated interaction trajectories. We formalize trajectory quality with quantifiable metrics and evaluation protocols aligned with EU-grade governance standards, delineating a low-resource empirical roadmap to enable rigorous validation through future pilot studies.
comment: 7 pages, 2 figures
High-Precision Climbing Robot Localization Using Planar Array UWB/GPS/IMU/Barometer Integration
To address the need for high-precision localization of climbing robots in complex high-altitude environments, this paper proposes a multi-sensor fusion system that overcomes the limitations of single-sensor approaches. Firstly, the localization scenarios and the problem model are analyzed. An integrated architecture of Attention Mechanism-based Fusion Algorithm (AMFA) incorporating planar array Ultra-Wideband (UWB), GPS, Inertial Measurement Unit (IMU), and barometer is designed to handle challenges such as GPS occlusion and UWB Non-Line-of-Sight (NLOS) problem. Then, End-to-end neural network inference models for UWB and barometer are developed, along with a multimodal attention mechanism for adaptive data fusion. An Unscented Kalman Filter (UKF) is applied to refine the trajectory, improving accuracy and robustness. Finally, real-world experiments show that the method achieves 0.48 m localization accuracy and lower MAX error of 1.50 m, outperforming baseline algorithms such as GPS/INS-EKF and demonstrating stronger robustness.
Sequence Pathfinder for Multi-Agent Pickup and Delivery in the Warehouse
Multi-Agent Pickup and Delivery (MAPD) is a challenging extension of Multi-Agent Path Finding (MAPF), where agents are required to sequentially complete tasks with fixed-location pickup and delivery demands. Although learning-based methods have made progress in MAPD, they often perform poorly in warehouse-like environments with narrow pathways and long corridors when relying only on local observations for distributed decision-making. Communication learning can alleviate the lack of global information but introduce high computational complexity due to point-to-point communication. To address this challenge, we formulate MAPF as a sequence modeling problem and prove that path-finding policies under sequence modeling possess order-invariant optimality, ensuring its effectiveness in MAPD. Building on this, we propose the Sequential Pathfinder (SePar), which leverages the Transformer paradigm to achieve implicit information exchange, reducing decision-making complexity from exponential to linear while maintaining efficiency and global awareness. Experiments demonstrate that SePar consistently outperforms existing learning-based methods across various MAPF tasks and their variants, and generalizes well to unseen environments. Furthermore, we highlight the necessity of integrating imitation learning in complex maps like warehouses.
comment: Preprint Under Review
LocoFormer: Generalist Locomotion via Long-context Adaptation
Modern locomotion controllers are manually tuned for specific embodiments. We present LocoFormer, a generalist omni-bodied locomotion model that can control previously unseen legged and wheeled robots, even without precise knowledge of their kinematics. LocoFormer is able to adapt to changes in morphology and dynamics at test time. We find that two key choices enable adaptation. First, we train massive scale RL on procedurally generated robots with aggressive domain randomization. Second, in contrast to previous policies that are myopic with short context lengths, we extend context by orders of magnitude to span episode boundaries. We deploy the same LocoFormer to varied robots and show robust control even with large disturbances such as weight change and motor failures. In extreme scenarios, we see emergent adaptation across episodes, LocoFormer learns from falls in early episodes to improve control strategies in later ones. We believe that this simple, yet general recipe can be used to train foundation models for other robotic skills in the future. Videos at generalist-locomotion.github.io.
comment: Accepted to CoRL 2025
GRS-SLAM3R: Real-Time Dense SLAM with Gated Recurrent State
DUSt3R-based end-to-end scene reconstruction has recently shown promising results in dense visual SLAM. However, most existing methods only use image pairs to estimate pointmaps, overlooking spatial memory and global consistency.To this end, we introduce GRS-SLAM3R, an end-to-end SLAM framework for dense scene reconstruction and pose estimation from RGB images without any prior knowledge of the scene or camera parameters. Unlike existing DUSt3R-based frameworks, which operate on all image pairs and predict per-pair point maps in local coordinate frames, our method supports sequentialized input and incrementally estimates metric-scale point clouds in the global coordinate. In order to improve consistent spatial correlation, we use a latent state for spatial memory and design a transformer-based gated update module to reset and update the spatial memory that continuously aggregates and tracks relevant 3D information across frames. Furthermore, we partition the scene into submaps, apply local alignment within each submap, and register all submaps into a common world frame using relative constraints, producing a globally consistent map. Experiments on various datasets show that our framework achieves superior reconstruction accuracy while maintaining real-time performance.
FastViDAR: Real-Time Omnidirectional Depth Estimation via Alternative Hierarchical Attention
In this paper we propose FastViDAR, a novel framework that takes four fisheye camera inputs and produces a full $360^\circ$ depth map along with per-camera depth, fusion depth, and confidence estimates. Our main contributions are: (1) We introduce Alternative Hierarchical Attention (AHA) mechanism that efficiently fuses features across views through separate intra-frame and inter-frame windowed self-attention, achieving cross-view feature mixing with reduced overhead. (2) We propose a novel ERP fusion approach that projects multi-view depth estimates to a shared equirectangular coordinate system to obtain the final fusion depth. (3) We generate ERP image-depth pairs using HM3D and 2D3D-S datasets for comprehensive evaluation, demonstrating competitive zero-shot performance on real datasets while achieving up to 20 FPS on NVIDIA Orin NX embedded hardware. Project page: \href{https://3f7dfc.github.io/FastVidar/}{https://3f7dfc.github.io/FastVidar/}
DA-MMP: Learning Coordinated and Accurate Throwing with Dynamics-Aware Motion Manifold Primitives
Dynamic manipulation is a key capability for advancing robot performance, enabling skills such as tossing. While recent learning-based approaches have pushed the field forward, most methods still rely on manually designed action parameterizations, limiting their ability to produce the highly coordinated motions required in complex tasks. Motion planning can generate feasible trajectories, but the dynamics gap-stemming from control inaccuracies, contact uncertainties, and aerodynamic effects-often causes large deviations between planned and executed trajectories. In this work, we propose Dynamics-Aware Motion Manifold Primitives (DA-MMP), a motion generation framework for goal-conditioned dynamic manipulation, and instantiate it on a challenging real-world ring-tossing task. Our approach extends motion manifold primitives to variable-length trajectories through a compact parametrization and learns a high-quality manifold from a large-scale dataset of planned motions. Building on this manifold, a conditional flow matching model is trained in the latent space with a small set of real-world trials, enabling the generation of throwing trajectories that account for execution dynamics. Experiments show that our method can generate coordinated and smooth motion trajectories for the ring-tossing task. In real-world evaluations, it achieves high success rates and even surpasses the performance of trained human experts. Moreover, it generalizes to novel targets beyond the training range, indicating that it successfully learns the underlying trajectory-dynamics mapping.
MDCPP: Multi-robot Dynamic Coverage Path Planning for Workload Adaptation
Multi-robot Coverage Path Planning (MCPP) addresses the problem of computing paths for multiple robots to effectively cover a large area of interest. Conventional approaches to MCPP typically assume that robots move at fixed velocities, which is often unrealistic in real-world applications where robots must adapt their speeds based on the specific coverage tasks assigned to them.Consequently, conventional approaches often lead to imbalanced workload distribution among robots and increased completion time for coverage tasks. To address this, we introduce a novel Multi-robot Dynamic Coverage Path Planning (MDCPP) algorithm for complete coverage in two-dimensional environments. MDCPP dynamically estimates each robot's remaining workload by approximating the target distribution with Gaussian mixture models, and assigns coverage regions using a capacity-constrained Voronoi diagram. We further develop a distributed implementation of MDCPP for range-constrained robotic networks. Simulation results validate the efficacy of MDCPP, showing qualitative improvements and superior performance compared to an existing sweeping algorithm, and a quantifiable impact of communication range on coverage efficiency.
Certifiably Optimal State Estimation and Robot Calibration Using Trace-Constrained SDP
Many nonconvex problems in robotics can be relaxed into convex formulations via semidefinite programming (SDP), which offers the advantage of global optimality. The practical quality of these solutions, however, critically depends on achieving rank-1 matrices, a condition that typically requires additional tightening. In this work, we focus on trace-constrained SDPs, where the decision variables are positive semidefinite (PSD) matrices with fixed trace values. These additional constraints not only capture important structural properties but also facilitate first-order methods for recovering rank-1 solutions. We introduce customized fixed-trace variables and constraints to represent common robotic quantities such as rotations and translations, which can be exactly recovered when the corresponding variables are rank-1. To further improve practical performance, we develop a gradient-based refinement procedure that projects relaxed SDP solutions toward rank-1, low-cost candidates, which can then be certified for global optimality via the dual problem. We demonstrate that many robotics tasks can be expressed within this trace-constrained SDP framework, and showcase its effectiveness through simulations in perspective-n-point (PnP) estimation, hand-eye calibration, and dual-robot system calibration. To support broader use, we also introduce a modular ``virtual robot'' abstraction that simplifies modeling across different problem settings.
Focusing on What Matters: Object-Agent-centric Tokenization for Vision Language Action models
Vision-Language-Action (VLA) models offer a pivotal approach to learning robotic manipulation at scale by repurposing large pre-trained Vision-Language-Models (VLM) to output robotic actions. However, adapting VLMs for robotic domains comes with an unnecessarily high computational cost, which we attribute to the tokenization scheme of visual inputs. In this work, we aim to enable efficient VLA training by proposing Oat-VLA, an Object-Agent-centric Tokenization for VLAs. Building on the insights of object-centric representation learning, our method introduces an inductive bias towards scene objects and the agent's own visual information. As a result, we find that Oat-VLA can drastically reduce the number of visual tokens to just a few tokens without sacrificing performance. We reveal that Oat-VLA converges at least twice as fast as OpenVLA on the LIBERO suite, as well as outperform OpenVLA in diverse real-world pick and place tasks.
comment: Presented at 9th Conference on Robot Learning (CoRL 2025), Seoul, Korea
HeLoM: Hierarchical Learning for Whole-Body Loco-Manipulation in Hexapod Robot
Robots in real-world environments are often required to move/manipulate objects comparable in weight to their own bodies. Compared to grasping and carrying, pushing provides a more straightforward and efficient non-prehensile manipulation strategy, avoiding complex grasp design while leveraging direct contact to regulate an object's pose. Achieving effective pushing, however, demands both sufficient manipulation forces and the ability to maintain stability, which is particularly challenging when dealing with heavy or irregular objects. To address these challenges, we propose HeLoM, a learning-based hierarchical whole-body manipulation framework for a hexapod robot that exploits coordinated multi-limb control. Inspired by the cooperative strategies of multi-legged insects, our framework leverages redundant contact points and high degrees of freedom to enable dynamic redistribution of contact forces. HeLoM's high-level planner plans pushing behaviors and target object poses, while its low-level controller maintains locomotion stability and generates dynamically consistent joint actions. Our policies trained in simulation are directly deployed on real robots without additional fine-tuning. This design allows the robot to maintain balance while exerting continuous and controllable pushing forces through coordinated foreleg interaction and supportive hind-leg propulsion. We validate the effectiveness of HeLoM through both simulation and real-world experiments. Results show that our framework can stably push boxes of varying sizes and unknown physical properties to designated goal poses in the real world.
KiVi: Kinesthetic-Visuospatial Integration for Dynamic and Safe Egocentric Legged Locomotion
Vision-based locomotion has shown great promise in enabling legged robots to perceive and adapt to complex environments. However, visual information is inherently fragile, being vulnerable to occlusions, reflections, and lighting changes, which often cause instability in locomotion. Inspired by animal sensorimotor integration, we propose KiVi, a Kinesthetic-Visuospatial integration framework, where kinesthetics encodes proprioceptive sensing of body motion and visuospatial reasoning captures visual perception of surrounding terrain. Specifically, KiVi separates these pathways, leveraging proprioception as a stable backbone while selectively incorporating vision for terrain awareness and obstacle avoidance. This modality-balanced, yet integrative design, combined with memory-enhanced attention, allows the robot to robustly interpret visual cues while maintaining fallback stability through proprioception. Extensive experiments show that our method enables quadruped robots to stably traverse diverse terrains and operate reliably in unstructured outdoor environments, remaining robust to out-of-distribution (OOD) visual noise and occlusion unseen during training, thereby highlighting its effectiveness and applicability to real-world legged locomotion.
Color-Pair Guided Robust Zero-Shot 6D Pose Estimation and Tracking of Cluttered Objects on Edge Devices
Robust 6D pose estimation of novel objects under challenging illumination remains a significant challenge, often requiring a trade-off between accurate initial pose estimation and efficient real-time tracking. We present a unified framework explicitly designed for efficient execution on edge devices, which synergizes a robust initial estimation module with a fast motion-based tracker. The key to our approach is a shared, lighting-invariant color-pair feature representation that forms a consistent foundation for both stages. For initial estimation, this feature facilitates robust registration between the live RGB-D view and the object's 3D mesh. For tracking, the same feature logic validates temporal correspondences, enabling a lightweight model to reliably regress the object's motion. Extensive experiments on benchmark datasets demonstrate that our integrated approach is both effective and robust, providing competitive pose estimation accuracy while maintaining high-fidelity tracking even through abrupt pose changes.
From Static to Dynamic: a Survey of Topology-Aware Perception in Autonomous Driving
The key to achieving autonomous driving lies in topology-aware perception, the structured understanding of the driving environment with an emphasis on lane topology and road semantics. This survey systematically reviews four core research directions under this theme: vectorized map construction, topological structure modeling, prior knowledge fusion, and language model-based perception. Across these directions, we observe a unifying trend: a paradigm shift from static, pre-built maps to dynamic, sensor-driven perception. Specifically, traditional static maps have provided semantic context for autonomous systems. However, they are costly to construct, difficult to update in real time, and lack generalization across regions, limiting their scalability. In contrast, dynamic representations leverage on-board sensor data for real-time map construction and topology reasoning. Each of the four research directions contributes to this shift through compact spatial modeling, semantic relational reasoning, robust domain knowledge integration, and multimodal scene understanding powered by pre-trained language models. Together, they pave the way for more adaptive, scalable, and explainable autonomous driving systems.
comment: 13 pages, 3 figures
Encoding Material Safety using Control Barrier Functions for Soft Actuator Control
Until recently, the concept of soft robot safety was an informal notion, often attributed solely to the fact that soft robots are less likely to damage their operating environment than rigid robots. As the field moves toward feedback control for practical applications, it becomes increasingly important to define what safety means and to characterize how soft robots can become unsafe. The unifying theme of soft robotics is to achieve useful functionality through deformation. Consequently, limitations in constitutive model accuracy and risks of material failure are inherent to all soft robots and pose a key challenge in designing provably safe controllers. This work introduces a formal definition of material safety based on strain energy functions and provides a controller that enforces it. We characterize safe and unsafe sets of an incompressible hyperelastic material and demonstrate that safety can be enforced using a high-order control barrier function (HOCBF) with quadratic program-based feedback control. As a case study, we consider a pressurized hyperelastic tube with inertial effects, first-order viscous effects, and full-state feedback. Simulation results verify that the proposed methodology can enforce the material safety specification.
comment: 8 pages, 5 figures
Generalizable Coarse-to-Fine Robot Manipulation via Language-Aligned 3D Keypoints
Hierarchical coarse-to-fine policy, where a coarse branch predicts a region of interest to guide a fine-grained action predictor, has demonstrated significant potential in robotic 3D manipulation tasks by especially enhancing sample efficiency and enabling more precise manipulation. However, even augmented with pre-trained models, these hierarchical policies still suffer from generalization issues. To enhance generalization to novel instructions and environment variations, we propose Coarse-to-fine Language-Aligned manipulation Policy (CLAP), a framework that integrates three key components: 1) task decomposition, 2) VLM fine-tuning for 3D keypoint prediction, and 3) 3D-aware representation. Through comprehensive experiments in simulation and on a real robot, we demonstrate its superior generalization capability. Specifically, on GemBench, a benchmark designed for evaluating generalization, our approach achieves a 12\% higher average success rate than the SOTA method while using only 1/5 of the training trajectories. In real-world experiments, our policy, trained on only 10 demonstrations, successfully generalizes to novel instructions and environments.
GES-UniGrasp: A Two-Stage Dexterous Grasping Strategy With Geometry-Based Expert Selection
Robust and human-like dexterous grasping of general objects is a critical capability for advancing intelligent robotic manipulation in real-world scenarios. However, existing reinforcement learning methods guided by grasp priors often result in unnatural behaviors. In this work, we present \textit{ContactGrasp}, a robotic dexterous pre-grasp and grasp dataset that explicitly accounts for task-relevant wrist orientation and thumb-index pinching coordination. The dataset covers 773 objects in 82 categories, providing a rich foundation for training human-like grasp strategies. Building upon this dataset, we perform geometry-based clustering to group objects by shape, enabling a two-stage Geometry-based Expert Selection (GES) framework that selects among specialized experts for grasping diverse object geometries, thereby enhancing adaptability to diverse shapes and generalization across categories. Our approach demonstrates natural grasp postures and achieves high success rates of 99.4\% and 96.3\% on the train and test sets, respectively, showcasing strong generalization and high-quality grasp execution.
RAVEN: Resilient Aerial Navigation via Open-Set Semantic Memory and Behavior Adaptation
Aerial outdoor semantic navigation requires robots to explore large, unstructured environments to locate target objects. Recent advances in semantic navigation have demonstrated open-set object-goal navigation in indoor settings, but these methods remain limited by constrained spatial ranges and structured layouts, making them unsuitable for long-range outdoor search. While outdoor semantic navigation approaches exist, they either rely on reactive policies based on current observations, which tend to produce short-sighted behaviors, or precompute scene graphs offline for navigation, limiting adaptability to online deployment. We present RAVEN, a 3D memory-based, behavior tree framework for aerial semantic navigation in unstructured outdoor environments. It (1) uses a spatially consistent semantic voxel-ray map as persistent memory, enabling long-horizon planning and avoiding purely reactive behaviors, (2) combines short-range voxel search and long-range ray search to scale to large environments, (3) leverages a large vision-language model to suggest auxiliary cues, mitigating sparsity of outdoor targets. These components are coordinated by a behavior tree, which adaptively switches behaviors for robust operation. We evaluate RAVEN in 10 photorealistic outdoor simulation environments over 100 semantic tasks, encompassing single-object search, multi-class, multi-instance navigation and sequential task changes. Results show RAVEN outperforms baselines by 85.25% in simulation and demonstrate its real-world applicability through deployment on an aerial robot in outdoor field tests.
High Torque Density PCB Axial Flux Permanent Magnet Motor for Micro Robots
Quasi-direct-drive (QDD) actuation is transforming legged and manipulator robots by eliminating high-ratio gearboxes, yet it demands motors that deliver very high torque at low speed within a thin, disc-shaped joint envelope. Axial-flux permanent-magnet (AFPM) machines meet these geometric and torque requirements, but scaling them below a 20mm outer diameter is hampered by poor copper fill in conventional wound stators, inflating resistance and throttling continuous torque. This paper introduces a micro-scale AFPM motor that overcomes these limitations through printed-circuit-board (PCB) windings fabricated with advanced IC-substrate high-density interconnect (HDI) technology. The resulting 48-layer stator-formed by stacking four 12-layer HDI modules-achieves a record 45\% copper fill in a package only 5mm thick and 19mm in diameter. We perform comprehensive electromagnetic and thermal analyses to inform the motor design, then fabricate a prototype whose performance characteristics are experimentally verified.
Zero-shot Whole-Body Manipulation with a Large-Scale Soft Robotic Torso via Guided Reinforcement Learning
Whole-body manipulation is a powerful yet underexplored approach that enables robots to interact with large, heavy, or awkward objects using more than just their end-effectors. Soft robots, with their inherent passive compliance, are particularly well-suited for such contact-rich manipulation tasks, but their uncertainties in kinematics and dynamics pose significant challenges for simulation and control. In this work, we address this challenge with a simulation that can run up to 350x real time on a single thread in MuJoCo and provide a detailed analysis of the critical tradeoffs between speed and accuracy for this simulation. Using this framework, we demonstrate a successful zero-shot sim-to-real transfer of a learned whole-body manipulation policy, achieving an 88% success rate on the Baloo hardware platform. We show that guiding RL with a simple motion primitive is critical to this success where standard reward shaping methods struggled to produce a stable and successful policy for whole-body manipulation. Furthermore, our analysis reveals that the learned policy does not simply mimic the motion primitive. It exhibits beneficial reactive behavior, such as re-grasping and perturbation recovery. We analyze and contrast this learned policy against an open-loop baseline to show that the policy can also exhibit aggressive over-corrections under perturbation. To our knowledge, this is the first demonstration of forceful, six-DoF whole-body manipulation using two continuum soft arms on a large-scale platform (10 kg payloads), with zero-shot policy transfer.
comment: Submitted to IEEE Transactions on Robotics for review
Neural-Augmented Kelvinlet for Real-Time Soft Tissue Deformation Modeling
Accurate and efficient modeling of soft-tissue interactions is fundamental for advancing surgical simulation, surgical robotics, and model-based surgical automation. To achieve real-time latency, classical Finite Element Method (FEM) solvers are often replaced with neural approximations; however, naively training such models in a fully data-driven manner without incorporating physical priors frequently leads to poor generalization and physically implausible predictions. We present a novel physics-informed neural simulation framework that enables real-time prediction of soft-tissue deformations under complex single- and multi-grasper interactions. Our approach integrates Kelvinlet-based analytical priors with large-scale FEM data, capturing both linear and nonlinear tissue responses. This hybrid design improves predictive accuracy and physical plausibility across diverse neural architectures while maintaining the low-latency performance required for interactive applications. We validate our method on challenging surgical manipulation tasks involving standard laparoscopic grasping tools, demonstrating substantial improvements in deformation fidelity and temporal stability over existing baselines. These results establish Kelvinlet-augmented learning as a principled and computationally efficient paradigm for real-time, physics-aware soft-tissue simulation in surgical AI.
GLIDE: A Coordinated Aerial-Ground Framework for Search and Rescue in Unknown Environments
We present a cooperative aerial-ground search-and-rescue (SAR) framework that pairs two unmanned aerial vehicles (UAVs) with an unmanned ground vehicle (UGV) to achieve rapid victim localization and obstacle-aware navigation in unknown environments. We dub this framework Guided Long-horizon Integrated Drone Escort (GLIDE), highlighting the UGV's reliance on UAV guidance for long-horizon planning. In our framework, a goal-searching UAV executes real-time onboard victim detection and georeferencing to nominate goals for the ground platform, while a terrain-scouting UAV flies ahead of the UGV's planned route to provide mid-level traversability updates. The UGV fuses aerial cues with local sensing to perform time-efficient A* planning and continuous replanning as information arrives. Additionally, we present a hardware demonstration (using a GEM e6 golf cart as the UGV and two X500 UAVs) to evaluate end-to-end SAR mission performance and include simulation ablations to assess the planning stack in isolation from detection. Empirical results demonstrate that explicit role separation across UAVs, coupled with terrain scouting and guided planning, improves reach time and navigation safety in time-critical SAR missions.
HUNT: High-Speed UAV Navigation and Tracking in Unstructured Environments via Instantaneous Relative Frames
Search and rescue operations require unmanned aerial vehicles to both traverse unknown unstructured environments at high speed and track targets once detected. Achieving both capabilities under degraded sensing and without global localization remains an open challenge. Recent works on relative navigation have shown robust tracking by anchoring planning and control to a visible detected object, but cannot address navigation when no target is in the field of view. We present HUNT (High-speed UAV Navigation and Tracking), a real-time framework that unifies traversal, acquisition, and tracking within a single relative formulation. HUNT defines navigation objectives directly from onboard instantaneous observables such as attitude, altitude, and velocity, enabling reactive high-speed flight during search. Once a target is detected, the same perception-control pipeline transitions seamlessly to tracking. Outdoor experiments in dense forests, container compounds, and search-and-rescue operations with vehicles and mannequins demonstrate robust autonomy where global methods fail.
TReF-6: Inferring Task-Relevant Frames from a Single Demonstration for One-Shot Skill Generalization
Robots often struggle to generalize from a single demonstration due to the lack of a transferable and interpretable spatial representation. In this work, we introduce TReF-6, a method that infers a simplified, abstracted 6DoF Task-Relevant Frame from a single trajectory. Our approach identifies an influence point purely from the trajectory geometry to define the origin for a local frame, which serves as a reference for parameterizing a Dynamic Movement Primitive (DMP). This influence point captures the task's spatial structure, extending the standard DMP formulation beyond start-goal imitation. The inferred frame is semantically grounded via a vision-language model and localized in novel scenes by Grounded-SAM, enabling functionally consistent skill generalization. We validate TReF-6 in simulation and demonstrate robustness to trajectory noise. We further deploy an end-to-end pipeline on real-world manipulation tasks, showing that TReF-6 supports one-shot imitation learning that preserves task intent across diverse object configurations.
Symbolic Imitation Learning: From Black-Box to Explainable Driving Policies
Current imitation learning approaches, predominantly based on deep neural networks (DNNs), offer efficient mechanisms for learning driving policies from real-world datasets. However, they suffer from inherent limitations in interpretability and generalizability--issues of critical importance in safety-critical domains such as autonomous driving. In this paper, we introduce Symbolic Imitation Learning (SIL), a novel framework that leverages Inductive Logic Programming (ILP) to derive explainable and generalizable driving policies from synthetic datasets. We evaluate SIL on real-world HighD and NGSim datasets, comparing its performance with state-of-the-art neural imitation learning methods using metrics such as collision rate, lane change efficiency, and average speed. The results indicate that SIL significantly enhances policy transparency while maintaining strong performance across varied driving conditions. These findings highlight the potential of integrating ILP into imitation learning to promote safer and more reliable autonomous systems.
comment: 24 pages, 4 figures, 4 tables
Robot Navigation with Entity-Based Collision Avoidance using Deep Reinforcement Learning
Efficient navigation in dynamic environments is crucial for autonomous robots interacting with moving agents and static obstacles. We present a novel deep reinforcement learning approach that improves robot navigation and interaction with different types of agents and obstacles based on specific safety requirements. Our approach uses information about the entity types, improving collision avoidance and ensuring safer navigation. We introduce a new reward function that penalizes the robot for being close to or colliding with different entities such as adults, bicyclists, children, and static obstacles, while also encouraging the robot's progress toward the goal. We propose an optimized algorithm that significantly accelerates the training, validation, and testing phases, enabling efficient learning in complex environments. Comprehensive experiments demonstrate that our approach consistently outperforms state-of-the-art navigation and collision avoidance methods.
comment: 15 pages, 4 figures
An Empirical Study on the Computation Budget of Co-Optimization of Robot Design and Control in Simulation
The design (shape) of a robot is usually decided before the control is implemented. This might limit how well the design is adapted to a task, as the suitability of the design is given by how well the robot performs in the task, which requires both a design and a controller. The co-optimization or simultaneous optimization of the design and control of robots addresses this limitation by producing a design and control that are both adapted to the task. This paper investigates some of the challenges inherent in the co-optimization of design and control in simulation. The results show that reducing how well the controllers are trained during the co-optimization process significantly improves the robot's performance when considering a second phase in which the controller for the best design is retrained with additional resources. In addition, the results demonstrate that the computation budget allocated to training the controller for each design influences design complexity, with simpler designs associated with lower training budgets. This paper experimentally studies key questions discussed in other works in the literature on the co-optimization of design and control of robots in simulation in four different co-optimization problems.
Self-Supervised Geometry-Guided Initialization for Robust Monocular Visual Odometry
Monocular visual odometry is a key technology in various autonomous systems. Traditional feature-based methods suffer from failures due to poor lighting, insufficient texture, and large motions. In contrast, recent learning-based dense SLAM methods exploit iterative dense bundle adjustment to address such failure cases, and achieve robust and accurate localization in a wide variety of real environments, without depending on domain-specific supervision. However, despite its potential, the methods still struggle with scenarios involving large motion and object dynamics. In this study, we diagnose key weaknesses in a popular learning-based dense SLAM model (DROID-SLAM) by analyzing major failure cases on outdoor benchmarks and exposing various shortcomings of its optimization process. We then propose the use of self-supervised priors leveraging a frozen large-scale pre-trained monocular depth estimator to initialize the dense bundle adjustment process, leading to robust visual odometry without the need to fine-tune the SLAM backbone. Despite its simplicity, the proposed method demonstrates significant improvements on KITTI odometry, as well as the challenging DDAD benchmark.
comment: Project page: https://toyotafrc.github.io/SGInit-Proj/
TreeIRL: Safe Urban Driving with Tree Search and Inverse Reinforcement Learning
We present TreeIRL, a novel planner for autonomous driving that combines Monte Carlo tree search (MCTS) and inverse reinforcement learning (IRL) to achieve state-of-the-art performance in simulation and in real-world driving. The core idea is to use MCTS to find a promising set of safe candidate trajectories and a deep IRL scoring function to select the most human-like among them. We evaluate TreeIRL against both classical and state-of-the-art planners in large-scale simulations and on 500+ miles of real-world autonomous driving in the Las Vegas metropolitan area. Test scenarios include dense urban traffic, adaptive cruise control, cut-ins, and traffic lights. TreeIRL achieves the best overall performance, striking a balance between safety, progress, comfort, and human-likeness. To our knowledge, our work is the first demonstration of MCTS-based planning on public roads and underscores the importance of evaluating planners across a diverse set of metrics and in real-world environments. TreeIRL is highly extensible and could be further improved with reinforcement learning and imitation learning, providing a framework for exploring different combinations of classical and learning-based approaches to solve the planning bottleneck in autonomous driving.
Training Tactile Sensors to Learn Force Sensing from Each Other
Humans achieve stable and dexterous object manipulation by coordinating grasp forces across multiple fingers and palms, facilitated by a unified tactile memory system in the somatosensory cortex. This system encodes and stores tactile experiences across skin regions, enabling the flexible reuse and transfer of touch information. Inspired by this biological capability, we present GenForce, the first framework that enables transferable force sensing across tactile sensors in robotic hands. GenForce unifies tactile signals into shared marker representations, analogous to cortical sensory encoding, allowing force prediction models trained on one sensor to be transferred to others without the need for exhaustive force data collection. We demonstrate that GenForce generalizes across both homogeneous sensors with varying configurations and heterogeneous sensors with distinct sensing modalities and material properties. This transferable force sensing is also demonstrated with high performance in robot force control including daily object grasping, slip detection and avoidance. Our results highlight a scalable paradigm for cross-sensor robotic tactile learning, offering new pathways toward adaptable and tactile memory-driven manipulation in unstructured environments.
AgentThink: A Unified Framework for Tool-Augmented Chain-of-Thought Reasoning in Vision-Language Models for Autonomous Driving
Vision-Language Models (VLMs) show promise for autonomous driving, yet their struggle with hallucinations, inefficient reasoning, and limited real-world validation hinders accurate perception and robust step-by-step reasoning. To overcome this, we introduce \textbf{AgentThink}, a pioneering unified framework that integrates Chain-of-Thought (CoT) reasoning with dynamic, agent-style tool invocation for autonomous driving tasks. AgentThink's core innovations include: \textbf{(i) Structured Data Generation}, which establishes an autonomous driving tool library to automatically construct structured, self-verified reasoning data explicitly incorporating tool usage for diverse driving scenarios; \textbf{(ii) A Two-stage Training Pipeline}, employing Supervised Fine-Tuning (SFT) with Group Relative Policy Optimization (GRPO) to equip VLMs with the capability for autonomous tool invocation; and \textbf{(iii) Agent-style Tool-Usage Evaluation}, introducing a novel multi-tool assessment protocol to rigorously evaluate the model's tool invocation and utilization. Experiments on the DriveLMM-o1 benchmark demonstrate that AgentThink significantly boosts overall reasoning scores by \textbf{53.91%} and enhances answer accuracy by \textbf{33.54%}, while markedly improving reasoning quality and consistency. Furthermore, ablation studies and robust zero-shot/few-shot generalization experiments across various benchmarks underscore its powerful capabilities. These findings highlight a promising trajectory for developing trustworthy and tool-aware autonomous driving models. Code is available at https://github.com/curryqka/AgentThink.
comment: 19 pages, 8 figures
Follow-Bench: A Unified Motion Planning Benchmark for Socially-Aware Robot Person Following
Robot person following (RPF) -- mobile robots that follow and assist a specific person -- has emerging applications in personal assistance, security patrols, eldercare, and logistics. To be effective, such robots must follow the target while ensuring safety and comfort for both the target and surrounding people. In this work, we present the first end-to-end study of RPF, which (i) surveys representative scenarios, motion-planning methods, and evaluation metrics with a focus on safety and comfort; (ii) introduces Follow-Bench, a unified benchmark simulating diverse scenarios, including various target trajectory patterns, dynamic-crowd flows, and environmental layouts; and (iii) re-implements six popular RPF planners, ensuring that both safety and comfort are systematically considered. Moreover, we evaluate the two highest-performing planners from our benchmark on a differential-drive robot to provide insights into real-world deployment. Extensive simulation and real-world experiments provide quantitative insights into the safety-comfort trade-offs of existing planners, while revealing open challenges and future research directions.
comment: Project page: https://follow-bench.github.io/
Vidar: Embodied Video Diffusion Model for Generalist Manipulation
Scaling general-purpose manipulation to new robot embodiments remains challenging: each platform typically needs large, homogeneous demonstrations, and pixel-to-action VLA pipelines typically degenerate under background and viewpoint shifts. In this paper, we present Vidar, a prior-driven, low-shot adaptation paradigm that replaces most embodiment-specific data with transferable video priors. Vidar consists of an embodied video diffusion model as the generalizable prior and a masked inverse dynamics model (MIDM) adapter based on a key decoupling of the policy. The embodied diffusion model is pre-trained on Internet-scale videos and then domain-adapted to 750K multi-view trajectories from three real-world robot platforms using a unified observation space encoding robot, camera, task, and scene contexts. The MIDM module learns action-relevant pixel masks without dense labels, grounding the prior into the target embodiment's action space while suppressing distractors. Crucially, the generative video prior models the distribution of plausible, temporally coherent interactions, implicitly capturing affordances, contact dynamics, and physical consistency from massive unlabeled video. This shifts the challenge from collecting large amounts of new robot data to efficiently aligning a rich prior with a new embodiment. With only 20 minutes of human demonstrations on an unseen robot (1% of typical data), Vidar outperforms state-of-the-art VLA baselines and generalizes to unseen tasks, backgrounds, and camera layouts. Our results suggest a scalable recipe for "one prior, many embodiments": strong, inexpensive video priors + minimal on-robot alignment.
Autonomous Close-Proximity Photovoltaic Panel Coating Using a Quadcopter
Photovoltaic (PV) panels are becoming increasingly widespread in the domain of renewable energy, and thus, small efficiency gains can have massive effects. Anti-reflective and self-cleaning coatings enhance panel performance but degrade over time, requiring periodic reapplication. Uncrewed Aerial Vehicles (UAVs) offer a flexible and autonomous way to apply protective coatings more often and at lower cost compared to traditional manual coating methods. In this letter, we propose a quadcopter-based system, equipped with a liquid dispersion mechanism, designed to automate such tasks. The localization stack only uses onboard sensors, relying on visual-inertial odometry and the relative position of the PV panel detected with respect to the quadcopter. The control relies on a model-based controller that accounts for the ground effect and the mass decrease of the quadcopter during liquid dispersion. We validate the autonomy capabilities of our system through extensive indoor and outdoor experiments.
comment: 7 pages, 10 figures. Submitted to IEEE RA-L
Multiagent Systems
CORRECT: COndensed eRror RECognition via knowledge Transfer in multi-agent systems
Multi-agent systems (MAS) are increasingly capable of tackling complex real-world tasks, yet their reliance on inter-agent coordination, tool use, and long-horizon reasoning makes error recognition particularly challenging. Minor errors can propagate across agents, escalating into task failures while producing long, intertwined execution trajectories that impose significant costs for both human developers and automated systems to debug and analyze. Our key insight is that, despite surface differences in failure trajectories (e.g., logs), MAS errors often recur with similar structural patterns. This paper presents CORRECT, the first lightweight, training-free framework that leverages an online cache of distilled error schemata to recognize and transfer knowledge of failure structures across new requests. This cache-based reuse allows LLMs to perform targeted error localization at inference time, avoiding the need for expensive retraining while adapting to dynamic MAS deployments in subseconds. To support rigorous study in this domain, we also introduce CORRECT-Error, a large-scale dataset of over 2,000 annotated trajectories collected through a novel error-injection pipeline guided by real-world distributions, and further validated through human evaluation to ensure alignment with natural failure patterns. Experiments across seven diverse MAS applications show that CORRECT improves step-level error localization up to 19.8% over existing advances while at near-zero overhead, substantially narrowing the gap between automated and human-level error recognition.
TeraAgent: A Distributed Agent-Based Simulation Engine for Simulating Half a Trillion Agents
Agent-based simulation is an indispensable paradigm for studying complex systems. These systems can comprise billions of agents, requiring the computing resources of multiple servers to simulate. Unfortunately, the state-of-the-art platform, BioDynaMo, does not scale out across servers due to its shared-memory-based implementation. To overcome this key limitation, we introduce TeraAgent, a distributed agent-based simulation engine. A critical challenge in distributed execution is the exchange of agent information across servers, which we identify as a major performance bottleneck. We propose two solutions: 1) a tailored serialization mechanism that allows agents to be accessed and mutated directly from the receive buffer, and 2) leveraging the iterative nature of agent-based simulations to reduce data transfer with delta encoding. Built on our solutions, TeraAgent enables extreme-scale simulations with half a trillion agents (an 84x improvement), reduces time-to-result with additional compute nodes, improves interoperability with third-party tools, and provides users with more hardware flexibility.
PartnerMAS: An LLM Hierarchical Multi-Agent Framework for Business Partner Selection on High-Dimensional Features
High-dimensional decision-making tasks, such as business partner selection, involve evaluating large candidate pools with heterogeneous numerical, categorical, and textual features. While large language models (LLMs) offer strong in-context reasoning capabilities, single-agent or debate-style systems often struggle with scalability and consistency in such settings. We propose PartnerMAS, a hierarchical multi-agent framework that decomposes evaluation into three layers: a Planner Agent that designs strategies, Specialized Agents that perform role-specific assessments, and a Supervisor Agent that integrates their outputs. To support systematic evaluation, we also introduce a curated benchmark dataset of venture capital co-investments, featuring diverse firm attributes and ground-truth syndicates. Across 140 cases, PartnerMAS consistently outperforms single-agent and debate-based multi-agent baselines, achieving up to 10--15\% higher match rates. Analysis of agent reasoning shows that planners are most responsive to domain-informed prompts, specialists produce complementary feature coverage, and supervisors play an important role in aggregation. Our findings demonstrate that structured collaboration among LLM agents can generate more robust outcomes than scaling individual models, highlighting PartnerMAS as a promising framework for high-dimensional decision-making in data-rich domains.
GPS-MTM: Capturing Pattern of Normalcy in GPS-Trajectories with self-supervised learning
Foundation models have driven remarkable progress in text, vision, and video understanding, and are now poised to unlock similar breakthroughs in trajectory modeling. We introduce the GPSMasked Trajectory Transformer (GPS-MTM), a foundation model for large-scale mobility data that captures patterns of normalcy in human movement. Unlike prior approaches that flatten trajectories into coordinate streams, GPS-MTM decomposes mobility into two complementary modalities: states (point-of-interest categories) and actions (agent transitions). Leveraging a bi-directional Transformer with a self-supervised masked modeling objective, the model reconstructs missing segments across modalities, enabling it to learn rich semantic correlations without manual labels. Across benchmark datasets, including Numosim-LA, Urban Anomalies, and Geolife, GPS-MTM consistently outperforms on downstream tasks such as trajectory infilling and next-stop prediction. Its advantages are most pronounced in dynamic tasks (inverse and forward dynamics), where contextual reasoning is critical. These results establish GPS-MTM as a robust foundation model for trajectory analytics, positioning mobility data as a first-class modality for large-scale representation learning. Code is released for further reference.
comment: 4 pages, 2 figures
FedAgentBench: Towards Automating Real-world Federated Medical Image Analysis with Server-Client LLM Agents
Federated learning (FL) allows collaborative model training across healthcare sites without sharing sensitive patient data. However, real-world FL deployment is often hindered by complex operational challenges that demand substantial human efforts. This includes: (a) selecting appropriate clients (hospitals), (b) coordinating between the central server and clients, (c) client-level data pre-processing, (d) harmonizing non-standardized data and labels across clients, and (e) selecting FL algorithms based on user instructions and cross-client data characteristics. However, the existing FL works overlook these practical orchestration challenges. These operational bottlenecks motivate the need for autonomous, agent-driven FL systems, where intelligent agents at each hospital client and the central server agent collaboratively manage FL setup and model training with minimal human intervention. To this end, we first introduce an agent-driven FL framework that captures key phases of real-world FL workflows from client selection to training completion and a benchmark dubbed FedAgentBench that evaluates the ability of LLM agents to autonomously coordinate healthcare FL. Our framework incorporates 40 FL algorithms, each tailored to address diverse task-specific requirements and cross-client characteristics. Furthermore, we introduce a diverse set of complex tasks across 201 carefully curated datasets, simulating 6 modality-specific real-world healthcare environments, viz., Dermatoscopy, Ultrasound, Fundus, Histopathology, MRI, and X-Ray. We assess the agentic performance of 14 open-source and 10 proprietary LLMs spanning small, medium, and large model scales. While some agent cores such as GPT-4.1 and DeepSeek V3 can automate various stages of the FL pipeline, our results reveal that more complex, interdependent tasks based on implicit goals remain challenging for even the strongest models.
Sequence Pathfinder for Multi-Agent Pickup and Delivery in the Warehouse
Multi-Agent Pickup and Delivery (MAPD) is a challenging extension of Multi-Agent Path Finding (MAPF), where agents are required to sequentially complete tasks with fixed-location pickup and delivery demands. Although learning-based methods have made progress in MAPD, they often perform poorly in warehouse-like environments with narrow pathways and long corridors when relying only on local observations for distributed decision-making. Communication learning can alleviate the lack of global information but introduce high computational complexity due to point-to-point communication. To address this challenge, we formulate MAPF as a sequence modeling problem and prove that path-finding policies under sequence modeling possess order-invariant optimality, ensuring its effectiveness in MAPD. Building on this, we propose the Sequential Pathfinder (SePar), which leverages the Transformer paradigm to achieve implicit information exchange, reducing decision-making complexity from exponential to linear while maintaining efficiency and global awareness. Experiments demonstrate that SePar consistently outperforms existing learning-based methods across various MAPF tasks and their variants, and generalizes well to unseen environments. Furthermore, we highlight the necessity of integrating imitation learning in complex maps like warehouses.
comment: Preprint Under Review
SIMPOL Model for Solving Continuous-Time Heterogeneous Agent Problems
This paper presents SIMPOL (Simplified Policy Iteration), a modular numerical framework for solving continuous-time heterogeneous agent models. The core economic problem, the optimization of consumption and savings under idiosyncratic uncertainty, is formulated as a coupled system of partial differential equations: a Hamilton-Jacobi-Bellman (HJB) equation for the agent's optimal policy and a Fokker-Planck-Kolmogorov (FPK) equation for the stationary wealth distribution. SIMPOL addresses this system using Howard's policy iteration with an *upwind* finite difference scheme that guarantees stability. A distinctive contribution is a novel consumption policy post-processing module that imposes regularity through smoothing and a projection onto an economically plausible slope band, improving convergence and model behavior. The robustness and accuracy of SIMPOL are validated through a set of integrated diagnostics, including verification of contraction in the Wasserstein-2 metric and comparison with the analytical solution of the Merton model in the no-volatility case. The framework is shown to be not only computationally efficient but also to produce solutions consistent with economic and mathematical theory, offering a reliable tool for research in quantitative macroeconomics.
comment: Code available at https://doi.org/10.5281/zenodo.17216748
RADAR: A Risk-Aware Dynamic Multi-Agent Framework for LLM Safety Evaluation via Role-Specialized Collaboration
Existing safety evaluation methods for large language models (LLMs) suffer from inherent limitations, including evaluator bias and detection failures arising from model homogeneity, which collectively undermine the robustness of risk evaluation processes. This paper seeks to re-examine the risk evaluation paradigm by introducing a theoretical framework that reconstructs the underlying risk concept space. Specifically, we decompose the latent risk concept space into three mutually exclusive subspaces: the explicit risk subspace (encompassing direct violations of safety guidelines), the implicit risk subspace (capturing potential malicious content that requires contextual reasoning for identification), and the non-risk subspace. Furthermore, we propose RADAR, a multi-agent collaborative evaluation framework that leverages multi-round debate mechanisms through four specialized complementary roles and employs dynamic update mechanisms to achieve self-evolution of risk concept distributions. This approach enables comprehensive coverage of both explicit and implicit risks while mitigating evaluator bias. To validate the effectiveness of our framework, we construct an evaluation dataset comprising 800 challenging cases. Extensive experiments on our challenging testset and public benchmarks demonstrate that RADAR significantly outperforms baseline evaluation methods across multiple dimensions, including accuracy, stability, and self-evaluation risk sensitivity. Notably, RADAR achieves a 28.87% improvement in risk identification accuracy compared to the strongest baseline evaluation method.
Heterogeneous Multi-agent Collaboration in UAV-assisted Mobile Crowdsensing Networks
Unmanned aerial vehicles (UAVs)-assisted mobile crowdsensing (MCS) has emerged as a promising paradigm for data collection. However, challenges such as spectrum scarcity, device heterogeneity, and user mobility hinder efficient coordination of sensing, communication, and computation. To tackle these issues, we propose a joint optimization framework that integrates time slot partition for sensing, communication, and computation phases, resource allocation, and UAV 3D trajectory planning, aiming to maximize the amount of processed sensing data. The problem is formulated as a non-convex stochastic optimization and further modeled as a partially observable Markov decision process (POMDP) that can be solved by multi-agent deep reinforcement learning (MADRL) algorithm. To overcome the limitations of conventional multi-layer perceptron (MLP) networks, we design a novel MADRL algorithm with hybrid actor network. The newly developed method is based on heterogeneous agent proximal policy optimization (HAPPO), empowered by convolutional neural networks (CNN) for feature extraction and Kolmogorov-Arnold networks (KAN) to capture structured state-action dependencies. Extensive numerical results demonstrate that our proposed method achieves significant improvements in the amount of processed sensing data when compared with other benchmarks.
comment: 7 pages, 6 figures
When Is Diversity Rewarded in Cooperative Multi-Agent Learning?
The success of teams in robotics, nature, and society often depends on the division of labor among diverse specialists; however, a principled explanation for when such diversity surpasses a homogeneous team is still missing. Focusing on multi-agent task allocation problems, we study this question from the perspective of reward design: what kinds of objectives are best suited for heterogeneous teams? We first consider an instantaneous, non-spatial setting where the global reward is built by two generalized aggregation operators: an inner operator that maps the $N$ agents' effort allocations on individual tasks to a task score, and an outer operator that merges the $M$ task scores into the global team reward. We prove that the curvature of these operators determines whether heterogeneity can increase reward, and that for broad reward families this collapses to a simple convexity test. Next, we ask what incentivizes heterogeneity to emerge when embodied, time-extended agents must learn an effort allocation policy. To study heterogeneity in such settings, we use multi-agent reinforcement learning (MARL) as our computational paradigm, and introduce Heterogeneity Gain Parameter Search (HetGPS), a gradient-based algorithm that optimizes the parameter space of underspecified MARL environments to find scenarios where heterogeneity is advantageous. Across different environments, we show that HetGPS rediscovers the reward regimes predicted by our theory to maximize the advantage of heterogeneity, both validating HetGPS and connecting our theoretical insights to reward design in MARL. Together, these results help us understand when behavioral diversity delivers a measurable benefit.
Moving Out: Physically-grounded Human-AI Collaboration
The ability to adapt to physical actions and constraints in an environment is crucial for embodied agents (e.g., robots) to effectively collaborate with humans. Such physically grounded human-AI collaboration must account for the increased complexity of the continuous state-action space and constrained dynamics caused by physical constraints. In this paper, we introduce Moving Out, a new human-AI collaboration benchmark that resembles a wide range of collaboration modes affected by physical attributes and constraints, such as moving heavy items together and maintaining consistent actions to move a big item around a corner. Using Moving Out, we designed two tasks and collected human-human interaction data to evaluate models' abilities to adapt to diverse human behaviors and unseen physical attributes. To address the challenges in physical environments, we propose a novel method, BASS (Behavior Augmentation, Simulation, and Selection), to enhance the diversity of agents and their understanding of the outcome of actions. Our experiments show that BASS outperforms state-of-the-art models in AI-AI and human-AI collaboration. The project page is available at https://live-robotics-uva.github.io/movingout_ai/.
comment: 30 pages, 12 figures
Learning Large-Scale Competitive Team Behaviors with Mean-Field Interactions and Online Opponent Modeling
While multi-agent reinforcement learning (MARL) has been proven effective across both collaborative and competitive tasks, existing algorithms often struggle to scale to large populations of agents. Recent advancements in mean-field (MF) theory provide scalable solutions by approximating population interactions as a continuum, yet most existing frameworks focus exclusively on either fully cooperative or purely competitive settings. To bridge this gap, we introduce MF-MAPPO, a mean-field extension of PPO designed for zero-sum team games that integrate intra-team cooperation with inter-team competition. MF-MAPPO employs a shared actor and a minimally informed critic per team and is trained directly on finite-population simulators, thereby enabling deployment to realistic scenarios with thousands of agents. We further show that MF-MAPPO naturally extends to partially observable settings through a simple gradient-regularized training scheme. Our evaluation utilizes large-scale benchmark scenarios using our own testing simulation platform for MF team games (MFEnv), including offense-defense battlefield tasks as well as variants of population-based rock-paper-scissors games that admit analytical solutions, for benchmarking. Across these benchmarks, MF-MAPPO outperforms existing methods and exhibits complex, heterogeneous behaviors, demonstrating the effectiveness of combining mean-field theory and MARL techniques at scale.
Second-Order Algorithms for Finding Local Nash Equilibria in Zero-Sum Games
Zero-sum games arise in a wide variety of problems, including robust optimization and adversarial learning. However, algorithms deployed for finding a local Nash equilibrium in these games often converge to non-Nash stationary points. This highlights a key challenge: for any algorithm, the stability properties of its underlying dynamical system can cause non-Nash points to be potential attractors. To overcome this challenge, algorithms must account for subtleties involving the curvatures of players' costs. To this end, we leverage dynamical system theory and develop a second-order algorithm for finding a local Nash equilibrium in the smooth, possibly nonconvex-nonconcave, zero-sum game setting. First, we prove that this novel method guarantees convergence to only local Nash equilibria with an asymptotic local \textit{linear} convergence rate. We then interpret a version of this method as a modified Gauss-Newton algorithm with local \textit{superlinear} convergence to the neighborhood of a point that satisfies first-order local Nash equilibrium conditions. In comparison, current related state-of-the-art methods with similar guarantees do not offer convergence rates in the nonconvex-nonconcave setting. Furthermore, we show that this approach naturally generalizes to settings with convex and potentially coupled constraints while retaining earlier guarantees of convergence to only local (generalized) Nash equilibria. Code for our experiments can be found at https://github.com/CLeARoboticsLab/ZeroSumGameSolve.jl.
Code2MCP: Transforming Code Repositories into MCP Services
The Model Context Protocol (MCP) aims to create a standard for how Large Language Models use tools. However, most current research focuses on selecting tools from an existing pool. A more fundamental, yet largely overlooked, problem is how to populate this pool by converting the vast number of existing software projects into MCP-compatible services. To bridge this gap, we introduce Code2MCP, an agent-based framework that automatically transforms a GitHub repository into a functional MCP service with minimal human intervention. Code2MCP employs a multi-agent workflow for code analysis, environment setup, tool function design, and service generation, enhanced by a self-correcting loop to ensure reliability. We demonstrate that Code2MCP successfully transforms open-source computing libraries in scientific fields such as bioinformatics, mathematics, and fluid dynamics that are not available in existing MCP servers. By providing a novel automated pathway to unlock GitHub, the world's largest code repository, for the MCP ecosystem, Code2MCP serves as a catalyst to significantly accelerate the protocol's adoption and practical application. The code is public at https://github.com/DEFENSE-SEU/Code2MCP.
SALM: A Multi-Agent Framework for Language Model-Driven Social Network Simulation
Contemporary approaches to agent-based modeling (ABM) of social systems have traditionally emphasized rule-based behaviors, limiting their ability to capture nuanced dynamics by moving beyond predefined rules and leveraging contextual understanding from LMs of human social interaction. This paper presents SALM (Social Agent LM Framework), a novel approach for integrating language models (LMs) into social network simulation that achieves unprecedented temporal stability in multi-agent scenarios. Our primary contributions include: (1) a hierarchical prompting architecture enabling stable simulation beyond 4,000 timesteps while reducing token usage by 73%, (2) an attention-based memory system achieving 80% cache hit rates (95% CI [78%, 82%]) with sub-linear memory growth of 9.5%, and (3) formal bounds on personality stability. Through extensive validation against SNAP ego networks, we demonstrate the first LLM-based framework capable of modeling long-term social phenomena while maintaining empirically validated behavioral fidelity.
Systems and Control (CS)
Solar and Wind Power Forecasting: A Comparative Review of LSTM, Random Forest, and XGBoost Models
Rising global energy demand from population growth raises concerns about the sustainability of fossil fuels. Consequently, the energy sector has increasingly transitioned to renewable energy sources like solar and wind, which are naturally abundant. However, the periodic and unpredictable nature of these resources pose significant challenges for power system reliability. Accurate forecasting is essential to ensure grid stability and optimize energy management. But due to the high variability in weather conditions which directly affected wind and solar energy, achieving precise predictions remains difficult. Advancements in Artificial Intelligence (AI), particularly in Machine Learning (ML) and Deep Learning (DL), offer promising solutions to improve forecasting accuracy. The study highlights three widely used algorithms for solar and wind energy prediction: Long Short-Term Memory (LSTM), Random Forest (RF), and Extreme Gradient Boosting (XGBoost). These models are capable of learning complex patterns from historical and environmental data, enabling more accurate forecasts and contributing to the enhanced efficiency and reliability of renewable energy systems. This review aims to provide an overview on RF, XGBoost, and LSTM by conducting a comparative analysis across three essential criteria: research prevalence, model complexity, and computational execution time.
Zeroth-Order Constrained Optimization from a Control Perspective via Feedback Linearization
Designing safe derivative-free optimization algorithms under unknown constraints is a fundamental challenge in modern learning and control. Most existing zeroth-order (ZO) approaches typically assume white-box constraints or focus on convex settings, leaving the general case of nonconvex optimization with black-box constraints largely open. We propose a control-theoretic framework for ZO constrained optimization that enforces feasibility without relying on solving costly convex subproblems. Leveraging feedback linearization, we introduce a family of ZO feedback linearization (ZOFL) algorithms applicable to both equality and inequality constraints. Our method requires only noisy, sample-based gradient estimates yet provably guarantees constraint satisfaction under mild regularity conditions. We establish finite-time bounds on constraint violation and further present a midpoint discretization variant that further improves feasibility without sacrificing optimality. Empirical results demonstrate that ZOFL consistently outperforms standard ZO baselines, achieving competitive objective values while maintaining feasibility.
Frequency Control and Optimal Power Sharing in Combined Power and Heating Networks with Heat Pumps
Heat pumps have the capability for fast adjustments in power consumption with potential connections to large heating-inertia district heating networks, and are thus a very important resource for providing frequency support in low-inertia power systems. Nevertheless, the coupling of power networks with district heating systems renders the underlying dynamics much more involved. It is therefore important to ensure that system stability and appropriate power sharing are maintained. In this paper, we consider the problem of leveraging district heating systems as ancillary services for primary frequency control in power networks via heat pumps. We propose a novel power sharing scheme for heating systems based on the average temperature. This enables an optimal power allocation among diverse energy sources without requiring load disturbances information. We then discuss two approaches for heating systems to contribute to frequency regulation in power networks. We show that both approaches ensure stability in the combined heat and power network and facilitate optimal power allocation among the different energy sources. We also discuss how various generation dynamics can be incorporated into our framework with guaranteed stability and optimality. Finally, we conduct simulations that demonstrate various tradeoffs in the transient response and the practical potential of the proposed approaches.
Optimism as Risk-Seeking in Multi-Agent Reinforcement Learning
Risk sensitivity has become a central theme in reinforcement learning (RL), where convex risk measures and robust formulations provide principled ways to model preferences beyond expected return. Recent extensions to multi-agent RL (MARL) have largely emphasized the risk-averse setting, prioritizing robustness to uncertainty. In cooperative MARL, however, such conservatism often leads to suboptimal equilibria, and a parallel line of work has shown that optimism can promote cooperation. Existing optimistic methods, though effective in practice, are typically heuristic and lack theoretical grounding. Building on the dual representation for convex risk measures, we propose a principled framework that interprets risk-seeking objectives as optimism. We introduce optimistic value functions, which formalize optimism as divergence-penalized risk-seeking evaluations. Building on this foundation, we derive a policy-gradient theorem for optimistic value functions, including explicit formulas for the entropic risk/KL-penalty setting, and develop decentralized optimistic actor-critic algorithms that implement these updates. Empirical results on cooperative benchmarks demonstrate that risk-seeking optimism consistently improves coordination over both risk-neutral baselines and heuristic optimistic methods. Our framework thus unifies risk-sensitive learning and optimism, offering a theoretically grounded and practically effective approach to cooperation in MARL.
Equation-Free Coarse Control of Distributed Parameter Systems via Local Neural Operators
The control of high-dimensional distributed parameter systems (DPS) remains a challenge when explicit coarse-grained equations are unavailable. Classical equation-free (EF) approaches rely on fine-scale simulators treated as black-box timesteppers. However, repeated simulations for steady-state computation, linearization, and control design are often computationally prohibitive, or the microscopic timestepper may not even be available, leaving us with data as the only resource. We propose a data-driven alternative that uses local neural operators, trained on spatiotemporal microscopic/mesoscopic data, to obtain efficient short-time solution operators. These surrogates are employed within Krylov subspace methods to compute coarse steady and unsteady-states, while also providing Jacobian information in a matrix-free manner. Krylov-Arnoldi iterations then approximate the dominant eigenspectrum, yielding reduced models that capture the open-loop slow dynamics without explicit Jacobian assembly. Both discrete-time Linear Quadratic Regulator (dLQR) and pole-placement (PP) controllers are based on this reduced system and lifted back to the full nonlinear dynamics, thereby closing the feedback loop.
comment: 8 pages, 2 figures
Integrated Communication and Control for Energy-Efficient UAV Swarms: A Multi-Agent Reinforcement Learning Approach
The deployment of unmanned aerial vehicle (UAV) swarm-assisted communication networks has become an increasingly vital approach for remediating coverage limitations in infrastructure-deficient environments, with especially pressing applications in temporary scenarios, such as emergency rescue, military and security operations, and remote area coverage. However, complex geographic environments lead to unpredictable and highly dynamic wireless channel conditions, resulting in frequent interruptions of air-to-ground (A2G) links that severely constrain the reliability and quality of service in UAV swarm-assisted mobile communications. To improve the quality of UAV swarm-assisted communications in complex geographic environments, we propose an integrated communication and control co-design mechanism. Given the stringent energy constraints inherent in UAV swarms, our proposed mechanism is designed to optimize energy efficiency while maintaining an equilibrium between equitable communication rates for mobile ground users (GUs) and UAV energy expenditure. We formulate the joint resource allocation and 3D trajectory control problem as a Markov decision process (MDP), and develop a multi-agent reinforcement learning (MARL) framework to enable real-time coordinated actions across the UAV swarm. To optimize the action policy of UAV swarms, we propose a novel multi-agent hybrid proximal policy optimization with action masking (MAHPPO-AM) algorithm, specifically designed to handle complex hybrid action spaces. The algorithm incorporates action masking to enforce hard constraints in high-dimensional action spaces. Experimental results demonstrate that our approach achieves a fairness index of 0.99 while reducing energy consumption by up to 25% compared to baseline methods.
Toward a Robust Biomimetic Hybrid Battery: Bridging Biology, Electrochemistry and Data-Driven Control
Electric vehicles and renewable energy systems need batteries that charge quickly, last many years and still store a lot of energy, but current chemistries struggle to deliver all three. Inspired by electric fish that deliver bursts of current and birds that sleep with half their brains, we propose a hybrid battery concept called SwiftPulse. It combines sodium-ion cells that provide energy with niobium-oxide cells that accept high-power pulses. A pulse-based charger and a battery-management strategy rotate clusters of cells into rest so they can recover. We derive simple models of energy density, diffusion and capacity fade to show that a pack made mostly of sodium-ion modules with a smaller fraction of niobium-oxide modules could exceed 175 Wh per kg, endure more than ten thousand charge-discharge cycles and recharge to eighty percent in less than ten minutes. Simulations suggest that pulsed charging reduces ion buildup at the surface and slows degradation. We outline a roadmap for cell-level and module-level experiments and suggest integrating machine learning to adapt pulse parameters and rest scheduling. By blending ideas from biology, electrochemistry and data-driven control, this work points toward batteries that are safer, faster to charge and longer lasting.
Real Time Fatigue Crack Growth Monitoring Using High Precision Control and Data Acquisition Systems
Fatigue cracks may initiate and propagate long before a structural component reaches the end of its nominal life. Detecting and quantifying crack growth in real time is critical for avoiding catastrophic failures in aerospace structures, nuclear reactors and civil infrastructure. This research reviews four widely used crack growth monitoring techniques: digital image correlation (DIC), acoustic emission (AE), compliance based methods and direct current potential drop (DCPD). The research evaluates their working principles, instrumentation, detection thresholds, noise sensitivities and suitability under various environmental conditions. Recent advances in high precision control and data acquisition systems (DAQs) that enable multi sensor data fusion are explored. The goal is to guide selection and integration of monitoring methods in modern structural health monitoring architectures.
Electric Vehicle Charger Infrastructure Planning: Demand Estimation, Coverage Optimization Over an Integrated Power Grid
For electrifying the transportation sector, deploying a strategically planned and efficient charging infrastructure is essential. This paper presents a two-phase approach for electric vehicle (EV) charger deployment that integrates spatial point-of-interest analysis and maximum coverage optimization over an integrated spatial power grid. Spatial-focused studies in the literature often overlook electrical grid constraints, while grid-focused work frequently considers statistically modeled EV charging demand. To address these gaps, a new framework is proposed that combines spatial network planning with electrical grid considerations. This study approaches EV charger planning from a perspective of the distribution grid, starting with an estimation of EV charging demand and the identification of optimal candidate locations. It ensures that the capacity limits of newly established chargers are maintained within the limits of the power grid. This framework is applied in a test case for the Dallas area, integrating the existing EV charger network with an 8500-bus distribution system for comprehensive planning.
Robustness of 'small' networks
Modeling how networks change under structural perturbations can yield foundational insights into network robustness, which is critical in many real-world applications. The largest connected component is a popular measure of network performance. Percolation theory provides a theoretical framework to establish statistical properties of the largest connected component of large random graphs. However, this theoretical framework is typically only exact in the large-$\nodes$ limit, failing to capture the statistical properties of largest connected components in small networks, which many real-world networks are. We derive expected values for the largest connected component of small $G(\nodes,p)$ random graphs from which nodes are either removed uniformly at random or targeted by highest degree and compare these values with existing theory. We also visualize the performance of our expected values compared to existing theory for predicting the largest connected component of various real-world, small graphs.
comment: 10 pages, 7 figures
Communication-aware Wide-Area Damping Control using Risk-Constrained Reinforcement Learning
Non-ideal communication links, especially delays, critically affect fast networked controls in power systems, such as the wide-area damping control (WADC). Traditionally, a delay estimation and compensation approach is adopted to address this cyber-physical coupling, but it demands very high accuracy for the fast WADC and cannot handle other cyber concerns like link failures or {cyber perturbations}. Hence, we propose a new risk-constrained framework that can target the communication delays, yet amenable to general uncertainty under the cyber-physical couplings. Our WADC model includes the synchronous generators (SGs), and also voltage source converters (VSCs) for additional damping capabilities. To mitigate uncertainty, a mean-variance risk constraint is introduced to the classical optimal control cost of the linear quadratic regulator (LQR). Unlike estimating delays, our approach can effectively mitigate large communication delays by improving the worst-case performance. A reinforcement learning (RL)-based algorithm, namely, stochastic gradient-descent with max-oracle (SGDmax), is developed to solve the risk-constrained problem. We further show its guaranteed convergence to stationarity at a high probability, even using the simple zero-order policy gradient (ZOPG). Numerical tests on the IEEE 68-bus system not only verify SGDmax's convergence and VSCs' damping capabilities, but also demonstrate that our approach outperforms conventional delay compensator-based methods under estimation error. While focusing on performance improvement under large delays, our proposed risk-constrained design can effectively mitigate the worst-case oscillations, making it equally effective for addressing other communication issues and cyber perturbations.
comment: 12 pages, 14 figures, Accepted for publication in IEEE Transactions on Smart Grid, 2025
AW-EL-PINNs: A Multi-Task Learning Physics-Informed Neural Network for Euler-Lagrange Systems in Optimal Control Problems
This paper presents adaptive weighted Euler-Lagrange theorem combined physics-informed neural networks (AW-EL-PINNs) for solving Euler-Lagrange systems in optimal control problems. The framework systematically converts optimal control frameworks into two-point boundary value problems (TPBVPs) while establishing a multi-task learning paradigm through innovative integration of the Euler-Lagrange theorem with deep learning architecture. An adaptive loss weighting mechanism dynamically balances loss function components during training, decreasing tedious manual tuning of weighting the loss functions compared to the conventional physics-informed neural networks (PINNs). Based on six numerical examples, it's clear that AW-EL-PINNs achieve enhanced solution accuracy compared to baseline methods while maintaining stability throughout the optimization process. These results highlight the framework's capability to improve precision and ensure stability in solving Euler-Lagrange systems in optimal control problems, offering potential strategies for problems under physical applications.
Safety-Critical Control with Guaranteed Lipschitz Continuity via Filtered Control Barrier Functions
In safety-critical control systems, ensuring both system safety and smooth control input is essential for practical deployment. Existing Control Barrier Function (CBF) frameworks, especially High-Order CBFs (HOCBFs), effectively enforce safety constraints, but also raise concerns about the smoothness of the resulting control inputs. While smoothness typically refers to continuity and differentiability, it does not by itself ensure bounded input variation. In contrast, Lipschitz continuity is a stronger form of continuity that not only is necessary for the theoretical guarantee of safety, but also bounds the rate of variation and eliminates abrupt changes in the control input. Such abrupt changes can degrade system performance or even violate actuator limitations, yet current CBF-based methods do not provide Lipschitz continuity guarantees. This paper introduces Filtered Control Barrier Functions (FCBFs), which extend HOCBFs by incorporating an auxiliary dynamic system-referred to as an input regularization filter-to produce Lipschitz continuous control inputs. The proposed framework ensures safety, control bounds, and Lipschitz continuity of the control inputs simultaneously by integrating FCBFs and HOCBFs within a unified quadratic program (QP). Theoretical guarantees are provided and simulations on a unicycle model demonstrate the effectiveness of the proposed method compared to standard and smoothness-penalized HOCBF approaches.
comment: 8 pages, 4 figures
Second-Order Algorithms for Finding Local Nash Equilibria in Zero-Sum Games
Zero-sum games arise in a wide variety of problems, including robust optimization and adversarial learning. However, algorithms deployed for finding a local Nash equilibrium in these games often converge to non-Nash stationary points. This highlights a key challenge: for any algorithm, the stability properties of its underlying dynamical system can cause non-Nash points to be potential attractors. To overcome this challenge, algorithms must account for subtleties involving the curvatures of players' costs. To this end, we leverage dynamical system theory and develop a second-order algorithm for finding a local Nash equilibrium in the smooth, possibly nonconvex-nonconcave, zero-sum game setting. First, we prove that this novel method guarantees convergence to only local Nash equilibria with an asymptotic local \textit{linear} convergence rate. We then interpret a version of this method as a modified Gauss-Newton algorithm with local \textit{superlinear} convergence to the neighborhood of a point that satisfies first-order local Nash equilibrium conditions. In comparison, current related state-of-the-art methods with similar guarantees do not offer convergence rates in the nonconvex-nonconcave setting. Furthermore, we show that this approach naturally generalizes to settings with convex and potentially coupled constraints while retaining earlier guarantees of convergence to only local (generalized) Nash equilibria. Code for our experiments can be found at https://github.com/CLeARoboticsLab/ZeroSumGameSolve.jl.
Finite-Time Bounds for Two-Time-Scale Stochastic Approximation with Arbitrary Norm Contractions and Markovian Noise
Two-time-scale Stochastic Approximation (SA) is an iterative algorithm with applications in reinforcement learning and optimization. Prior finite time analysis of such algorithms has focused on fixed point iterations with mappings contractive under Euclidean norm. Motivated by applications in reinforcement learning, we give the first mean square bound on non linear two-time-scale SA where the iterations have arbitrary norm contractive mappings and Markovian noise. We show that the mean square error decays at a rate of $O(1/n^{2/3})$ in the general case, and at a rate of $O(1/n)$ in a special case where the slower timescale is noiseless. Our analysis uses the generalized Moreau envelope to handle the arbitrary norm contractions and solutions of Poisson equation to deal with the Markovian noise. By analyzing the SSP Q-Learning algorithm, we give the first $O(1/n)$ bound for an algorithm for asynchronous control of MDPs under the average reward criterion. We also obtain a rate of $O(1/n)$ for Q-Learning with Polyak-averaging and provide an algorithm for learning Generalized Nash Equilibrium (GNE) for strongly monotone games which converges at a rate of $O(1/n^{2/3})$.
comment: To be presented at IEEE Conference on Decision and Control (CDC) 2025
Data-informativity conditions for structured linear systems with implications for dynamic networks
When estimating a single subsystem (module) in a linear dynamic network with a prediction error method, a data-informativity condition needs to be satisfied for arriving at a consistent module estimate. This concerns a condition on input signals in the constructed, possibly MIMO (multiple input multiple output) predictor model being persistently exciting, which is typically guaranteed if the input spectrum is positive definite for a sufficient number of frequencies. Generically, the condition can be formulated as a path-based condition on the graph of the network model. The current condition has two elements of possible conservatism: (a) rather than focussing on the full MIMO model, one would like to be able to focus on consistently estimating the target module only, and (b) structural information, such as structural zero elements in the interconnection structure or known subsystems, should be taken into account. In this paper relaxed conditions for data-informativity are derived addressing these two issues, leading to relaxed path-based conditions on the network graph. This leads to experimental conditions that are less strict, i.e. require a smaller number of external excitation signals. Additionally, the new expressions for data-informativity in identification are shown to be closely related to earlier derived conditions for (generic) single module identifiability.
comment: 17 pages, 5 figures
FuzzyLight: A Robust Two-Stage Fuzzy Approach for Traffic Signal Control Works in Real Cities
Effective traffic signal control (TSC) is crucial in mitigating urban congestion and reducing emissions. Recently, reinforcement learning (RL) has been the research trend for TSC. However, existing RL algorithms face several real-world challenges that hinder their practical deployment in TSC: (1) Sensor accuracy deteriorates with increased sensor detection range, and data transmission is prone to noise, potentially resulting in unsafe TSC decisions. (2) During the training of online RL, interactions with the environment could be unstable, potentially leading to inappropriate traffic signal phase (TSP) selection and traffic congestion. (3) Most current TSC algorithms focus only on TSP decisions, overlooking the critical aspect of phase duration, affecting safety and efficiency. To overcome these challenges, we propose a robust two-stage fuzzy approach called FuzzyLight, which integrates compressed sensing and RL for TSC deployment. FuzzyLight offers several key contributions: (1) It employs fuzzy logic and compressed sensing to address sensor noise and enhances the efficiency of TSP decisions. (2) It maintains stable performance during training and combines fuzzy logic with RL to generate precise phases. (3) It works in real cities across 22 intersections and demonstrates superior performance in both real-world and simulated environments. Experimental results indicate that FuzzyLight enhances traffic efficiency by 48% compared to expert-designed timings in the real world. Furthermore, it achieves state-of-the-art (SOTA) performance in simulated environments using six real-world datasets with transmission noise. The code and deployment video are available at the URL1
Autonomous Close-Proximity Photovoltaic Panel Coating Using a Quadcopter
Photovoltaic (PV) panels are becoming increasingly widespread in the domain of renewable energy, and thus, small efficiency gains can have massive effects. Anti-reflective and self-cleaning coatings enhance panel performance but degrade over time, requiring periodic reapplication. Uncrewed Aerial Vehicles (UAVs) offer a flexible and autonomous way to apply protective coatings more often and at lower cost compared to traditional manual coating methods. In this letter, we propose a quadcopter-based system, equipped with a liquid dispersion mechanism, designed to automate such tasks. The localization stack only uses onboard sensors, relying on visual-inertial odometry and the relative position of the PV panel detected with respect to the quadcopter. The control relies on a model-based controller that accounts for the ground effect and the mass decrease of the quadcopter during liquid dispersion. We validate the autonomy capabilities of our system through extensive indoor and outdoor experiments.
comment: 7 pages, 10 figures. Submitted to IEEE RA-L
Systems and Control (EESS)
Solar and Wind Power Forecasting: A Comparative Review of LSTM, Random Forest, and XGBoost Models
Rising global energy demand from population growth raises concerns about the sustainability of fossil fuels. Consequently, the energy sector has increasingly transitioned to renewable energy sources like solar and wind, which are naturally abundant. However, the periodic and unpredictable nature of these resources pose significant challenges for power system reliability. Accurate forecasting is essential to ensure grid stability and optimize energy management. But due to the high variability in weather conditions which directly affected wind and solar energy, achieving precise predictions remains difficult. Advancements in Artificial Intelligence (AI), particularly in Machine Learning (ML) and Deep Learning (DL), offer promising solutions to improve forecasting accuracy. The study highlights three widely used algorithms for solar and wind energy prediction: Long Short-Term Memory (LSTM), Random Forest (RF), and Extreme Gradient Boosting (XGBoost). These models are capable of learning complex patterns from historical and environmental data, enabling more accurate forecasts and contributing to the enhanced efficiency and reliability of renewable energy systems. This review aims to provide an overview on RF, XGBoost, and LSTM by conducting a comparative analysis across three essential criteria: research prevalence, model complexity, and computational execution time.
Zeroth-Order Constrained Optimization from a Control Perspective via Feedback Linearization
Designing safe derivative-free optimization algorithms under unknown constraints is a fundamental challenge in modern learning and control. Most existing zeroth-order (ZO) approaches typically assume white-box constraints or focus on convex settings, leaving the general case of nonconvex optimization with black-box constraints largely open. We propose a control-theoretic framework for ZO constrained optimization that enforces feasibility without relying on solving costly convex subproblems. Leveraging feedback linearization, we introduce a family of ZO feedback linearization (ZOFL) algorithms applicable to both equality and inequality constraints. Our method requires only noisy, sample-based gradient estimates yet provably guarantees constraint satisfaction under mild regularity conditions. We establish finite-time bounds on constraint violation and further present a midpoint discretization variant that further improves feasibility without sacrificing optimality. Empirical results demonstrate that ZOFL consistently outperforms standard ZO baselines, achieving competitive objective values while maintaining feasibility.
Frequency Control and Optimal Power Sharing in Combined Power and Heating Networks with Heat Pumps
Heat pumps have the capability for fast adjustments in power consumption with potential connections to large heating-inertia district heating networks, and are thus a very important resource for providing frequency support in low-inertia power systems. Nevertheless, the coupling of power networks with district heating systems renders the underlying dynamics much more involved. It is therefore important to ensure that system stability and appropriate power sharing are maintained. In this paper, we consider the problem of leveraging district heating systems as ancillary services for primary frequency control in power networks via heat pumps. We propose a novel power sharing scheme for heating systems based on the average temperature. This enables an optimal power allocation among diverse energy sources without requiring load disturbances information. We then discuss two approaches for heating systems to contribute to frequency regulation in power networks. We show that both approaches ensure stability in the combined heat and power network and facilitate optimal power allocation among the different energy sources. We also discuss how various generation dynamics can be incorporated into our framework with guaranteed stability and optimality. Finally, we conduct simulations that demonstrate various tradeoffs in the transient response and the practical potential of the proposed approaches.
Optimism as Risk-Seeking in Multi-Agent Reinforcement Learning
Risk sensitivity has become a central theme in reinforcement learning (RL), where convex risk measures and robust formulations provide principled ways to model preferences beyond expected return. Recent extensions to multi-agent RL (MARL) have largely emphasized the risk-averse setting, prioritizing robustness to uncertainty. In cooperative MARL, however, such conservatism often leads to suboptimal equilibria, and a parallel line of work has shown that optimism can promote cooperation. Existing optimistic methods, though effective in practice, are typically heuristic and lack theoretical grounding. Building on the dual representation for convex risk measures, we propose a principled framework that interprets risk-seeking objectives as optimism. We introduce optimistic value functions, which formalize optimism as divergence-penalized risk-seeking evaluations. Building on this foundation, we derive a policy-gradient theorem for optimistic value functions, including explicit formulas for the entropic risk/KL-penalty setting, and develop decentralized optimistic actor-critic algorithms that implement these updates. Empirical results on cooperative benchmarks demonstrate that risk-seeking optimism consistently improves coordination over both risk-neutral baselines and heuristic optimistic methods. Our framework thus unifies risk-sensitive learning and optimism, offering a theoretically grounded and practically effective approach to cooperation in MARL.
Equation-Free Coarse Control of Distributed Parameter Systems via Local Neural Operators
The control of high-dimensional distributed parameter systems (DPS) remains a challenge when explicit coarse-grained equations are unavailable. Classical equation-free (EF) approaches rely on fine-scale simulators treated as black-box timesteppers. However, repeated simulations for steady-state computation, linearization, and control design are often computationally prohibitive, or the microscopic timestepper may not even be available, leaving us with data as the only resource. We propose a data-driven alternative that uses local neural operators, trained on spatiotemporal microscopic/mesoscopic data, to obtain efficient short-time solution operators. These surrogates are employed within Krylov subspace methods to compute coarse steady and unsteady-states, while also providing Jacobian information in a matrix-free manner. Krylov-Arnoldi iterations then approximate the dominant eigenspectrum, yielding reduced models that capture the open-loop slow dynamics without explicit Jacobian assembly. Both discrete-time Linear Quadratic Regulator (dLQR) and pole-placement (PP) controllers are based on this reduced system and lifted back to the full nonlinear dynamics, thereby closing the feedback loop.
comment: 8 pages, 2 figures
Integrated Communication and Control for Energy-Efficient UAV Swarms: A Multi-Agent Reinforcement Learning Approach
The deployment of unmanned aerial vehicle (UAV) swarm-assisted communication networks has become an increasingly vital approach for remediating coverage limitations in infrastructure-deficient environments, with especially pressing applications in temporary scenarios, such as emergency rescue, military and security operations, and remote area coverage. However, complex geographic environments lead to unpredictable and highly dynamic wireless channel conditions, resulting in frequent interruptions of air-to-ground (A2G) links that severely constrain the reliability and quality of service in UAV swarm-assisted mobile communications. To improve the quality of UAV swarm-assisted communications in complex geographic environments, we propose an integrated communication and control co-design mechanism. Given the stringent energy constraints inherent in UAV swarms, our proposed mechanism is designed to optimize energy efficiency while maintaining an equilibrium between equitable communication rates for mobile ground users (GUs) and UAV energy expenditure. We formulate the joint resource allocation and 3D trajectory control problem as a Markov decision process (MDP), and develop a multi-agent reinforcement learning (MARL) framework to enable real-time coordinated actions across the UAV swarm. To optimize the action policy of UAV swarms, we propose a novel multi-agent hybrid proximal policy optimization with action masking (MAHPPO-AM) algorithm, specifically designed to handle complex hybrid action spaces. The algorithm incorporates action masking to enforce hard constraints in high-dimensional action spaces. Experimental results demonstrate that our approach achieves a fairness index of 0.99 while reducing energy consumption by up to 25% compared to baseline methods.
Toward a Robust Biomimetic Hybrid Battery: Bridging Biology, Electrochemistry and Data-Driven Control
Electric vehicles and renewable energy systems need batteries that charge quickly, last many years and still store a lot of energy, but current chemistries struggle to deliver all three. Inspired by electric fish that deliver bursts of current and birds that sleep with half their brains, we propose a hybrid battery concept called SwiftPulse. It combines sodium-ion cells that provide energy with niobium-oxide cells that accept high-power pulses. A pulse-based charger and a battery-management strategy rotate clusters of cells into rest so they can recover. We derive simple models of energy density, diffusion and capacity fade to show that a pack made mostly of sodium-ion modules with a smaller fraction of niobium-oxide modules could exceed 175 Wh per kg, endure more than ten thousand charge-discharge cycles and recharge to eighty percent in less than ten minutes. Simulations suggest that pulsed charging reduces ion buildup at the surface and slows degradation. We outline a roadmap for cell-level and module-level experiments and suggest integrating machine learning to adapt pulse parameters and rest scheduling. By blending ideas from biology, electrochemistry and data-driven control, this work points toward batteries that are safer, faster to charge and longer lasting.
Real Time Fatigue Crack Growth Monitoring Using High Precision Control and Data Acquisition Systems
Fatigue cracks may initiate and propagate long before a structural component reaches the end of its nominal life. Detecting and quantifying crack growth in real time is critical for avoiding catastrophic failures in aerospace structures, nuclear reactors and civil infrastructure. This research reviews four widely used crack growth monitoring techniques: digital image correlation (DIC), acoustic emission (AE), compliance based methods and direct current potential drop (DCPD). The research evaluates their working principles, instrumentation, detection thresholds, noise sensitivities and suitability under various environmental conditions. Recent advances in high precision control and data acquisition systems (DAQs) that enable multi sensor data fusion are explored. The goal is to guide selection and integration of monitoring methods in modern structural health monitoring architectures.
Electric Vehicle Charger Infrastructure Planning: Demand Estimation, Coverage Optimization Over an Integrated Power Grid
For electrifying the transportation sector, deploying a strategically planned and efficient charging infrastructure is essential. This paper presents a two-phase approach for electric vehicle (EV) charger deployment that integrates spatial point-of-interest analysis and maximum coverage optimization over an integrated spatial power grid. Spatial-focused studies in the literature often overlook electrical grid constraints, while grid-focused work frequently considers statistically modeled EV charging demand. To address these gaps, a new framework is proposed that combines spatial network planning with electrical grid considerations. This study approaches EV charger planning from a perspective of the distribution grid, starting with an estimation of EV charging demand and the identification of optimal candidate locations. It ensures that the capacity limits of newly established chargers are maintained within the limits of the power grid. This framework is applied in a test case for the Dallas area, integrating the existing EV charger network with an 8500-bus distribution system for comprehensive planning.
Robustness of 'small' networks
Modeling how networks change under structural perturbations can yield foundational insights into network robustness, which is critical in many real-world applications. The largest connected component is a popular measure of network performance. Percolation theory provides a theoretical framework to establish statistical properties of the largest connected component of large random graphs. However, this theoretical framework is typically only exact in the large-$\nodes$ limit, failing to capture the statistical properties of largest connected components in small networks, which many real-world networks are. We derive expected values for the largest connected component of small $G(\nodes,p)$ random graphs from which nodes are either removed uniformly at random or targeted by highest degree and compare these values with existing theory. We also visualize the performance of our expected values compared to existing theory for predicting the largest connected component of various real-world, small graphs.
comment: 10 pages, 7 figures
Communication-aware Wide-Area Damping Control using Risk-Constrained Reinforcement Learning
Non-ideal communication links, especially delays, critically affect fast networked controls in power systems, such as the wide-area damping control (WADC). Traditionally, a delay estimation and compensation approach is adopted to address this cyber-physical coupling, but it demands very high accuracy for the fast WADC and cannot handle other cyber concerns like link failures or {cyber perturbations}. Hence, we propose a new risk-constrained framework that can target the communication delays, yet amenable to general uncertainty under the cyber-physical couplings. Our WADC model includes the synchronous generators (SGs), and also voltage source converters (VSCs) for additional damping capabilities. To mitigate uncertainty, a mean-variance risk constraint is introduced to the classical optimal control cost of the linear quadratic regulator (LQR). Unlike estimating delays, our approach can effectively mitigate large communication delays by improving the worst-case performance. A reinforcement learning (RL)-based algorithm, namely, stochastic gradient-descent with max-oracle (SGDmax), is developed to solve the risk-constrained problem. We further show its guaranteed convergence to stationarity at a high probability, even using the simple zero-order policy gradient (ZOPG). Numerical tests on the IEEE 68-bus system not only verify SGDmax's convergence and VSCs' damping capabilities, but also demonstrate that our approach outperforms conventional delay compensator-based methods under estimation error. While focusing on performance improvement under large delays, our proposed risk-constrained design can effectively mitigate the worst-case oscillations, making it equally effective for addressing other communication issues and cyber perturbations.
comment: 12 pages, 14 figures, Accepted for publication in IEEE Transactions on Smart Grid, 2025
AW-EL-PINNs: A Multi-Task Learning Physics-Informed Neural Network for Euler-Lagrange Systems in Optimal Control Problems
This paper presents adaptive weighted Euler-Lagrange theorem combined physics-informed neural networks (AW-EL-PINNs) for solving Euler-Lagrange systems in optimal control problems. The framework systematically converts optimal control frameworks into two-point boundary value problems (TPBVPs) while establishing a multi-task learning paradigm through innovative integration of the Euler-Lagrange theorem with deep learning architecture. An adaptive loss weighting mechanism dynamically balances loss function components during training, decreasing tedious manual tuning of weighting the loss functions compared to the conventional physics-informed neural networks (PINNs). Based on six numerical examples, it's clear that AW-EL-PINNs achieve enhanced solution accuracy compared to baseline methods while maintaining stability throughout the optimization process. These results highlight the framework's capability to improve precision and ensure stability in solving Euler-Lagrange systems in optimal control problems, offering potential strategies for problems under physical applications.
Safety-Critical Control with Guaranteed Lipschitz Continuity via Filtered Control Barrier Functions
In safety-critical control systems, ensuring both system safety and smooth control input is essential for practical deployment. Existing Control Barrier Function (CBF) frameworks, especially High-Order CBFs (HOCBFs), effectively enforce safety constraints, but also raise concerns about the smoothness of the resulting control inputs. While smoothness typically refers to continuity and differentiability, it does not by itself ensure bounded input variation. In contrast, Lipschitz continuity is a stronger form of continuity that not only is necessary for the theoretical guarantee of safety, but also bounds the rate of variation and eliminates abrupt changes in the control input. Such abrupt changes can degrade system performance or even violate actuator limitations, yet current CBF-based methods do not provide Lipschitz continuity guarantees. This paper introduces Filtered Control Barrier Functions (FCBFs), which extend HOCBFs by incorporating an auxiliary dynamic system-referred to as an input regularization filter-to produce Lipschitz continuous control inputs. The proposed framework ensures safety, control bounds, and Lipschitz continuity of the control inputs simultaneously by integrating FCBFs and HOCBFs within a unified quadratic program (QP). Theoretical guarantees are provided and simulations on a unicycle model demonstrate the effectiveness of the proposed method compared to standard and smoothness-penalized HOCBF approaches.
comment: 8 pages, 4 figures
Second-Order Algorithms for Finding Local Nash Equilibria in Zero-Sum Games
Zero-sum games arise in a wide variety of problems, including robust optimization and adversarial learning. However, algorithms deployed for finding a local Nash equilibrium in these games often converge to non-Nash stationary points. This highlights a key challenge: for any algorithm, the stability properties of its underlying dynamical system can cause non-Nash points to be potential attractors. To overcome this challenge, algorithms must account for subtleties involving the curvatures of players' costs. To this end, we leverage dynamical system theory and develop a second-order algorithm for finding a local Nash equilibrium in the smooth, possibly nonconvex-nonconcave, zero-sum game setting. First, we prove that this novel method guarantees convergence to only local Nash equilibria with an asymptotic local \textit{linear} convergence rate. We then interpret a version of this method as a modified Gauss-Newton algorithm with local \textit{superlinear} convergence to the neighborhood of a point that satisfies first-order local Nash equilibrium conditions. In comparison, current related state-of-the-art methods with similar guarantees do not offer convergence rates in the nonconvex-nonconcave setting. Furthermore, we show that this approach naturally generalizes to settings with convex and potentially coupled constraints while retaining earlier guarantees of convergence to only local (generalized) Nash equilibria. Code for our experiments can be found at https://github.com/CLeARoboticsLab/ZeroSumGameSolve.jl.
Finite-Time Bounds for Two-Time-Scale Stochastic Approximation with Arbitrary Norm Contractions and Markovian Noise
Two-time-scale Stochastic Approximation (SA) is an iterative algorithm with applications in reinforcement learning and optimization. Prior finite time analysis of such algorithms has focused on fixed point iterations with mappings contractive under Euclidean norm. Motivated by applications in reinforcement learning, we give the first mean square bound on non linear two-time-scale SA where the iterations have arbitrary norm contractive mappings and Markovian noise. We show that the mean square error decays at a rate of $O(1/n^{2/3})$ in the general case, and at a rate of $O(1/n)$ in a special case where the slower timescale is noiseless. Our analysis uses the generalized Moreau envelope to handle the arbitrary norm contractions and solutions of Poisson equation to deal with the Markovian noise. By analyzing the SSP Q-Learning algorithm, we give the first $O(1/n)$ bound for an algorithm for asynchronous control of MDPs under the average reward criterion. We also obtain a rate of $O(1/n)$ for Q-Learning with Polyak-averaging and provide an algorithm for learning Generalized Nash Equilibrium (GNE) for strongly monotone games which converges at a rate of $O(1/n^{2/3})$.
comment: To be presented at IEEE Conference on Decision and Control (CDC) 2025
Data-informativity conditions for structured linear systems with implications for dynamic networks
When estimating a single subsystem (module) in a linear dynamic network with a prediction error method, a data-informativity condition needs to be satisfied for arriving at a consistent module estimate. This concerns a condition on input signals in the constructed, possibly MIMO (multiple input multiple output) predictor model being persistently exciting, which is typically guaranteed if the input spectrum is positive definite for a sufficient number of frequencies. Generically, the condition can be formulated as a path-based condition on the graph of the network model. The current condition has two elements of possible conservatism: (a) rather than focussing on the full MIMO model, one would like to be able to focus on consistently estimating the target module only, and (b) structural information, such as structural zero elements in the interconnection structure or known subsystems, should be taken into account. In this paper relaxed conditions for data-informativity are derived addressing these two issues, leading to relaxed path-based conditions on the network graph. This leads to experimental conditions that are less strict, i.e. require a smaller number of external excitation signals. Additionally, the new expressions for data-informativity in identification are shown to be closely related to earlier derived conditions for (generic) single module identifiability.
comment: 17 pages, 5 figures
FuzzyLight: A Robust Two-Stage Fuzzy Approach for Traffic Signal Control Works in Real Cities
Effective traffic signal control (TSC) is crucial in mitigating urban congestion and reducing emissions. Recently, reinforcement learning (RL) has been the research trend for TSC. However, existing RL algorithms face several real-world challenges that hinder their practical deployment in TSC: (1) Sensor accuracy deteriorates with increased sensor detection range, and data transmission is prone to noise, potentially resulting in unsafe TSC decisions. (2) During the training of online RL, interactions with the environment could be unstable, potentially leading to inappropriate traffic signal phase (TSP) selection and traffic congestion. (3) Most current TSC algorithms focus only on TSP decisions, overlooking the critical aspect of phase duration, affecting safety and efficiency. To overcome these challenges, we propose a robust two-stage fuzzy approach called FuzzyLight, which integrates compressed sensing and RL for TSC deployment. FuzzyLight offers several key contributions: (1) It employs fuzzy logic and compressed sensing to address sensor noise and enhances the efficiency of TSP decisions. (2) It maintains stable performance during training and combines fuzzy logic with RL to generate precise phases. (3) It works in real cities across 22 intersections and demonstrates superior performance in both real-world and simulated environments. Experimental results indicate that FuzzyLight enhances traffic efficiency by 48% compared to expert-designed timings in the real world. Furthermore, it achieves state-of-the-art (SOTA) performance in simulated environments using six real-world datasets with transmission noise. The code and deployment video are available at the URL1
Autonomous Close-Proximity Photovoltaic Panel Coating Using a Quadcopter
Photovoltaic (PV) panels are becoming increasingly widespread in the domain of renewable energy, and thus, small efficiency gains can have massive effects. Anti-reflective and self-cleaning coatings enhance panel performance but degrade over time, requiring periodic reapplication. Uncrewed Aerial Vehicles (UAVs) offer a flexible and autonomous way to apply protective coatings more often and at lower cost compared to traditional manual coating methods. In this letter, we propose a quadcopter-based system, equipped with a liquid dispersion mechanism, designed to automate such tasks. The localization stack only uses onboard sensors, relying on visual-inertial odometry and the relative position of the PV panel detected with respect to the quadcopter. The control relies on a model-based controller that accounts for the ground effect and the mass decrease of the quadcopter during liquid dispersion. We validate the autonomy capabilities of our system through extensive indoor and outdoor experiments.
comment: 7 pages, 10 figures. Submitted to IEEE RA-L
Robotics
Ask, Reason, Assist: Decentralized Robot Collaboration via Language and Logic
Increased robot deployment, such as in warehousing, has revealed a need for seamless collaboration among heterogeneous robot teams to resolve unforeseen conflicts. To address this challenge, we propose a novel decentralized framework that enables robots to request and provide help. The process begins when a robot detects a conflict and uses a Large Language Model (LLM) to decide whether external assistance is required. If so, it crafts and broadcasts a natural language (NL) help request. Potential helper robots reason over the request and respond with offers of assistance, including information about the effect on their ongoing tasks. Helper reasoning is implemented via an LLM grounded in Signal Temporal Logic (STL) using a Backus-Naur Form (BNF) grammar, ensuring syntactically valid NL-to-STL translations, which are then solved as a Mixed Integer Linear Program (MILP). Finally, the requester robot selects a helper by reasoning over the expected increase in system-level total task completion time. We evaluated our framework through experiments comparing different helper-selection strategies and found that considering multiple offers allows the requester to minimize added makespan. Our approach significantly outperforms heuristics such as selecting the nearest available candidate helper robot, and achieves performance comparable to a centralized "Oracle" baseline but without heavy information demands.
Multi-Modal Manipulation via Multi-Modal Policy Consensus
Effectively integrating diverse sensory modalities is crucial for robotic manipulation. However, the typical approach of feature concatenation is often suboptimal: dominant modalities such as vision can overwhelm sparse but critical signals like touch in contact-rich tasks, and monolithic architectures cannot flexibly incorporate new or missing modalities without retraining. Our method factorizes the policy into a set of diffusion models, each specialized for a single representation (e.g., vision or touch), and employs a router network that learns consensus weights to adaptively combine their contributions, enabling incremental of new representations. We evaluate our approach on simulated manipulation tasks in {RLBench}, as well as real-world tasks such as occluded object picking, in-hand spoon reorientation, and puzzle insertion, where it significantly outperforms feature-concatenation baselines on scenarios requiring multimodal reasoning. Our policy further demonstrates robustness to physical perturbations and sensor corruption. We further conduct perturbation-based importance analysis, which reveals adaptive shifts between modalities.
comment: 9 pages, 7 figures
Robust Orientation Estimation with TRIAD-aided Manifold EKF
The manifold extended Kalman filter (Manifold EKF) has found extensive application for attitude determination. Magnetometers employed as sensors for such attitude determination are easily prone to disturbances by their sensitivity to calibration and external magnetic fields. The TRIAD (Tri-Axial Attitude Determination) algorithm is well known as a sub-optimal attitude estimator. In this article, we incorporate this sub-optimal feature of the TRIAD in mitigating the influence of the magnetometer reading in the pitch and roll axis determination in the Manifold EKF algorithm. We substantiate our results with experiments.
Space Robotics Bench: Robot Learning Beyond Earth
The growing ambition for space exploration demands robust autonomous systems that can operate in unstructured environments under extreme extraterrestrial conditions. The adoption of robot learning in this domain is severely hindered by the prohibitive cost of technology demonstrations and the limited availability of data. To bridge this gap, we introduce the Space Robotics Bench, an open-source simulation framework for robot learning in space. It offers a modular architecture that integrates on-demand procedural generation with massively parallel simulation environments to support the creation of vast and diverse training distributions for learning-based agents. To ground research and enable direct comparison, the framework includes a comprehensive suite of benchmark tasks that span a wide range of mission-relevant scenarios. We establish performance baselines using standard reinforcement learning algorithms and present a series of experimental case studies that investigate key challenges in generalization, end-to-end learning, adaptive control, and sim-to-real transfer. Our results reveal insights into the limitations of current methods and demonstrate the utility of the framework in producing policies capable of real-world operation. These contributions establish the Space Robotics Bench as a valuable resource for developing, benchmarking, and deploying the robust autonomous systems required for the final frontier.
comment: The source code is available at https://github.com/AndrejOrsula/space_robotics_bench
GUARD: Toward a Compromise between Traditional Control and Learning for Safe Robot Systems IROS 2025
This paper presents the framework \textbf{GUARD} (\textbf{G}uided robot control via \textbf{U}ncertainty attribution and prob\textbf{A}bilistic kernel optimization for \textbf{R}isk-aware \textbf{D}ecision making) that combines traditional control with an uncertainty-aware perception technique using active learning with real-time capability for safe robot collision avoidance. By doing so, this manuscript addresses the central challenge in robotics of finding a reasonable compromise between traditional methods and learning algorithms to foster the development of safe, yet efficient and flexible applications. By unifying a reactive model predictive countouring control (RMPCC) with an Iterative Closest Point (ICP) algorithm that enables the attribution of uncertainty sources online using active learning with real-time capability via a probabilistic kernel optimization technique, \emph{GUARD} inherently handles the existing ambiguity of the term \textit{safety} that exists in robotics literature. Experimental studies indicate the high performance of \emph{GUARD}, thereby highlighting the relevance and need to broaden its applicability in future.
comment: Submitted as workshop paper to IEEE IROS 2025
Distributed Multi-Robot Multi-Target Simultaneous Search and Tracking in an Unknown Non-convex Environment
In unknown non-convex environments, such as indoor and underground spaces, deploying a fleet of robots to explore the surroundings while simultaneously searching for and tracking targets of interest to maintain high-precision data collection represents a fundamental challenge that urgently requires resolution in applications such as environmental monitoring and rescue operations. Current research has made significant progress in addressing environmental exploration, information search, and target tracking problems, but has yet to establish a framework for simultaneously optimizing these tasks in complex environments. In this paper, we propose a novel motion planning algorithm framework that integrates three control strategies: a frontier-based exploration strategy, a guaranteed coverage strategy based on Lloyd's algorithm, and a sensor-based multi-target tracking strategy. By incorporating these three strategies, the proposed algorithm balances coverage search and high-precision active tracking during exploration. Our approach is validated through a series of MATLAB simulations, demonstrating validity and superiority over standard approaches.
A Novel Narrow Region Detector for Sampling-Based Planners' Efficiency: Match Based Passage Identifier
Autonomous technology, which has become widespread today, appears in many different configurations such as mobile robots, manipulators, and drones. One of the most important tasks of these vehicles during autonomous operations is path planning. In the literature, path planners are generally divided into two categories: probabilistic and deterministic methods. In the analysis of probabilistic methods, the common problem of almost all methods is observed in narrow passage environments. In this paper, a novel sampler is proposed that deterministically identifies narrow passage environments using occupancy grid maps and accordingly increases the amount of sampling in these regions. The codes of the algorithm is provided as open source. To evaluate the performance of the algorithm, benchmark studies are conducted in three distinct categories: specific and random simulation environments, and a real-world environment. As a result, it is observed that our algorithm provides higher performance in planning time and number of milestones compared to the baseline samplers.
Preventing Robotic Jailbreaking via Multimodal Domain Adaptation
Large Language Models (LLMs) and Vision-Language Models (VLMs) are increasingly deployed in robotic environments but remain vulnerable to jailbreaking attacks that bypass safety mechanisms and drive unsafe or physically harmful behaviors in the real world. Data-driven defenses such as jailbreak classifiers show promise, yet they struggle to generalize in domains where specialized datasets are scarce, limiting their effectiveness in robotics and other safety-critical contexts. To address this gap, we introduce J-DAPT, a lightweight framework for multimodal jailbreak detection through attention-based fusion and domain adaptation. J-DAPT integrates textual and visual embeddings to capture both semantic intent and environmental grounding, while aligning general-purpose jailbreak datasets with domain-specific reference data. Evaluations across autonomous driving, maritime robotics, and quadruped navigation show that J-DAPT boosts detection accuracy to nearly 100% with minimal overhead. These results demonstrate that J-DAPT provides a practical defense for securing VLMs in robotic applications. Additional materials are made available at: https://j-dapt.github.io.
comment: Project page: https://j-dapt.github.io/. 9 pages, 6 figures
Online Dynamic Goal Recognition in Gym Environments
Goal Recognition (GR) is the task of inferring an agent's intended goal from partial observations of its behavior, typically in an online and one-shot setting. Despite recent advances in model-free GR, particularly in applications such as human-robot interaction, surveillance, and assistive systems, the field remains fragmented due to inconsistencies in benchmarks, domains, and evaluation protocols. To address this, we introduce gr-libs (https://github.com/MatanShamir1/gr_libs) and gr-envs (https://github.com/MatanShamir1/gr_envs), two complementary open-source frameworks that support the development, evaluation, and comparison of GR algorithms in Gym-compatible environments. gr-libs includes modular implementations of MDP-based GR baselines, diagnostic tools, and evaluation utilities. gr-envs provides a curated suite of environments adapted for dynamic and goal-directed behavior, along with wrappers that ensure compatibility with standard reinforcement learning toolkits. Together, these libraries offer a standardized, extensible, and reproducible platform for advancing GR research. Both packages are open-source and available on GitHub and PyPI.
Leave No Observation Behind: Real-time Correction for VLA Action Chunks
To improve efficiency and temporal coherence, Vision-Language-Action (VLA) models often predict action chunks; however, this action chunking harms reactivity under inference delay and long horizons. We introduce Asynchronous Action Chunk Correction (A2C2), which is a lightweight real-time chunk correction head that runs every control step and adds a time-aware correction to any off-the-shelf VLA's action chunk. The module combines the latest observation, the predicted action from VLA (base action), a positional feature that encodes the index of the base action within the chunk, and some features from the base policy, then outputs a per-step correction. This preserves the base model's competence while restoring closed-loop responsiveness. The approach requires no retraining of the base policy and is orthogonal to asynchronous execution schemes such as Real Time Chunking (RTC). On the dynamic Kinetix task suite (12 tasks) and LIBERO Spatial, our method yields consistent success rate improvements across increasing delays and execution horizons (+23% point and +7% point respectively, compared to RTC), and also improves robustness for long horizons even with zero injected delay. Since the correction head is small and fast, there is minimal overhead compared to the inference of large VLA models. These results indicate that A2C2 is an effective, plug-in mechanism for deploying high-capacity chunking policies in real-time control.
Constrained Decoding for Robotics Foundation Models
Recent advances in the development of robotic foundation models have led to promising end-to-end and general-purpose capabilities in robotic systems. Trained on vast datasets of simulated and real-world trajectories, these models map multimodal observations directly to action sequences for physical execution. Despite promising real-world capabilities, these models are still data-driven and, therefore, lack explicit notions of behavioral correctness. We address this gap by introducing SafeDec, a constrained decoding framework for autoregressive, robot foundation models that enforces invariant safety specifications on candidate action trajectories. Task-specific safety rules are expressed as Signal Temporal Logic (STL) formulas and are enforced at inference time with minimal overhead. Our method ensures that generated actions provably satisfy STL specifications under assumed dynamics at runtime without retraining , while remaining agnostic of the underlying policy. We evaluate SafeDec on tasks from the CHORES benchmark for state-of-the-art generalist policies (e.g., SPOC, Flare, PoliFormer) across hundreds of procedurally generated environments and show that our decoding-time interventions are useful not only for filtering unsafe actions but also for conditional action generation. Videos are available at constrained-robot-fms.github.io.
Automotive-ENV: Benchmarking Multimodal Agents in Vehicle Interface Systems
Multimodal agents have demonstrated strong performance in general GUI interactions, but their application in automotive systems has been largely unexplored. In-vehicle GUIs present distinct challenges: drivers' limited attention, strict safety requirements, and complex location-based interaction patterns. To address these challenges, we introduce Automotive-ENV, the first high-fidelity benchmark and interaction environment tailored for vehicle GUIs. This platform defines 185 parameterized tasks spanning explicit control, implicit intent understanding, and safety-aware tasks, and provides structured multimodal observations with precise programmatic checks for reproducible evaluation. Building on this benchmark, we propose ASURADA, a geo-aware multimodal agent that integrates GPS-informed context to dynamically adjust actions based on location, environmental conditions, and regional driving norms. Experiments show that geo-aware information significantly improves success on safety-aware tasks, highlighting the importance of location-based context in automotive environments. We will release Automotive-ENV, complete with all tasks and benchmarking tools, to further the development of safe and adaptive in-vehicle agents.
comment: 10 pages, 5 figures,
Efficient Iterative Proximal Variational Inference Motion Planning
We cast motion planning under uncertainty as a stochastic optimal control problem, where the optimal posterior distribution has an explicit form. To approximate this posterior, this work frames an optimization problem in the space of Gaussian distributions by solving a Variational Inference (VI) in the path distribution space. For linear-Gaussian stochastic dynamics, a proximal algorithm is proposed to solve for an optimal Gaussian proposal iteratively. The computational bottleneck is evaluating the gradients with respect to the proposal over a dense trajectory. To tackle this issue, the sparse planning factor graph and Gaussian Belief Propagation (GBP) are exploited, allowing for parallel computation of these gradients on Graphics Processing Units (GPUs). We term the novel paradigm the \textit{Parallel Gaussian Variational Inference Motion Planning (P-GVIMP)}. Building on the efficient algorithm for linear Gaussian systems, we then propose an iterative paradigm based on Statistical Linear Regression (SLR) techniques to solve planning problems for nonlinear stochastic systems, where the P-GVIMP serves as a sub-routine for the linearized time-varying system at each iteration. The proposed framework is validated on various robotic systems, demonstrating significant speed acceleration achieved by leveraging parallel computation and successful planning solutions for nonlinear systems under uncertainty. An open-sourced implementation is presented at \href{https://github.com/hzyu17/VIMP}{https://github.com/hzyu17/VIMP}.
comment: 13 pages
Delta-Triplane Transformers as Occupancy World Models
Occupancy World Models (OWMs) aim to predict future scenes via 3D voxelized representations of the environment to support intelligent motion planning. Existing approaches typically generate full future occupancy states from VAE-style latent encodings, which can be computationally expensive and redundant. We propose Delta-Triplane Transformers (DTT), a novel 4D OWM for autonomous driving, that introduces two key innovations: (1) a triplane based representation that encodes 3D occupancy more compactly than previous approaches, and (2) an incremental prediction strategy for OWM that models {\em changes} in occupancy rather than dealing with full states. The core insight is that changes in the compact 3D latent space are naturally sparser and easier to model, enabling higher accuracy with a lighter-weight architecture. Building on this representation, DTT extracts multi-scale motion features from historical data and iteratively predict future triplane deltas. These deltas are combined with past states to decode future occupancy and ego-motion trajectories. Extensive experiments demonstrate that DTT delivers a 1.44$\times$ speedup (26 FPS) over the state of the art, improves mean IoU to 30.85, and reduces the mean absolute planning error to 1.0 meters. Demo videos are provided in the supplementary material.
Hypo-paradoxical Linkages: Linkages That Should Move-But Don't
While paradoxical linkages famously violate the Chebyshev-Grubler-Kutzbach criterion by exhibiting unexpected mobility, we identify an opposing phenomenon: a class of linkages that appear mobile according to the same criterion, yet are in fact rigid. We refer to these as hypo-paradoxical linkages, and proceed to analyze and illustrate their behavior. We use the same tools to further explain the unexpected positive mobility of Bennet mechanism.
comment: 15 pages, 8 figures
MARG: MAstering Risky Gap Terrains for Legged Robots with Elevation Mapping
Deep Reinforcement Learning (DRL) controllers for quadrupedal locomotion have demonstrated impressive performance on challenging terrains, allowing robots to execute complex skills such as climbing, running, and jumping. However, existing blind locomotion controllers often struggle to ensure safety and efficient traversal through risky gap terrains, which are typically highly complex, requiring robots to perceive terrain information and select appropriate footholds during locomotion accurately. Meanwhile, existing perception-based controllers still present several practical limitations, including a complex multi-sensor deployment system and expensive computing resource requirements. This paper proposes a DRL controller named MAstering Risky Gap Terrains (MARG), which integrates terrain maps and proprioception to dynamically adjust the action and enhance the robot's stability in these tasks. During the training phase, our controller accelerates policy optimization by selectively incorporating privileged information (e.g., center of mass, friction coefficients) that are available in simulation but unmeasurable directly in real-world deployments due to sensor limitations. We also designed three foot-related rewards to encourage the robot to explore safe footholds. More importantly, a terrain map generation (TMG) model is proposed to reduce the drift existing in mapping and provide accurate terrain maps using only one LiDAR, providing a foundation for zero-shot transfer of the learned policy. The experimental results indicate that MARG maintains stability in various risky terrain tasks.
Deep Reinforcement Learning for Bipedal Locomotion: A Brief Survey
Bipedal robots are gaining global recognition due to their potential applications and advancements in artificial intelligence, particularly through Deep Reinforcement Learning (DRL). While DRL has significantly advanced bipedal locomotion, the development of a unified framework capable of handling a wide range of tasks remains an ongoing challenge. This survey systematically categorises, compares, and analyses existing DRL frameworks for bipedal locomotion, organising them into end-to-end and hierarchical control schemes. End-to-end frameworks are evaluated based on their learning approaches, while hierarchical frameworks are examined in terms of layered structures that integrate learning-based or traditional model-based methods. We provide a detailed evaluation of the composition, strengths, limitations, and capabilities of each framework. Additionally, this survey identifies key research gaps and proposes future directions aimed at creating a more integrated and efficient framework for bipedal locomotion, with wide-ranging applications in real-world environments.
comment: 17 pages, 8 figures
Hierarchical Intention-Aware Expressive Motion Generation for Humanoid Robots
Effective human-robot interaction requires robots to identify human intentions and generate expressive, socially appropriate motions in real-time. Existing approaches often rely on fixed motion libraries or computationally expensive generative models. We propose a hierarchical framework that combines intention-aware reasoning via in-context learning (ICL) with real-time motion generation using diffusion models. Our system introduces structured prompting with confidence scoring, fallback behaviors, and social context awareness to enable intention refinement and adaptive response. Leveraging large-scale motion datasets and efficient latent-space denoising, the framework generates diverse, physically plausible gestures suitable for dynamic humanoid interactions. Experimental validation on a physical platform demonstrates the robustness and social alignment of our method in realistic scenarios.
comment: 7 pages, 2 figures, IEEE conference paper
Communication-Efficient Desire Alignment for Embodied Agent-Human Adaptation
While embodied agents have made significant progress in performing complex physical tasks, real-world applications demand more than pure task execution. The agents must collaborate with unfamiliar agents and human users, whose goals are often vague and implicit. In such settings, interpreting ambiguous instructions and uncovering underlying desires is essential for effective assistance. Therefore, fast and accurate desire alignment becomes a critical capability for embodied agents. In this work, we first develop a home assistance simulation environment HA-Desire that integrates an LLM-driven proxy human user exhibiting realistic value-driven goal selection and communication. The ego agent must interact with this proxy user to infer and adapt to the user's latent desires. To achieve this, we present a novel framework FAMER for fast desire alignment, which introduces a desire-based mental reasoning mechanism to identify user intent and filter desire-irrelevant actions. We further design a reflection-based communication module that reduces redundant inquiries, and incorporate goal-relevant information extraction with memory persistence to improve information reuse and reduce unnecessary exploration. Extensive experiments demonstrate that our framework significantly enhances both task execution and communication efficiency, enabling embodied agents to quickly adapt to user-specific desires in complex embodied environments.
RISE: Robust Imitation through Stochastic Encoding
Ensuring safety in robotic systems remains a fundamental challenge, especially when deploying offline policy-learning methods such as imitation learning in dynamic environments. Traditional behavior cloning (BC) often fails to generalize when deployed without fine-tuning as it does not account for disturbances in observations that arises in real-world, changing environments. To address this limitation, we propose RISE (Robust Imitation through Stochastic Encodings), a novel imitation-learning framework that explicitly addresses erroneous measurements of environment parameters into policy learning via a variational latent representation. Our framework encodes parameters such as obstacle state, orientation, and velocity into a smooth variational latent space to improve test time generalization. This enables an offline-trained policy to produce actions that are more robust to perceptual noise and environment uncertainty. We validate our approach on two robotic platforms, an autonomous ground vehicle and a Franka Emika Panda manipulator and demonstrate improved safety robustness while maintaining goal-reaching performance compared to baseline methods.
comment: 15 pages, 4 figures
Systems and Control (CS)
Distributionally robust LMI synthesis for LTI systems
This article shows that distributionally robust controller synthesis as investigated in \cite{taskesen2024distributionally} can be formulated as a convex linear matrix inequality (LMI) synthesis problem. To this end, we rely on well-established convexification techniques from robust control. The LMI synthesis problem we propose has the advantage that it can be solved efficiently using off-the-shelf semi-definite programming (SDP) solvers. In addition, our formulation exposes the studied distributionally robust controller synthesis problem as an instance of robust $H_2$ synthesis.
Robust Orientation Estimation with TRIAD-aided Manifold EKF
The manifold extended Kalman filter (Manifold EKF) has found extensive application for attitude determination. Magnetometers employed as sensors for such attitude determination are easily prone to disturbances by their sensitivity to calibration and external magnetic fields. The TRIAD (Tri-Axial Attitude Determination) algorithm is well known as a sub-optimal attitude estimator. In this article, we incorporate this sub-optimal feature of the TRIAD in mitigating the influence of the magnetometer reading in the pitch and roll axis determination in the Manifold EKF algorithm. We substantiate our results with experiments.
Optimizing the Network Topology of a Linear Reservoir Computer
Machine learning has become a fundamental approach for modeling, prediction, and control, enabling systems to learn from data and perform complex tasks. Reservoir computing is a machine learning tool that leverages high-dimensional dynamical systems to efficiently process temporal data for prediction and observation tasks. Traditionally, the connectivity of a reservoir computer (RC) is generated at random, lacking a principled design. Here, we focus on optimizing the topology of a linear RC to improve its performance and interpretability, which we achieve by decoupling the RC dynamics into a number of independent modes. We then proceed to optimize each one of these modes to perform a given task, which corresponds to selecting an optimal RC connectivity in terms of a given set of eigenvalues of the RC adjacency matrix. Simulations on networks of varying sizes show that the optimized RC significantly outperforms randomly constructed reservoirs in both the training and testing phases and also often surpasses nonlinear reservoirs of comparable size. This approach provides both practical performance advantages and theoretical guidelines for designing efficient, task-specific, and analytically transparent RC architectures.
Efficient Norm-Based Reachable Sets via Iterative Dynamic Programming
In this work, we present a numerical optimal control framework for reachable set computation using \emph{normotopes}, a new set representation as a norm ball with a shaping matrix. In reachable set computations, we expect to continuously vary the shape matrix as a function of time. Incorporating the shape dynamics as an input, we build a \emph{controlled embedding system} using a linear differential inclusion overapproximating the dynamics of the system, where a single forward simulation of this embedding system always provides an overapproximating reachable set of the system, no matter the choice of \emph{hypercontrol}. By iteratively solving a linear quadratic approximation of the nonlinear optimal hypercontrol problem, we synthesize less conservative final reachable sets, providing a natural tradeoff between runtime and accuracy. Terminating our algorithm at any point always returns a valid reachable set overapproximation.
Sensitivity Analysis for Antenna Devices through Interval Arithmetic -- A Generalized Approach
This paper presents a novel method for the sensitivity analysis of electromagnetic (EM) systems whose transfer function (TF), that is the input-output (I/O) relationship between the input parameters affected by tolerance and the system response (i.e., the arising EM performance of interest), is not available in closed (or explicit) form. The method is a generalized analytic technique based on the Interval Analysis (IA). First, an analytic surrogate model (SM) of the TF is defined by means of a learning-by-example (LBE) approach starting from a set of available I/O examples. Then, the LBE-derived SM is extended to intervals through IA to yield inclusive, yet finite, performance bounds of the output system response when the control parameters are affected by unknown, but bounded, tolerances. A set of representative numerical examples is reported to validate the proposed IA-LBE method as well as to assess its effectiveness and reliability when dealing with realistic EM systems (e.g., antennas) for which the TF is not explicitly known.
Dual-Function Beam Pattern Design for Multi-Target ISAC Systems: A Decoupled Approach
We investigate the beampattern design problem for mono-static multi-user (MU) multi-point-target integrated sensing and communication (ISAC) systems, where a dual-function multiple-input multiple-output (DF-MIMO) base station (BS) performs downlink communication and radar sensing simultaneously. In ISAC systems, sensing and communication inherently compete for resources. As communication demand increases, the beam pattern is reshaped, which might degrade the direction of arrival (DoA) sensing accuracy, measured in terms of mean-squared error (MSE) and lower-bounded by the Cramer-Rao lower bound (CRLB). Since conventional joint formulations of the sensing-based problem often overlook this trade-off, our work addresses it by decomposing the sensing-based problem into two subproblems (SPs). This decomposition enables a more effective exploitation of the beam pattern's physical properties, which we refer to as the Sensing-Guided Communication Dual-Function (SGCDF) beam pattern design. We further develop a low-complexity extension using the Riemannian Manifold Optimization (RMO) and convex closed-set projection. Simulation results confirm that the proposed method improves multi-target estimation accuracy, compared to traditional joint optimization strategies, by preserving the beam pattern, while the low-complexity version offers an excellent performance-complexity tradeoff, maintaining high accuracy with significantly reduced computational cost.
comment: 13 pages; 7 figures; tables; 3 tables; manuscript submitted to IEEE journal
Incorporating flexibility and resilience demand into capacity market considering the guidance on generation investment
The capacity market provides economic guidance for generation investment and ensures the adequacy of generation capability for power systems. With the rapidly increasing proportion of renewable energy, the adequacy of flexibility and resilience becomes more crucial for the secure operation of power systems. In this context, this paper incorporates the flexibility and resilience demand into the capacity market by formulating the capacity demand curves for ramping capability, inertia and recovery capabilities besides the generation capability. The guidance on generation investment of the capacity market is also taken into account by solving the generation investment equilibrium among generation companies with a Nash Cournot model employing an equivalent quadratic programming formulation. The overall problem is established as a trilevel game and an iterative algorithm is devised to formulate the capacity demand curves in the upper level based on Genco's investment acquired from the middle and lower levels. The case study further demonstrates that to incorporate flexibility and resilience demand into the capacity market could stimulate proper generation investment and ensure the adequacy of flexibility and resilience in power systems.
Leave No Observation Behind: Real-time Correction for VLA Action Chunks
To improve efficiency and temporal coherence, Vision-Language-Action (VLA) models often predict action chunks; however, this action chunking harms reactivity under inference delay and long horizons. We introduce Asynchronous Action Chunk Correction (A2C2), which is a lightweight real-time chunk correction head that runs every control step and adds a time-aware correction to any off-the-shelf VLA's action chunk. The module combines the latest observation, the predicted action from VLA (base action), a positional feature that encodes the index of the base action within the chunk, and some features from the base policy, then outputs a per-step correction. This preserves the base model's competence while restoring closed-loop responsiveness. The approach requires no retraining of the base policy and is orthogonal to asynchronous execution schemes such as Real Time Chunking (RTC). On the dynamic Kinetix task suite (12 tasks) and LIBERO Spatial, our method yields consistent success rate improvements across increasing delays and execution horizons (+23% point and +7% point respectively, compared to RTC), and also improves robustness for long horizons even with zero injected delay. Since the correction head is small and fast, there is minimal overhead compared to the inference of large VLA models. These results indicate that A2C2 is an effective, plug-in mechanism for deploying high-capacity chunking policies in real-time control.
SAC-Loco: Safe and Adjustable Compliant Quadrupedal Locomotion
Quadruped robots are designed to achieve agile locomotion by mimicking legged animals. However, existing control methods for quadrupeds often lack one of the key capabilities observed in animals: adaptive and adjustable compliance in response to external disturbances. Most locomotion controllers do not provide tunable compliance and tend to fail under large perturbations. In this work, we propose a switched policy framework for compliant and safe quadruped locomotion. First, we train a force compliant policy with adjustable compliance levels using a teacher student reinforcement learning framework, eliminating the need for explicit force sensing. Next, we develop a safe policy based on the capture point concept to stabilize the robot when the compliant policy fails. Finally, we introduce a recoverability network that predicts the likelihood of failure and switches between the compliant and safe policies. Together, this framework enables quadruped robots to achieve both force compliance and robust safety when subjected to severe external disturbances.
Markov Modeling for Licensed and Unlicensed Band Allocation in Underlay and Overlay D2D
In this paper, a novel analytical model for resource allocation is proposed for a device-to-device (D2D) assisted cellular network. The proposed model can be applied to underlay and overlay D2D systems for sharing licensed bands and offloading cellular traffic. The developed model also takes into account the problem of unlicensed band sharing with Wi-Fi systems. In the proposed model, a global system state reflects the interaction among D2D, conventional cellular, and Wi-Fi packets. Under the standard traffic model assumptions, a threshold-based flow control is proposed for guaranteeing the quality-of-service (QoS) of Wi-Fi. The packet blockage probability is then derived. Simulation results show the proposed scheme sacrifices conventional cellular performance slightly to improve overlay D2D performance significantly while maintaining the performance for Wi-Fi users. Meanwhile, the proposed scheme has more flexible adjustments between D2D and Wi-Fi than the underlay scheme.
comment: 10 pages, 5 figures, published in 2024 IEEE ICC
Modeling the Unlicensed Band Allocation for LAA With Buffering Mechanism
In this letter, we propose an analytical model and conduct simulation experiments to study listen-before-talk-based unlicensed band allocation with the buffering mechanism for the License-Assisted Access (LAA) packets in the heterogeneous networks. In such a network, unlicensed band allocation for LAA and Wi-Fi is an important issue, which may affect the quality of service for both systems significantly. We evaluate the performance of these unlicensed band allocations in terms of the acceptance rate of both LAA and Wi-Fi packets. This letter provides the guidelines for designing the channel occupation phase and buffer threshold of the LAA systems.
comment: 5 pages, 3 figures, 2 tables, published in IEEE Communications Letters
Unlicensed Band Allocation for Heterogeneous Networks
Based on the License-Assisted Access (LAA) small cell architecture, the LAA coexisting with Wi-Fi heterogeneous networks provides LTE mobile users with high bandwidth efficiency as the unlicensed channels are shared among LAA and Wi-Fi. However, LAA and Wi-Fi interfere with each other when both systems use the same unlicensed channel in heterogeneous networks. In such a network, unlicensed band allocation for LAA and Wi-Fi is an important issue that may affect the quality of service (QoS) of both systems significantly. In this paper, we propose an analytical model and conduct simulation experiments to study four allocations for the unlicensed band: unlicensed full allocation (UFA), unlicensed time-division allocation (UTA), and UFA/UTA with buffering mechanism (UFAB and UTAB) for the LAA data packets. We evaluate the performance of these unlicensed band allocation schemes in terms of the acceptance rate of both LAA and Wi-Fi packet data in the LAA buffer queue. Our study provides guidelines for designing the channel occupation phase and the buffer size of the LAA small cell.
comment: 14 pages, 12 figures, 1 table, published in IEICE Transactions on Communications
Simulated Annealing for Multi-Robot Ergodic Information Acquisition Using Graph-Based Discretization
One of the goals of active information acquisition using multi-robot teams is to keep the relative uncertainty in each region at the same level to maintain identical acquisition quality (e.g., consistent target detection) in all the regions. To achieve this goal, ergodic coverage can be used to assign the number of samples according to the quality of observation, i.e., sampling noise levels. However, the noise levels are unknown to the robots. Although this noise can be estimated from samples, the estimates are unreliable at first and can generate fluctuating values. The main contribution of this paper is to use simulated annealing to generate the target sampling distribution, starting from uniform and gradually shifting to an estimated optimal distribution, by varying the coldness parameter of a Boltzmann distribution with the estimated sampling entropy as energy. Simulation results show a substantial improvement of both transient and asymptotic entropy compared to both uniform and direct-ergodic searches. Finally, a demonstration is performed with a TurtleBot swarm system to validate the physical applicability of the algorithm.
MathBode: Frequency-Domain Fingerprints of LLM Mathematical Reasoning
This paper presents MathBode, a dynamic diagnostic for mathematical reasoning in large language models (LLMs). Instead of one-shot accuracy, MathBode treats each parametric problem as a system: we drive a single parameter sinusoidally and fit first-harmonic responses of model outputs and exact solutions. This yields interpretable, frequency-resolved metrics -- gain (amplitude tracking) and phase (lag) -- that form Bode-style fingerprints. Across five closed-form families (linear solve, ratio/saturation, compound interest, 2x2 linear systems, similar triangles), the diagnostic surfaces systematic low-pass behavior and growing phase lag that accuracy alone obscures. We compare several models against a symbolic baseline that calibrates the instrument ($G \approx 1$, $\phi \approx 0$). Results separate frontier from mid-tier models on dynamics, providing a compact, reproducible protocol that complements standard benchmarks with actionable measurements of reasoning fidelity and consistency. We open-source the dataset and code to enable further research and adoption.
Fractional Logistic Growth with Memory Effects: A Tool for Industry-Oriented Modeling
The logistic growth model is a classical framework for describing constrained growth phenomena, widely applied in areas such as population dynamics, epidemiology, and resource management. This study presents a generalized extension using Atangana-Baleanu in Caputo sense (ABC)-type fractional derivatives. Proportional time delay is also included, allowing the model to capture memory-dependent and nonlocal dynamics not addressed in classical formulations. Free parameters provide flexibility for modeling complex growth in industrial, medical, and social systems. The Hybrid Sumudu Variational (HSV) method is employed to efficiently obtain semi-analytical solutions. Results highlight the combined effects of fractional order and delay on system behavior. This approach demonstrates the novelty of integrating ABC-type derivatives, proportional delay, and HSV-based solutions for real-world applications.
comment: 15
Optimality of Linear Policies in Distributionally Robust Linear Quadratic Control
We study a generalization of the classical discrete-time, Linear-Quadratic-Gaussian (LQG) control problem where the noise distributions affecting the states and observations are unknown and chosen adversarially from divergence-based ambiguity sets centered around a known nominal distribution. For a finite horizon model with Gaussian nominal noise and a structural assumption on the divergence that is satisfied by many examples -- including 2-Wasserstein distance, Kullback-Leibler divergence, moment-based divergences, entropy-regularized optimal transport, or Fisher (score-matching) divergence -- we prove that a control policy that is affine in the observations is optimal and the adversary's corresponding worst-case optimal distribution is Gaussian. When the nominal means are zero (as in the classical LQG model), we show that the adversary should optimally set the distribution's mean to zero and the optimal control policy becomes linear. Moreover, the adversary should optimally ``inflate" the noise by choosing covariance matrices that dominate the nominal covariance in Loewner order. Exploiting these structural properties, we develop a Frank-Wolfe algorithm whose inner step solves standard LQG subproblems via Kalman filtering and dynamic programming and show that the implementation consistently outperforms semidefinite-programming reformulations of the problem. All structural and algorithmic results extend to an infinite-horizon, average-cost formulation, yielding stationary linear policies and a time-invariant Gaussian distribution for the adversary. Lastly, we show that when the divergence is 2-Wasserstein, the entire framework remains valid when the nominal distributions are elliptical rather than Gaussian.
Geometry-Aware Decentralized Sinkhorn for Wasserstein Barycenters
Distributed systems require fusing heterogeneous local probability distributions into a global summary over sparse and unreliable communication networks. Traditional consensus algorithms, which average distributions in Euclidean space, ignore their inherent geometric structure, leading to misleading results. Wasserstein barycenters offer a geometry-aware alternative by minimizing optimal transport costs, but their entropic approximations via the Sinkhorn algorithm typically require centralized coordination. This paper proposes a fully decentralized Sinkhorn algorithm that reformulates the centralized geometric mean as an arithmetic average in the log-domain, enabling approximation through local gossip protocols. Agents exchange log-messages with neighbors, interleaving consensus phases with local updates to mimic centralized iterations without a coordinator. To optimize bandwidth, we integrate event-triggered transmissions and b-bit quantization, providing tunable trade-offs between accuracy and communication while accommodating asynchrony and packet loss. Under mild assumptions, we prove convergence to a neighborhood of the centralized entropic barycenter, with bias linearly dependent on consensus tolerance, trigger threshold, and quantization error. Complexity scales near-linearly with network size. Simulations confirm near-centralized accuracy with significantly fewer messages, across various topologies and conditions.
Koopman-based control using sum-of-squares optimization: Improved stability guarantees and data efficiency
In this paper, we propose a novel controller design approach for unknown nonlinear systems using the Koopman operator. In particular, we use the recently proposed stability- and feedback-oriented extended dynamic mode decomposition (SafEDMD) architecture to generate a data-driven bilinear surrogate model with certified error bounds. Then, by accounting for the obtained error bounds in a controller design based on the bilinear system, one can guarantee closed-loop stability for the true nonlinear system. While existing approaches over-approximate the bilinearity of the surrogate model, thus introducing conservatism and providing only local guarantees, we explicitly account for the bilinearity by using sum-of-squares (SOS) optimization in the controller design. More precisely, we parametrize a rational controller stabilizing the error-affected bilinear surrogate model and, consequently, the underlying nonlinear system. The resulting SOS optimization problem provides explicit data-driven controller design conditions for unknown nonlinear systems based on semidefinite programming. Our approach significantly reduces conservatism by establishing a larger region of attraction and improved data efficiency. The proposed method is evaluated using numerical examples, demonstrating its advantages over existing approaches.
comment: Accepted for publication in the European Journal of Control, 2025
A Hybrid TDMA/CSMA Protocol for Time-Sensitive Traffic in Robot Applications
Recent progress in robotics has underscored the demand for real-time control in applications such as manufacturing, healthcare, and autonomous systems, where the timely delivery of mission-critical commands under heterogeneous robotic traffic is paramount for operational efficacy and safety. In these scenarios, mission-critical traffic follows a strict deadline-constrained communication pattern: commands must arrive within defined QoS deadlines, otherwise late arrivals can degrade performance or destabilize control loops.In this work, we demonstrate on a real-time SDR platform that CSMA, widely adopted in robotic communications,suffers severe degradation under high robot traffic loads, with contention-induced collisions and delays disrupting the on-time arrival of mission-critical packets. To address this problem, we propose an IEEE 802.11-compatible hybrid TDMA/CSMA protocol that combines TDMA's deterministic slot scheduling with CSMA's adaptability for heterogeneous robot traffic.The protocol achieves collision-free, low-latency mission-critical command delivery and IEEE 802.11 compatibility through the synergistic integration of sub-microsecond PTP-based slot synchronization-essential for establishing precise timing for TDMA, a three-session superframe with dynamic TDMA allocation for structured and adaptable traffic management,and beacon-NAV protection to preemptively secure these critical communication sessions from interference. Emulation experiments on real-time SDR testbed and Robot Operating System (ROS) simulation show that the proposed protocol reduces missed-deadline errors by 93% compared to the CSMA baseline. In high-speed robot path-tracking ROS simulations, the protocol lowers Root Mean Square (RMS) trajectory error by up to 90% compared with a CSMA baseline, all while maintaining throughput for non-critical traffic within +-2%.
Rapidly Converging Time-Discounted Ergodicity on Graphs for Active Inspection of Confined Spaces
Ergodic exploration has spawned a lot of interest in mobile robotics due to its ability to design time trajectories that match desired spatial coverage statistics. However, current ergodic approaches are for continuous spaces, which require detailed sensory information at each point and can lead to fractal-like trajectories that cannot be tracked easily. This paper presents a new ergodic approach for graph-based discretization of continuous spaces. It also introduces a new time-discounted ergodicity metric, wherein early visitations of information-rich nodes are weighted more than late visitations. A Markov chain synthesized using a convex program is shown to converge more rapidly to time-discounted ergodicity than the traditional fastest mixing Markov chain. The resultant ergodic traversal method is used within a hierarchical framework for active inspection of confined spaces with the goal of detecting anomalies robustly using SLAM-driven Bayesian hypothesis testing. Experiments on a ground robot show the advantages of this framework over three continuous space ergodic planners as well as greedy and random exploration methods for left-behind foreign object debris detection in a ballast tank.
Frequency-Domain Bounds for the Multiconductor Telegrapher's Equation
We establish mathematical bounds on the chain, ABCD and immittance matrices of a multiconductor transmission line, based on the Telegrapher's equation. Closed-form expressions for those matrices are also presented. Existing results that hold on the imaginary axis are extended to the complex plane, without reliance on a diagonalizability assumption that is prevalent in the literature. Therefore, the results remain valid under arbitrary arrangements of the conducting wires, which include electrical faults. The analysis ultimately reveals important system-theoretic implications of the Telegrapher's equation, which are of general relevance to control, power systems, and signal processing involving multiconductor transmission lines.
comment: First revision: 10 pages, 1 figure
Multiagent Systems
Situational Awareness for Safe and Robust Multi-Agent Interactions Under Uncertainty
Multi-agent systems are prevalent in a wide range of domains including power systems, vehicular networks, and robotics. Two important problems to solve in these types of systems are how the intentions of non-coordinating agents can be determined to predict future behavior and how the agents can achieve their objectives under resource constraints without significantly sacrificing performance. To study this, we develop a model where an autonomous agent observes the environment within a safety radius of observation, determines the state of a surrounding agent of interest (within the observation radius), estimates future actions to be taken, and acts in an optimal way. In the absence of observations, agents are able to utilize an estimation algorithm to predict the future actions of other agents based on historical trajectory. The use of the proposed estimation algorithm introduces uncertainty, which is managed via risk analysis. The proposed approach in this study is validated using two different learning-based decision making frameworks: reinforcement learning and game theoretic algorithms.
Grouped Satisficing Paths in Pure Strategy Games: a Topological Perspective
In game theory and multi-agent reinforcement learning (MARL), each agent selects a strategy, interacts with the environment and other agents, and subsequently updates its strategy based on the received payoff. This process generates a sequence of joint strategies $(s^t)_{t \geq 0}$, where $s^t$ represents the strategy profile of all agents at time step $t$. A widely adopted principle in MARL algorithms is "win-stay, lose-shift", which dictates that an agent retains its current strategy if it achieves the best response. This principle exhibits a fixed-point property when the joint strategy has become an equilibrium. The sequence of joint strategies under this principle is referred to as a satisficing path, a concept first introduced in [40] and explored in the context of $N$-player games in [39]. A fundamental question arises regarding this principle: Under what conditions does every initial joint strategy $s$ admit a finite-length satisficing path $(s^t)_{0 \leq t \leq T}$ where $s^0=s$ and $s^T$ is an equilibrium? This paper establishes a sufficient condition for such a property, and demonstrates that any finite-state Markov game, as well as any $N$-player game, guarantees the existence of a finite-length satisficing path from an arbitrary initial strategy to some equilibrium. These results provide a stronger theoretical foundation for the design of MARL algorithms.
Coordination Requires Simplification: Thermodynamic Bounds on Multi-Objective Compromise in Natural and Artificial Intelligence
Information-processing systems coordinating across multiple agents and objectives face fundamental thermodynamic constraints. We show that solutions with maximum utility to act as coordination focal points have much higher selection pressure for being findable across agents rather than accuracy. We derive that the information-theoretic minimum description length of coordination protocols to precision $\varepsilon$ scales as $L(P)\geq NK\log_2 K+N^2d^2\log (1/\varepsilon)$ for $N$ agents with $d$ potentially conflicting objectives and internal model complexity $K$. This scaling forces progressive simplification, with coordination dynamics changing the environment itself and shifting optimization across hierarchical levels. Moving from established focal points requires re-coordination, creating persistent metastable states and hysteresis until significant environmental shifts trigger phase transitions through spontaneous symmetry breaking. We operationally define coordination temperature to predict critical phenomena and estimate coordination work costs, identifying measurable signatures across systems from neural networks to restaurant bills to bureaucracies. Extending the topological version of Arrow's theorem on the impossibility of consistent preference aggregation, we find it recursively binds whenever preferences are combined. This potentially explains the indefinite cycling in multi-objective gradient descent and alignment faking in Large Language Models trained with reinforcement learning with human feedback. We term this framework Thermodynamic Coordination Theory (TCT), which demonstrates that coordination requires radical information loss.
comment: 11 pages, 1 figure, 6 pages supplementary material, submitted to Physical Review E
Game-Theoretic Understandings of Multi-Agent Systems with Multiple Objectives
In practical multi-agent systems, agents often have diverse objectives, which makes the system more complex, as each agent's performance across multiple criteria depends on the joint actions of all agents, creating intricate strategic trade-offs. To address this, we introduce the Multi-Objective Markov Game (MOMG), a framework for multi-agent reinforcement learning with multiple objectives. We propose the Pareto-Nash Equilibrium (PNE) as the primary solution concept, where no agent can unilaterally improve one objective without sacrificing performance on another. We prove existence of PNE, and establish an equivalence between the PNE and the set of Nash Equilibria of MOMG's corresponding linearly scalarized games, enabling solutions of MOMG by transferring to a standard single-objective Markov game. However, we note that computing a PNE is theoretically and computationally challenging, thus we propose and study weaker but more tractable solution concepts. Building on these foundations, we develop online learning algorithm that identify a single solution to MOMGs. Furthermore, we propose a two-phase, preference-free algorithm that decouples exploration from planning. Our algorithm enables computation of a PNE for any given preference profile without collecting new samples, providing an efficient methodological characterization of the entire Pareto-Nash front.
comment: Preprint, work in progress
Communication-Efficient Desire Alignment for Embodied Agent-Human Adaptation
While embodied agents have made significant progress in performing complex physical tasks, real-world applications demand more than pure task execution. The agents must collaborate with unfamiliar agents and human users, whose goals are often vague and implicit. In such settings, interpreting ambiguous instructions and uncovering underlying desires is essential for effective assistance. Therefore, fast and accurate desire alignment becomes a critical capability for embodied agents. In this work, we first develop a home assistance simulation environment HA-Desire that integrates an LLM-driven proxy human user exhibiting realistic value-driven goal selection and communication. The ego agent must interact with this proxy user to infer and adapt to the user's latent desires. To achieve this, we present a novel framework FAMER for fast desire alignment, which introduces a desire-based mental reasoning mechanism to identify user intent and filter desire-irrelevant actions. We further design a reflection-based communication module that reduces redundant inquiries, and incorporate goal-relevant information extraction with memory persistence to improve information reuse and reduce unnecessary exploration. Extensive experiments demonstrate that our framework significantly enhances both task execution and communication efficiency, enabling embodied agents to quickly adapt to user-specific desires in complex embodied environments.
Cost-Aware Opinion Dynamics in Multi-Agents Systems under Malicious Agent Influence
In many MASs, links to malicious agents cannot be severed immediately. Under these conditions, averaging-only consensus mechanisms typically lack sufficient resistance, leaving the system vulnerable to harmful deviations. To address this challenge, this brief leverages the Boomerang Effect from sociology, which drives normal agents to firmly reject malicious inputs, although this strategy may appear overly cautious. Thus, this brief emphasizes the necessity of acknowledging the resulting trade-off between cost and convergence speed in practice. To address this, the additional costs induced by Boomerang-style fusion is analyzed and a cost aware evolution rate adjustment mechanism is proposed. Multi-robot simulations demonstrate that this mechanism suppresses excess costs while maintaining resilience to extremist disruptions and ensuring stable convergence, enabling MAS to efficiently develop in a ethical order.
comment: 7 pages, 9 figures, 2 tables. This work has been submitted to the IEEE for possible publication
Systems and Control (EESS)
Distributionally robust LMI synthesis for LTI systems
This article shows that distributionally robust controller synthesis as investigated in \cite{taskesen2024distributionally} can be formulated as a convex linear matrix inequality (LMI) synthesis problem. To this end, we rely on well-established convexification techniques from robust control. The LMI synthesis problem we propose has the advantage that it can be solved efficiently using off-the-shelf semi-definite programming (SDP) solvers. In addition, our formulation exposes the studied distributionally robust controller synthesis problem as an instance of robust $H_2$ synthesis.
Robust Orientation Estimation with TRIAD-aided Manifold EKF
The manifold extended Kalman filter (Manifold EKF) has found extensive application for attitude determination. Magnetometers employed as sensors for such attitude determination are easily prone to disturbances by their sensitivity to calibration and external magnetic fields. The TRIAD (Tri-Axial Attitude Determination) algorithm is well known as a sub-optimal attitude estimator. In this article, we incorporate this sub-optimal feature of the TRIAD in mitigating the influence of the magnetometer reading in the pitch and roll axis determination in the Manifold EKF algorithm. We substantiate our results with experiments.
Optimizing the Network Topology of a Linear Reservoir Computer
Machine learning has become a fundamental approach for modeling, prediction, and control, enabling systems to learn from data and perform complex tasks. Reservoir computing is a machine learning tool that leverages high-dimensional dynamical systems to efficiently process temporal data for prediction and observation tasks. Traditionally, the connectivity of a reservoir computer (RC) is generated at random, lacking a principled design. Here, we focus on optimizing the topology of a linear RC to improve its performance and interpretability, which we achieve by decoupling the RC dynamics into a number of independent modes. We then proceed to optimize each one of these modes to perform a given task, which corresponds to selecting an optimal RC connectivity in terms of a given set of eigenvalues of the RC adjacency matrix. Simulations on networks of varying sizes show that the optimized RC significantly outperforms randomly constructed reservoirs in both the training and testing phases and also often surpasses nonlinear reservoirs of comparable size. This approach provides both practical performance advantages and theoretical guidelines for designing efficient, task-specific, and analytically transparent RC architectures.
Efficient Norm-Based Reachable Sets via Iterative Dynamic Programming
In this work, we present a numerical optimal control framework for reachable set computation using \emph{normotopes}, a new set representation as a norm ball with a shaping matrix. In reachable set computations, we expect to continuously vary the shape matrix as a function of time. Incorporating the shape dynamics as an input, we build a \emph{controlled embedding system} using a linear differential inclusion overapproximating the dynamics of the system, where a single forward simulation of this embedding system always provides an overapproximating reachable set of the system, no matter the choice of \emph{hypercontrol}. By iteratively solving a linear quadratic approximation of the nonlinear optimal hypercontrol problem, we synthesize less conservative final reachable sets, providing a natural tradeoff between runtime and accuracy. Terminating our algorithm at any point always returns a valid reachable set overapproximation.
Sensitivity Analysis for Antenna Devices through Interval Arithmetic -- A Generalized Approach
This paper presents a novel method for the sensitivity analysis of electromagnetic (EM) systems whose transfer function (TF), that is the input-output (I/O) relationship between the input parameters affected by tolerance and the system response (i.e., the arising EM performance of interest), is not available in closed (or explicit) form. The method is a generalized analytic technique based on the Interval Analysis (IA). First, an analytic surrogate model (SM) of the TF is defined by means of a learning-by-example (LBE) approach starting from a set of available I/O examples. Then, the LBE-derived SM is extended to intervals through IA to yield inclusive, yet finite, performance bounds of the output system response when the control parameters are affected by unknown, but bounded, tolerances. A set of representative numerical examples is reported to validate the proposed IA-LBE method as well as to assess its effectiveness and reliability when dealing with realistic EM systems (e.g., antennas) for which the TF is not explicitly known.
Dual-Function Beam Pattern Design for Multi-Target ISAC Systems: A Decoupled Approach
We investigate the beampattern design problem for mono-static multi-user (MU) multi-point-target integrated sensing and communication (ISAC) systems, where a dual-function multiple-input multiple-output (DF-MIMO) base station (BS) performs downlink communication and radar sensing simultaneously. In ISAC systems, sensing and communication inherently compete for resources. As communication demand increases, the beam pattern is reshaped, which might degrade the direction of arrival (DoA) sensing accuracy, measured in terms of mean-squared error (MSE) and lower-bounded by the Cramer-Rao lower bound (CRLB). Since conventional joint formulations of the sensing-based problem often overlook this trade-off, our work addresses it by decomposing the sensing-based problem into two subproblems (SPs). This decomposition enables a more effective exploitation of the beam pattern's physical properties, which we refer to as the Sensing-Guided Communication Dual-Function (SGCDF) beam pattern design. We further develop a low-complexity extension using the Riemannian Manifold Optimization (RMO) and convex closed-set projection. Simulation results confirm that the proposed method improves multi-target estimation accuracy, compared to traditional joint optimization strategies, by preserving the beam pattern, while the low-complexity version offers an excellent performance-complexity tradeoff, maintaining high accuracy with significantly reduced computational cost.
comment: 13 pages; 7 figures; tables; 3 tables; manuscript submitted to IEEE journal
Incorporating flexibility and resilience demand into capacity market considering the guidance on generation investment
The capacity market provides economic guidance for generation investment and ensures the adequacy of generation capability for power systems. With the rapidly increasing proportion of renewable energy, the adequacy of flexibility and resilience becomes more crucial for the secure operation of power systems. In this context, this paper incorporates the flexibility and resilience demand into the capacity market by formulating the capacity demand curves for ramping capability, inertia and recovery capabilities besides the generation capability. The guidance on generation investment of the capacity market is also taken into account by solving the generation investment equilibrium among generation companies with a Nash Cournot model employing an equivalent quadratic programming formulation. The overall problem is established as a trilevel game and an iterative algorithm is devised to formulate the capacity demand curves in the upper level based on Genco's investment acquired from the middle and lower levels. The case study further demonstrates that to incorporate flexibility and resilience demand into the capacity market could stimulate proper generation investment and ensure the adequacy of flexibility and resilience in power systems.
Leave No Observation Behind: Real-time Correction for VLA Action Chunks
To improve efficiency and temporal coherence, Vision-Language-Action (VLA) models often predict action chunks; however, this action chunking harms reactivity under inference delay and long horizons. We introduce Asynchronous Action Chunk Correction (A2C2), which is a lightweight real-time chunk correction head that runs every control step and adds a time-aware correction to any off-the-shelf VLA's action chunk. The module combines the latest observation, the predicted action from VLA (base action), a positional feature that encodes the index of the base action within the chunk, and some features from the base policy, then outputs a per-step correction. This preserves the base model's competence while restoring closed-loop responsiveness. The approach requires no retraining of the base policy and is orthogonal to asynchronous execution schemes such as Real Time Chunking (RTC). On the dynamic Kinetix task suite (12 tasks) and LIBERO Spatial, our method yields consistent success rate improvements across increasing delays and execution horizons (+23% point and +7% point respectively, compared to RTC), and also improves robustness for long horizons even with zero injected delay. Since the correction head is small and fast, there is minimal overhead compared to the inference of large VLA models. These results indicate that A2C2 is an effective, plug-in mechanism for deploying high-capacity chunking policies in real-time control.
SAC-Loco: Safe and Adjustable Compliant Quadrupedal Locomotion
Quadruped robots are designed to achieve agile locomotion by mimicking legged animals. However, existing control methods for quadrupeds often lack one of the key capabilities observed in animals: adaptive and adjustable compliance in response to external disturbances. Most locomotion controllers do not provide tunable compliance and tend to fail under large perturbations. In this work, we propose a switched policy framework for compliant and safe quadruped locomotion. First, we train a force compliant policy with adjustable compliance levels using a teacher student reinforcement learning framework, eliminating the need for explicit force sensing. Next, we develop a safe policy based on the capture point concept to stabilize the robot when the compliant policy fails. Finally, we introduce a recoverability network that predicts the likelihood of failure and switches between the compliant and safe policies. Together, this framework enables quadruped robots to achieve both force compliance and robust safety when subjected to severe external disturbances.
Markov Modeling for Licensed and Unlicensed Band Allocation in Underlay and Overlay D2D
In this paper, a novel analytical model for resource allocation is proposed for a device-to-device (D2D) assisted cellular network. The proposed model can be applied to underlay and overlay D2D systems for sharing licensed bands and offloading cellular traffic. The developed model also takes into account the problem of unlicensed band sharing with Wi-Fi systems. In the proposed model, a global system state reflects the interaction among D2D, conventional cellular, and Wi-Fi packets. Under the standard traffic model assumptions, a threshold-based flow control is proposed for guaranteeing the quality-of-service (QoS) of Wi-Fi. The packet blockage probability is then derived. Simulation results show the proposed scheme sacrifices conventional cellular performance slightly to improve overlay D2D performance significantly while maintaining the performance for Wi-Fi users. Meanwhile, the proposed scheme has more flexible adjustments between D2D and Wi-Fi than the underlay scheme.
comment: 10 pages, 5 figures, published in 2024 IEEE ICC
Modeling the Unlicensed Band Allocation for LAA With Buffering Mechanism
In this letter, we propose an analytical model and conduct simulation experiments to study listen-before-talk-based unlicensed band allocation with the buffering mechanism for the License-Assisted Access (LAA) packets in the heterogeneous networks. In such a network, unlicensed band allocation for LAA and Wi-Fi is an important issue, which may affect the quality of service for both systems significantly. We evaluate the performance of these unlicensed band allocations in terms of the acceptance rate of both LAA and Wi-Fi packets. This letter provides the guidelines for designing the channel occupation phase and buffer threshold of the LAA systems.
comment: 5 pages, 3 figures, 2 tables, published in IEEE Communications Letters
Unlicensed Band Allocation for Heterogeneous Networks
Based on the License-Assisted Access (LAA) small cell architecture, the LAA coexisting with Wi-Fi heterogeneous networks provides LTE mobile users with high bandwidth efficiency as the unlicensed channels are shared among LAA and Wi-Fi. However, LAA and Wi-Fi interfere with each other when both systems use the same unlicensed channel in heterogeneous networks. In such a network, unlicensed band allocation for LAA and Wi-Fi is an important issue that may affect the quality of service (QoS) of both systems significantly. In this paper, we propose an analytical model and conduct simulation experiments to study four allocations for the unlicensed band: unlicensed full allocation (UFA), unlicensed time-division allocation (UTA), and UFA/UTA with buffering mechanism (UFAB and UTAB) for the LAA data packets. We evaluate the performance of these unlicensed band allocation schemes in terms of the acceptance rate of both LAA and Wi-Fi packet data in the LAA buffer queue. Our study provides guidelines for designing the channel occupation phase and the buffer size of the LAA small cell.
comment: 14 pages, 12 figures, 1 table, published in IEICE Transactions on Communications
Simulated Annealing for Multi-Robot Ergodic Information Acquisition Using Graph-Based Discretization
One of the goals of active information acquisition using multi-robot teams is to keep the relative uncertainty in each region at the same level to maintain identical acquisition quality (e.g., consistent target detection) in all the regions. To achieve this goal, ergodic coverage can be used to assign the number of samples according to the quality of observation, i.e., sampling noise levels. However, the noise levels are unknown to the robots. Although this noise can be estimated from samples, the estimates are unreliable at first and can generate fluctuating values. The main contribution of this paper is to use simulated annealing to generate the target sampling distribution, starting from uniform and gradually shifting to an estimated optimal distribution, by varying the coldness parameter of a Boltzmann distribution with the estimated sampling entropy as energy. Simulation results show a substantial improvement of both transient and asymptotic entropy compared to both uniform and direct-ergodic searches. Finally, a demonstration is performed with a TurtleBot swarm system to validate the physical applicability of the algorithm.
MathBode: Frequency-Domain Fingerprints of LLM Mathematical Reasoning
This paper presents MathBode, a dynamic diagnostic for mathematical reasoning in large language models (LLMs). Instead of one-shot accuracy, MathBode treats each parametric problem as a system: we drive a single parameter sinusoidally and fit first-harmonic responses of model outputs and exact solutions. This yields interpretable, frequency-resolved metrics -- gain (amplitude tracking) and phase (lag) -- that form Bode-style fingerprints. Across five closed-form families (linear solve, ratio/saturation, compound interest, 2x2 linear systems, similar triangles), the diagnostic surfaces systematic low-pass behavior and growing phase lag that accuracy alone obscures. We compare several models against a symbolic baseline that calibrates the instrument ($G \approx 1$, $\phi \approx 0$). Results separate frontier from mid-tier models on dynamics, providing a compact, reproducible protocol that complements standard benchmarks with actionable measurements of reasoning fidelity and consistency. We open-source the dataset and code to enable further research and adoption.
Optimality of Linear Policies in Distributionally Robust Linear Quadratic Control
We study a generalization of the classical discrete-time, Linear-Quadratic-Gaussian (LQG) control problem where the noise distributions affecting the states and observations are unknown and chosen adversarially from divergence-based ambiguity sets centered around a known nominal distribution. For a finite horizon model with Gaussian nominal noise and a structural assumption on the divergence that is satisfied by many examples -- including 2-Wasserstein distance, Kullback-Leibler divergence, moment-based divergences, entropy-regularized optimal transport, or Fisher (score-matching) divergence -- we prove that a control policy that is affine in the observations is optimal and the adversary's corresponding worst-case optimal distribution is Gaussian. When the nominal means are zero (as in the classical LQG model), we show that the adversary should optimally set the distribution's mean to zero and the optimal control policy becomes linear. Moreover, the adversary should optimally ``inflate" the noise by choosing covariance matrices that dominate the nominal covariance in Loewner order. Exploiting these structural properties, we develop a Frank-Wolfe algorithm whose inner step solves standard LQG subproblems via Kalman filtering and dynamic programming and show that the implementation consistently outperforms semidefinite-programming reformulations of the problem. All structural and algorithmic results extend to an infinite-horizon, average-cost formulation, yielding stationary linear policies and a time-invariant Gaussian distribution for the adversary. Lastly, we show that when the divergence is 2-Wasserstein, the entire framework remains valid when the nominal distributions are elliptical rather than Gaussian.
Geometry-Aware Decentralized Sinkhorn for Wasserstein Barycenters
Distributed systems require fusing heterogeneous local probability distributions into a global summary over sparse and unreliable communication networks. Traditional consensus algorithms, which average distributions in Euclidean space, ignore their inherent geometric structure, leading to misleading results. Wasserstein barycenters offer a geometry-aware alternative by minimizing optimal transport costs, but their entropic approximations via the Sinkhorn algorithm typically require centralized coordination. This paper proposes a fully decentralized Sinkhorn algorithm that reformulates the centralized geometric mean as an arithmetic average in the log-domain, enabling approximation through local gossip protocols. Agents exchange log-messages with neighbors, interleaving consensus phases with local updates to mimic centralized iterations without a coordinator. To optimize bandwidth, we integrate event-triggered transmissions and b-bit quantization, providing tunable trade-offs between accuracy and communication while accommodating asynchrony and packet loss. Under mild assumptions, we prove convergence to a neighborhood of the centralized entropic barycenter, with bias linearly dependent on consensus tolerance, trigger threshold, and quantization error. Complexity scales near-linearly with network size. Simulations confirm near-centralized accuracy with significantly fewer messages, across various topologies and conditions.
Koopman-based control using sum-of-squares optimization: Improved stability guarantees and data efficiency
In this paper, we propose a novel controller design approach for unknown nonlinear systems using the Koopman operator. In particular, we use the recently proposed stability- and feedback-oriented extended dynamic mode decomposition (SafEDMD) architecture to generate a data-driven bilinear surrogate model with certified error bounds. Then, by accounting for the obtained error bounds in a controller design based on the bilinear system, one can guarantee closed-loop stability for the true nonlinear system. While existing approaches over-approximate the bilinearity of the surrogate model, thus introducing conservatism and providing only local guarantees, we explicitly account for the bilinearity by using sum-of-squares (SOS) optimization in the controller design. More precisely, we parametrize a rational controller stabilizing the error-affected bilinear surrogate model and, consequently, the underlying nonlinear system. The resulting SOS optimization problem provides explicit data-driven controller design conditions for unknown nonlinear systems based on semidefinite programming. Our approach significantly reduces conservatism by establishing a larger region of attraction and improved data efficiency. The proposed method is evaluated using numerical examples, demonstrating its advantages over existing approaches.
comment: Accepted for publication in the European Journal of Control, 2025
A Hybrid TDMA/CSMA Protocol for Time-Sensitive Traffic in Robot Applications
Recent progress in robotics has underscored the demand for real-time control in applications such as manufacturing, healthcare, and autonomous systems, where the timely delivery of mission-critical commands under heterogeneous robotic traffic is paramount for operational efficacy and safety. In these scenarios, mission-critical traffic follows a strict deadline-constrained communication pattern: commands must arrive within defined QoS deadlines, otherwise late arrivals can degrade performance or destabilize control loops.In this work, we demonstrate on a real-time SDR platform that CSMA, widely adopted in robotic communications,suffers severe degradation under high robot traffic loads, with contention-induced collisions and delays disrupting the on-time arrival of mission-critical packets. To address this problem, we propose an IEEE 802.11-compatible hybrid TDMA/CSMA protocol that combines TDMA's deterministic slot scheduling with CSMA's adaptability for heterogeneous robot traffic.The protocol achieves collision-free, low-latency mission-critical command delivery and IEEE 802.11 compatibility through the synergistic integration of sub-microsecond PTP-based slot synchronization-essential for establishing precise timing for TDMA, a three-session superframe with dynamic TDMA allocation for structured and adaptable traffic management,and beacon-NAV protection to preemptively secure these critical communication sessions from interference. Emulation experiments on real-time SDR testbed and Robot Operating System (ROS) simulation show that the proposed protocol reduces missed-deadline errors by 93% compared to the CSMA baseline. In high-speed robot path-tracking ROS simulations, the protocol lowers Root Mean Square (RMS) trajectory error by up to 90% compared with a CSMA baseline, all while maintaining throughput for non-critical traffic within +-2%.
Rapidly Converging Time-Discounted Ergodicity on Graphs for Active Inspection of Confined Spaces
Ergodic exploration has spawned a lot of interest in mobile robotics due to its ability to design time trajectories that match desired spatial coverage statistics. However, current ergodic approaches are for continuous spaces, which require detailed sensory information at each point and can lead to fractal-like trajectories that cannot be tracked easily. This paper presents a new ergodic approach for graph-based discretization of continuous spaces. It also introduces a new time-discounted ergodicity metric, wherein early visitations of information-rich nodes are weighted more than late visitations. A Markov chain synthesized using a convex program is shown to converge more rapidly to time-discounted ergodicity than the traditional fastest mixing Markov chain. The resultant ergodic traversal method is used within a hierarchical framework for active inspection of confined spaces with the goal of detecting anomalies robustly using SLAM-driven Bayesian hypothesis testing. Experiments on a ground robot show the advantages of this framework over three continuous space ergodic planners as well as greedy and random exploration methods for left-behind foreign object debris detection in a ballast tank.
Frequency-Domain Bounds for the Multiconductor Telegrapher's Equation
We establish mathematical bounds on the chain, ABCD and immittance matrices of a multiconductor transmission line, based on the Telegrapher's equation. Closed-form expressions for those matrices are also presented. Existing results that hold on the imaginary axis are extended to the complex plane, without reliance on a diagonalizability assumption that is prevalent in the literature. Therefore, the results remain valid under arbitrary arrangements of the conducting wires, which include electrical faults. The analysis ultimately reveals important system-theoretic implications of the Telegrapher's equation, which are of general relevance to control, power systems, and signal processing involving multiconductor transmission lines.
comment: First revision: 10 pages, 1 figure
Robotics
Pixel Motion Diffusion is What We Need for Robot Control
We present DAWN (Diffusion is All We Need for robot control), a unified diffusion-based framework for language-conditioned robotic manipulation that bridges high-level motion intent and low-level robot action via structured pixel motion representation. In DAWN, both the high-level and low-level controllers are modeled as diffusion processes, yielding a fully trainable, end-to-end system with interpretable intermediate motion abstractions. DAWN achieves state-of-the-art results on the challenging CALVIN benchmark, demonstrating strong multi-task performance, and further validates its effectiveness on MetaWorld. Despite the substantial domain gap between simulation and reality and limited real-world data, we demonstrate reliable real-world transfer with only minimal finetuning, illustrating the practical viability of diffusion-based motion abstractions for robotic control. Our results show the effectiveness of combining diffusion modeling with motion-centric representations as a strong baseline for scalable and robust robot learning. Project page: https://nero1342.github.io/DAWN/
comment: 16 pages, 7 figures
See, Point, Fly: A Learning-Free VLM Framework for Universal Unmanned Aerial Navigation
We present See, Point, Fly (SPF), a training-free aerial vision-and-language navigation (AVLN) framework built atop vision-language models (VLMs). SPF is capable of navigating to any goal based on any type of free-form instructions in any kind of environment. In contrast to existing VLM-based approaches that treat action prediction as a text generation task, our key insight is to consider action prediction for AVLN as a 2D spatial grounding task. SPF harnesses VLMs to decompose vague language instructions into iterative annotation of 2D waypoints on the input image. Along with the predicted traveling distance, SPF transforms predicted 2D waypoints into 3D displacement vectors as action commands for UAVs. Moreover, SPF also adaptively adjusts the traveling distance to facilitate more efficient navigation. Notably, SPF performs navigation in a closed-loop control manner, enabling UAVs to follow dynamic targets in dynamic environments. SPF sets a new state of the art in DRL simulation benchmark, outperforming the previous best method by an absolute margin of 63%. In extensive real-world evaluations, SPF outperforms strong baselines by a large margin. We also conduct comprehensive ablation studies to highlight the effectiveness of our design choice. Lastly, SPF shows remarkable generalization to different VLMs. Project page: https://spf-web.pages.dev
comment: CoRL 2025. Project page: https://spf-web.pages.dev
VLA-Reasoner: Empowering Vision-Language-Action Models with Reasoning via Online Monte Carlo Tree Search
Vision-Language-Action models (VLAs) achieve strong performance in general robotic manipulation tasks by scaling imitation learning. However, existing VLAs are limited to predicting short-sighted next-action, which struggle with long-horizon trajectory tasks due to incremental deviations. To address this problem, we propose a plug-in framework named VLA-Reasoner that effectively empowers off-the-shelf VLAs with the capability of foreseeing future states via test-time scaling. Specifically, VLA-Reasoner samples and rolls out possible action trajectories where involved actions are rationales to generate future states via a world model, which enables VLA-Reasoner to foresee and reason potential outcomes and search for the optimal actions. We further leverage Monte Carlo Tree Search (MCTS) to improve search efficiency in large action spaces, where stepwise VLA predictions seed the root. Meanwhile, we introduce a confidence sampling mechanism based on Kernel Density Estimation (KDE), to enable efficient exploration in MCTS without redundant VLA queries. We evaluate intermediate states in MCTS via an offline reward shaping strategy, to score predicted futures and correct deviations with long-term feedback. We conducted extensive experiments in both simulators and the real world, demonstrating that our proposed VLA-Reasoner achieves significant improvements over the state-of-the-art VLAs. Our method highlights a potential pathway toward scalable test-time computation of robotic manipulation.
comment: 9 pages
WoW: Towards a World omniscient World model Through Embodied Interaction
Humans develop an understanding of intuitive physics through active interaction with the world. This approach is in stark contrast to current video models, such as Sora, which rely on passive observation and therefore struggle with grasping physical causality. This observation leads to our central hypothesis: authentic physical intuition of the world model must be grounded in extensive, causally rich interactions with the real world. To test this hypothesis, we present WoW, a 14-billion-parameter generative world model trained on 2 million robot interaction trajectories. Our findings reveal that the model's understanding of physics is a probabilistic distribution of plausible outcomes, leading to stochastic instabilities and physical hallucinations. Furthermore, we demonstrate that this emergent capability can be actively constrained toward physical realism by SOPHIA, where vision-language model agents evaluate the DiT-generated output and guide its refinement by iteratively evolving the language instructions. In addition, a co-trained Inverse Dynamics Model translates these refined plans into executable robotic actions, thus closing the imagination-to-action loop. We establish WoWBench, a new benchmark focused on physical consistency and causal reasoning in video, where WoW achieves state-of-the-art performance in both human and autonomous evaluation, demonstrating strong ability in physical causality, collision dynamics, and object permanence. Our work provides systematic evidence that large-scale, real-world interaction is a cornerstone for developing physical intuition in AI. Models, data, and benchmarks will be open-sourced.
EgoDemoGen: Novel Egocentric Demonstration Generation Enables Viewpoint-Robust Manipulation
Imitation learning based policies perform well in robotic manipulation, but they often degrade under *egocentric viewpoint shifts* when trained from a single egocentric viewpoint. To address this issue, we present **EgoDemoGen**, a framework that generates *paired* novel egocentric demonstrations by retargeting actions in the novel egocentric frame and synthesizing the corresponding egocentric observation videos with proposed generative video repair model **EgoViewTransfer**, which is conditioned by a novel-viewpoint reprojected scene video and a robot-only video rendered from the retargeted joint actions. EgoViewTransfer is finetuned from a pretrained video generation model using self-supervised double reprojection strategy. We evaluate EgoDemoGen on both simulation (RoboTwin2.0) and real-world robot. After training with a mixture of EgoDemoGen-generated novel egocentric demonstrations and original standard egocentric demonstrations, policy success rate improves **absolutely** by **+17.0%** for standard egocentric viewpoint and by **+17.7%** for novel egocentric viewpoints in simulation. On real-world robot, the **absolute** improvements are **+18.3%** and **+25.8%**. Moreover, performance continues to improve as the proportion of EgoDemoGen-generated demonstrations increases, with diminishing returns. These results demonstrate that EgoDemoGen provides a practical route to egocentric viewpoint-robust robotic manipulation.
MINT-RVAE: Multi-Cues Intention Prediction of Human-Robot Interaction using Human Pose and Emotion Information from RGB-only Camera Data
Efficiently detecting human intent to interact with ubiquitous robots is crucial for effective human-robot interaction (HRI) and collaboration. Over the past decade, deep learning has gained traction in this field, with most existing approaches relying on multimodal inputs, such as RGB combined with depth (RGB-D), to classify time-sequence windows of sensory data as interactive or non-interactive. In contrast, we propose a novel RGB-only pipeline for predicting human interaction intent with frame-level precision, enabling faster robot responses and improved service quality. A key challenge in intent prediction is the class imbalance inherent in real-world HRI datasets, which can hinder the model's training and generalization. To address this, we introduce MINT-RVAE, a synthetic sequence generation method, along with new loss functions and training strategies that enhance generalization on out-of-sample data. Our approach achieves state-of-the-art performance (AUROC: 0.95) outperforming prior works (AUROC: 0.90-0.912), while requiring only RGB input and supporting precise frame onset prediction. Finally, to support future research, we openly release our new dataset with frame-level labeling of human interaction intent.
An Intention-driven Lane Change Framework Considering Heterogeneous Dynamic Cooperation in Mixed-traffic Environment
In mixed-traffic environments, where autonomous vehicles (AVs) interact with diverse human-driven vehicles (HVs), unpredictable intentions and heterogeneous behaviors make safe and efficient lane change maneuvers highly challenging. Existing methods often oversimplify these interactions by assuming uniform patterns. We propose an intention-driven lane change framework that integrates driving-style recognition, cooperation-aware decision-making, and coordinated motion planning. A deep learning classifier trained on the NGSIM dataset identifies human driving styles in real time. A cooperation score with intrinsic and interactive components estimates surrounding drivers' intentions and quantifies their willingness to cooperate with the ego vehicle. Decision-making combines behavior cloning with inverse reinforcement learning to determine whether a lane change should be initiated. For trajectory generation, model predictive control is integrated with IRL-based intention inference to produce collision-free and socially compliant maneuvers. Experiments show that the proposed model achieves 94.2\% accuracy and 94.3\% F1-score, outperforming rule-based and learning-based baselines by 4-15\% in lane change recognition. These results highlight the benefit of modeling inter-driver heterogeneity and demonstrate the potential of the framework to advance context-aware and human-like autonomous driving in complex traffic environments.
JanusVLN: Decoupling Semantics and Spatiality with Dual Implicit Memory for Vision-Language Navigation
Vision-and-Language Navigation requires an embodied agent to navigate through unseen environments, guided by natural language instructions and a continuous video stream. Recent advances in VLN have been driven by the powerful semantic understanding of Multimodal Large Language Models. However, these methods typically rely on explicit semantic memory, such as building textual cognitive maps or storing historical visual frames. This type of method suffers from spatial information loss, computational redundancy, and memory bloat, which impede efficient navigation. Inspired by the implicit scene representation in human navigation, analogous to the left brain's semantic understanding and the right brain's spatial cognition, we propose JanusVLN, a novel VLN framework featuring a dual implicit neural memory that models spatial-geometric and visual-semantic memory as separate, compact, and fixed-size neural representations. This framework first extends the MLLM to incorporate 3D prior knowledge from the spatial-geometric encoder, thereby enhancing the spatial reasoning capabilities of models based solely on RGB input. Then, the historical key-value caches from the spatial-geometric and visual-semantic encoders are constructed into a dual implicit memory. By retaining only the KVs of tokens in the initial and sliding window, redundant computation is avoided, enabling efficient incremental updates. Extensive experiments demonstrate that JanusVLN outperforms over 20 recent methods to achieve SOTA performance. For example, the success rate improves by 10.5-35.5 compared to methods using multiple data types as input and by 3.6-10.8 compared to methods using more RGB training data. This indicates that the proposed dual implicit neural memory, as a novel paradigm, explores promising new directions for future VLN research. Ours project page: https://miv-xjtu.github.io/JanusVLN.github.io/.
comment: Project page: https://miv-xjtu.github.io/JanusVLN.github.io/
HELIOS: Hierarchical Exploration for Language-grounded Interaction in Open Scenes
Language-specified mobile manipulation tasks in novel environments simultaneously face challenges interacting with a scene which is only partially observed, grounding semantic information from language instructions to the partially observed scene, and actively updating knowledge of the scene with new observations. To address these challenges, we propose HELIOS, a hierarchical scene representation and associated search objective to perform language specified pick and place mobile manipulation tasks. We construct 2D maps containing the relevant semantic and occupancy information for navigation while simultaneously actively constructing 3D Gaussian representations of task-relevant objects. We fuse observations across this multi-layered representation while explicitly modeling the multi-view consistency of the detections of each object. In order to efficiently search for the target object, we formulate an objective function balancing exploration of unobserved or uncertain regions with exploitation of scene semantic information. We evaluate HELIOS on the OVMM benchmark in the Habitat simulator, a pick and place benchmark in which perception is challenging due to large and complex scenes with comparatively small target objects. HELIOS achieves state-of-the-art results on OVMM. As our approach is zero-shot, HELIOS can also transfer to the real world without requiring additional data, as we illustrate by demonstrating it in a real world office environment on a Spot robot.
Ontological foundations for contrastive explanatory narration of robot plans
Mutual understanding of artificial agents' decisions is key to ensuring a trustworthy and successful human-robot interaction. Hence, robots are expected to make reasonable decisions and communicate them to humans when needed. In this article, the focus is on an approach to modeling and reasoning about the comparison of two competing plans, so that robots can later explain the divergent result. First, a novel ontological model is proposed to formalize and reason about the differences between competing plans, enabling the classification of the most appropriate one (e.g., the shortest, the safest, the closest to human preferences, etc.). This work also investigates the limitations of a baseline algorithm for ontology-based explanatory narration. To address these limitations, a novel algorithm is presented, leveraging divergent knowledge between plans and facilitating the construction of contrastive narratives. Through empirical evaluation, it is observed that the explanations excel beyond the baseline method.
comment: This version was submitted to the journal Information Sciences and is under review since October 2024
Uncertainty-Aware Multi-Robot Task Allocation With Strongly Coupled Inter-Robot Rewards
This paper proposes a task allocation algorithm for teams of heterogeneous robots in environments with uncertain task requirements. We model these requirements as probability distributions over capabilities and use this model to allocate tasks such that robots with complementary skills naturally position near uncertain tasks, proactively mitigating task failures without wasting resources. We introduce a market-based approach that optimizes the joint team objective while explicitly capturing coupled rewards between robots, offering a polynomial-time solution in decentralized settings with strict communication assumptions. Comparative experiments against benchmark algorithms demonstrate the effectiveness of our approach and highlight the challenges of incorporating coupled rewards in a decentralized formulation.
comment: 8 pages
Learning to Ball: Composing Policies for Long-Horizon Basketball Moves SIGGRAPH
Learning a control policy for a multi-phase, long-horizon task, such as basketball maneuvers, remains challenging for reinforcement learning approaches due to the need for seamless policy composition and transitions between skills. A long-horizon task typically consists of distinct subtasks with well-defined goals, separated by transitional subtasks with unclear goals but critical to the success of the entire task. Existing methods like the mixture of experts and skill chaining struggle with tasks where individual policies do not share significant commonly explored states or lack well-defined initial and terminal states between different phases. In this paper, we introduce a novel policy integration framework to enable the composition of drastically different motor skills in multi-phase long-horizon tasks with ill-defined intermediate states. Based on that, we further introduce a high-level soft router to enable seamless and robust transitions between the subtasks. We evaluate our framework on a set of fundamental basketball skills and challenging transitions. Policies trained by our approach can effectively control the simulated character to interact with the ball and accomplish the long-horizon task specified by real-time user commands, without relying on ball trajectory references.
comment: ACM Transactions on Graphics (Proceedings of SIGGRAPH Asia 2025). Website: http://pei-xu.github.io/basketball. Video: https://youtu.be/2RBFIjjmR2I. Code: https://github.com/xupei0610/basketball
UnderwaterVLA: Dual-brain Vision-Language-Action architecture for Autonomous Underwater Navigation
This paper presents UnderwaterVLA, a novel framework for autonomous underwater navigation that integrates multimodal foundation models with embodied intelligence systems. Underwater operations remain difficult due to hydrodynamic disturbances, limited communication bandwidth, and degraded sensing in turbid waters. To address these challenges, we introduce three innovations. First, a dual-brain architecture decouples high-level mission reasoning from low-level reactive control, enabling robust operation under communication and computational constraints. Second, we apply Vision-Language-Action(VLA) models to underwater robotics for the first time, incorporating structured chain-of-thought reasoning for interpretable decision-making. Third, a hydrodynamics-informed Model Predictive Control(MPC) scheme compensates for fluid effects in real time without costly task-specific training. Experimental results in field tests show that UnderwaterVLA reduces navigation errors in degraded visual conditions while maintaining higher task completion by 19% to 27% over baseline. By minimizing reliance on underwater-specific training data and improving adaptability across environments, UnderwaterVLA provides a scalable and cost-effective path toward the next generation of intelligent AUVs.
comment: This paper introduces the first VLA framework for AUVs, featuring a dual-brain architecture and zero-data MPC for real-world underwater navigation
An Ontology for Unified Modeling of Tasks, Actions, Environments, and Capabilities in Personal Service Robotics
Personal service robots are increasingly used in domestic settings to assist older adults and people requiring support. Effective operation involves not only physical interaction but also the ability to interpret dynamic environments, understand tasks, and choose appropriate actions based on context. This requires integrating both hardware components (e.g. sensors, actuators) and software systems capable of reasoning about tasks, environments, and robot capabilities. Frameworks such as the Robot Operating System (ROS) provide open-source tools that help connect low-level hardware with higher-level functionalities. However, real-world deployments remain tightly coupled to specific platforms. As a result, solutions are often isolated and hard-coded, limiting interoperability, reusability, and knowledge sharing. Ontologies and knowledge graphs offer a structured way to represent tasks, environments, and robot capabilities. Existing ontologies, such as the Socio-physical Model of Activities (SOMA) and the Descriptive Ontology for Linguistic and Cognitive Engineering (DOLCE), provide models for activities, spatial relationships, and reasoning structures. However, they often focus on specific domains and do not fully capture the connection between environment, action, robot capabilities, and system-level integration. In this work, we propose the Ontology for roBOts and acTions (OntoBOT), which extends existing ontologies to provide a unified representation of tasks, actions, environments, and capabilities. Our contributions are twofold: (1) we unify these aspects into a cohesive ontology to support formal reasoning about task execution, and (2) we demonstrate its generalizability by evaluating competency questions across four embodied agents - TIAGo, HSR, UR3, and Stretch - showing how OntoBOT enables context-aware reasoning, task-oriented execution, and knowledge sharing in service robotics.
EMMA: Generalizing Real-World Robot Manipulation via Generative Visual Transfer
Vision-language-action (VLA) models increasingly rely on diverse training data to achieve robust generalization. However, collecting large-scale real-world robot manipulation data across varied object appearances and environmental conditions remains prohibitively time-consuming and expensive. To overcome this bottleneck, we propose Embodied Manipulation Media Adaptation (EMMA), a VLA policy enhancement framework that integrates a generative data engine with an effective training pipeline. We introduce DreamTransfer, a diffusion Transformer-based framework for generating multi-view consistent, geometrically grounded embodied manipulation videos. DreamTransfer enables text-controlled visual editing of robot videos, transforming foreground, background, and lighting conditions without compromising 3D structure or geometrical plausibility. Furthermore, we explore hybrid training with real and generated data, and introduce AdaMix, a hard-sample-aware training strategy that dynamically reweights training batches to focus optimization on perceptually or kinematically challenging samples. Extensive experiments show that videos generated by DreamTransfer significantly outperform prior video generation methods in multi-view consistency, geometric fidelity, and text-conditioning accuracy. Crucially, VLAs trained with generated data enable robots to generalize to unseen object categories and novel visual domains using only demonstrations from a single appearance. In real-world robotic manipulation tasks with zero-shot visual domains, our approach achieves over a 200% relative performance gain compared to training on real data alone, and further improves by 13% with AdaMix, demonstrating its effectiveness in boosting policy generalization.
ReLAM: Learning Anticipation Model for Rewarding Visual Robotic Manipulation
Reward design remains a critical bottleneck in visual reinforcement learning (RL) for robotic manipulation. In simulated environments, rewards are conventionally designed based on the distance to a target position. However, such precise positional information is often unavailable in real-world visual settings due to sensory and perceptual limitations. In this study, we propose a method that implicitly infers spatial distances through keypoints extracted from images. Building on this, we introduce Reward Learning with Anticipation Model (ReLAM), a novel framework that automatically generates dense, structured rewards from action-free video demonstrations. ReLAM first learns an anticipation model that serves as a planner and proposes intermediate keypoint-based subgoals on the optimal path to the final goal, creating a structured learning curriculum directly aligned with the task's geometric objectives. Based on the anticipated subgoals, a continuous reward signal is provided to train a low-level, goal-conditioned policy under the hierarchical reinforcement learning (HRL) framework with provable sub-optimality bound. Extensive experiments on complex, long-horizon manipulation tasks show that ReLAM significantly accelerates learning and achieves superior performance compared to state-of-the-art methods.
A Multi-Modality Evaluation of the Reality Gap in Autonomous Driving Systems
Simulation-based testing is a cornerstone of Autonomous Driving System (ADS) development, offering safe and scalable evaluation across diverse driving scenarios. However, discrepancies between simulated and real-world behavior, known as the reality gap, challenge the transferability of test results to deployed systems. In this paper, we present a comprehensive empirical study comparing four representative testing modalities: Software-in-the-Loop (SiL), Vehicle-in-the-Loop (ViL), Mixed-Reality (MR), and full real-world testing. Using a small-scale physical vehicle equipped with real sensors (camera and LiDAR) and its digital twin, we implement each setup and evaluate two ADS architectures (modular and end-to-end) across diverse indoor driving scenarios involving real obstacles, road topologies, and indoor environments. We systematically assess the impact of each testing modality along three dimensions of the reality gap: actuation, perception, and behavioral fidelity. Our results show that while SiL and ViL setups simplify critical aspects of real-world dynamics and sensing, MR testing improves perceptual realism without compromising safety or control. Importantly, we identify the conditions under which failures do not transfer across testing modalities and isolate the underlying dimensions of the gap responsible for these discrepancies. Our findings offer actionable insights into the respective strengths and limitations of each modality and outline a path toward more robust and transferable validation of autonomous driving systems.
comment: In proceedings of the 40th IEEE/ACM International Conference on Automated Software Engineering (ASE '25)
RoboView-Bias: Benchmarking Visual Bias in Embodied Agents for Robotic Manipulation
The safety and reliability of embodied agents rely on accurate and unbiased visual perception. However, existing benchmarks mainly emphasize generalization and robustness under perturbations, while systematic quantification of visual bias remains scarce. This gap limits a deeper understanding of how perception influences decision-making stability. To address this issue, we propose RoboView-Bias, the first benchmark specifically designed to systematically quantify visual bias in robotic manipulation, following a principle of factor isolation. Leveraging a structured variant-generation framework and a perceptual-fairness validation protocol, we create 2,127 task instances that enable robust measurement of biases induced by individual visual factors and their interactions. Using this benchmark, we systematically evaluate three representative embodied agents across two prevailing paradigms and report three key findings: (i) all agents exhibit significant visual biases, with camera viewpoint being the most critical factor; (ii) agents achieve their highest success rates on highly saturated colors, indicating inherited visual preferences from underlying VLMs; and (iii) visual biases show strong, asymmetric coupling, with viewpoint strongly amplifying color-related bias. Finally, we demonstrate that a mitigation strategy based on a semantic grounding layer substantially reduces visual bias by approximately 54.5\% on MOKA. Our results highlight that systematic analysis of visual bias is a prerequisite for developing safe and reliable general-purpose embodied agents.
Trust and Human Autonomy after Cobot Failures: Communication is Key for Industry 5.0
Collaborative robots (cobots) are a core technology of Industry 4.0. Industry 4.0 uses cyber-physical systems, IoT and smart automation to improve efficiency and data-driven decision-making. Cobots, as cyber-physical systems, enable the introduction of lightweight automation to smaller companies through their flexibility, low cost and ability to work alongside humans, while keeping humans and their skills in the loop. Industry 5.0, the evolution of Industry 4.0, places the worker at the centre of its principles: The physical and mental well-being of the worker is the main goal of new technology design, not just productivity, efficiency and safety standards. Within this concept, human trust in cobots and human autonomy are important. While trust is essential for effective and smooth interaction, the workers' perception of autonomy is key to intrinsic motivation and overall well-being. As failures are an inevitable part of technological systems, this study aims to answer the question of how system failures affect trust in cobots as well as human autonomy, and how they can be recovered afterwards. Therefore, a VR experiment (n = 39) was set up to investigate the influence of a cobot failure and its severity on human autonomy and trust in the cobot. Furthermore, the influence of transparent communication about the failure and next steps was investigated. The results show that both trust and autonomy suffer after cobot failures, with the severity of the failure having a stronger negative impact on trust, but not on autonomy. Both trust and autonomy can be partially restored by transparent communication.
Beyond Detection -- Orchestrating Human-Robot-Robot Assistance via an Internet of Robotic Things Paradigm
Hospital patient falls remain a critical and costly challenge worldwide. While conventional fall prevention systems typically rely on post-fall detection or reactive alerts, they also often suffer from high false positive rates and fail to address the underlying patient needs that lead to bed-exit attempts. This paper presents a novel system architecture that leverages the Internet of Robotic Things (IoRT) to orchestrate human-robot-robot interaction for proactive and personalized patient assistance. The system integrates a privacy-preserving thermal sensing model capable of real-time bed-exit prediction, with two coordinated robotic agents that respond dynamically based on predicted intent and patient input. This orchestrated response could not only reduce fall risk but also attend to the patient's underlying motivations for movement, such as thirst, discomfort, or the need for assistance, before a hazardous situation arises. Our contributions with this pilot study are three-fold: (1) a modular IoRT-based framework enabling distributed sensing, prediction, and multi-robot coordination; (2) a demonstration of low-resolution thermal sensing for accurate, privacy-preserving preemptive bed-exit detection; and (3) results from a user study and systematic error analysis that inform the design of situationally aware, multi-agent interactions in hospital settings. The findings highlight how interactive and connected robotic systems can move beyond passive monitoring to deliver timely, meaningful assistance, empowering safer, more responsive care environments.
comment: ICSR 2025, 8 pages, 3 figures
IMU-Preintegrated Radar Factors for Asynchronous Radar-LiDAR-Inertial SLAM
Fixed-lag Radar-LiDAR-Inertial smoothers conventionally create one factor graph node per measurement to compensate for the lack of time synchronization between radar and LiDAR. For a radar-LiDAR sensor pair with equal rates, this strategy results in a state creation rate of twice the individual sensor frequencies. This doubling of the number of states per second yields high optimization costs, inhibiting real-time performance on resource-constrained hardware. We introduce IMU-preintegrated radar factors that use high-rate inertial data to propagate the most recent LiDAR state to the radar measurement timestamp. This strategy maintains the node creation rate at the LiDAR measurement frequency. Assuming equal sensor rates, this lowers the number of nodes by 50 % and consequently the computational costs. Experiments on a single board computer (which has 4 cores each of 2.2 GHz A73 and 2 GHz A53 with 8 GB RAM) show that our method preserves the absolute pose error of a conventional baseline while simultaneously lowering the aggregated factor graph optimization time by up to 56 %.
comment: 8 pages, 7 figures, accepted by The 22nd International Conference on Advanced Robotics (ICAR 2025). Supplementary video: https://youtu.be/95jeWXBMN7c
Leveraging Large Language Models for Robot-Assisted Learning of Morphological Structures in Preschool Children with Language Vulnerabilities
Preschool children with language vulnerabilities -- such as developmental language disorders or immigration related language challenges -- often require support to strengthen their expressive language skills. Based on the principle of implicit learning, speech-language therapists (SLTs) typically embed target morphological structures (e.g., third person -s) into everyday interactions or game-based learning activities. Educators are recommended by SLTs to do the same. This approach demands precise linguistic knowledge and real-time production of various morphological forms (e.g., "Daddy wears these when he drives to work"). The task becomes even more demanding when educators or parent also must keep children engaged and manage turn-taking in a game-based activity. In the TalBot project our multiprofessional team have developed an application in which the Furhat conversational robot plays the word retrieval game "Alias" with children to improve language skills. Our application currently employs a large language model (LLM) to manage gameplay, dialogue, affective responses, and turn-taking. Our next step is to further leverage the capacity of LLMs so the robot can generate and deliver specific morphological targets during the game. We hypothesize that a robot could outperform humans at this task. Novel aspects of this approach are that the robot could ultimately serve as a model and tutor for both children and professionals and that using LLM capabilities in this context would support basic communication needs for children with language vulnerabilities. Our long-term goal is to create a robust LLM-based Robot-Assisted Language Learning intervention capable of teaching a variety of morphological structures across different languages.
comment: 12 pages, 2 figures, Preprint of: Sundstedt, S., Wingren, M., H\"agglund, S. & Ventus, D. (2025). Leveraging Large Language Models for Robot-Assisted Learning of Morphological Structures in Preschool Children with Language Vulnerabilities. In: Stephanidis, C., Antona, M., Ntoa, S. & Salvendy, G. (eds.), Communications in Computer and Information Science, vol. 2523, pp. 415-425. Springer
MesaTask: Towards Task-Driven Tabletop Scene Generation via 3D Spatial Reasoning NeurIPS 2025
The ability of robots to interpret human instructions and execute manipulation tasks necessitates the availability of task-relevant tabletop scenes for training. However, traditional methods for creating these scenes rely on time-consuming manual layout design or purely randomized layouts, which are limited in terms of plausibility or alignment with the tasks. In this paper, we formulate a novel task, namely task-oriented tabletop scene generation, which poses significant challenges due to the substantial gap between high-level task instructions and the tabletop scenes. To support research on such a challenging task, we introduce MesaTask-10K, a large-scale dataset comprising approximately 10,700 synthetic tabletop scenes with manually crafted layouts that ensure realistic layouts and intricate inter-object relations. To bridge the gap between tasks and scenes, we propose a Spatial Reasoning Chain that decomposes the generation process into object inference, spatial interrelation reasoning, and scene graph construction for the final 3D layout. We present MesaTask, an LLM-based framework that utilizes this reasoning chain and is further enhanced with DPO algorithms to generate physically plausible tabletop scenes that align well with given task descriptions. Exhaustive experiments demonstrate the superior performance of MesaTask compared to baselines in generating task-conforming tabletop scenes with realistic layouts. Project page is at https://mesatask.github.io/
comment: Accepted by NeurIPS 2025; Project page: https://mesatask.github.io/
Human Autonomy and Sense of Agency in Human-Robot Interaction: A Systematic Literature Review
Human autonomy and sense of agency are increasingly recognised as critical for user well-being, motivation, and the ethical deployment of robots in human-robot interaction (HRI). Given the rapid development of artificial intelligence, robot capabilities and their potential to function as colleagues and companions are growing. This systematic literature review synthesises 22 empirical studies selected from an initial pool of 728 articles published between 2011 and 2024. Articles were retrieved from major scientific databases and identified based on empirical focus and conceptual relevance, namely, how to preserve and promote human autonomy and sense of agency in HRI. Derived through thematic synthesis, five clusters of potentially influential factors are revealed: robot adaptiveness, communication style, anthropomorphism, presence of a robot and individual differences. Measured through psychometric scales or the intentional binding paradigm, perceptions of autonomy and agency varied across industrial, educational, healthcare, care, and hospitality settings. The review underscores the theoretical differences between both concepts, but their yet entangled use in HRI. Despite increasing interest, the current body of empirical evidence remains limited and fragmented, underscoring the necessity for standardised definitions, more robust operationalisations, and further exploratory and qualitative research. By identifying existing gaps and highlighting emerging trends, this review contributes to the development of human-centered, autonomy-supportive robot design strategies that uphold ethical and psychological principles, ultimately supporting well-being in human-robot interaction.
From Watch to Imagine: Steering Long-horizon Manipulation via Human Demonstration and Future Envisionment
Generalizing to long-horizon manipulation tasks in a zero-shot setting remains a central challenge in robotics. Current multimodal foundation based approaches, despite their capabilities, typically fail to decompose high-level commands into executable action sequences from static visual input alone. To address this challenge, we introduce Super-Mimic, a hierarchical framework that enables zero-shot robotic imitation by directly inferring procedural intent from unscripted human demonstration videos. Our framework is composed of two sequential modules. First, a Human Intent Translator (HIT) parses the input video using multimodal reasoning to produce a sequence of language-grounded subtasks. These subtasks then condition a Future Dynamics Predictor (FDP), which employs a generative model that synthesizes a physically plausible video rollout for each step. The resulting visual trajectories are dynamics-aware, explicitly modeling crucial object interactions and contact points to guide the low-level controller. We validate this approach through extensive experiments on a suite of long-horizon manipulation tasks, where Super-Mimic significantly outperforms state-of-the-art zero-shot methods by over 20\%. These results establish that coupling video-driven intent parsing with prospective dynamics modeling is a highly effective strategy for developing general-purpose robotic systems.
MimicDreamer: Aligning Human and Robot Demonstrations for Scalable VLA Training
Vision Language Action (VLA) models derive their generalization capability from diverse training data, yet collecting embodied robot interaction data remains prohibitively expensive. In contrast, human demonstration videos are far more scalable and cost-efficient to collect, and recent studies confirm their effectiveness in training VLA models. However, a significant domain gap persists between human videos and robot-executed videos, including unstable camera viewpoints, visual discrepancies between human hands and robotic arms, and differences in motion dynamics. To bridge this gap, we propose MimicDreamer, a framework that turns fast, low-cost human demonstrations into robot-usable supervision by jointly aligning vision, viewpoint, and actions to directly support policy training. For visual alignment, we propose H2R Aligner, a video diffusion model that generates high-fidelity robot demonstration videos by transferring motion from human manipulation footage. For viewpoint stabilization, EgoStabilizer is proposed, which canonicalizes egocentric videos via homography and inpaints occlusions and distortions caused by warping. For action alignment, we map human hand trajectories to the robot frame and apply a constrained inverse kinematics solver to produce feasible, low-jitter joint commands with accurate pose tracking. Empirically, VLA models trained purely on our synthesized human-to-robot videos achieve few-shot execution on real robots. Moreover, scaling training with human data significantly boosts performance compared to models trained solely on real robot data; our approach improves the average success rate by 14.7\% across six representative manipulation tasks.
Actions as Language: Fine-Tuning VLMs into VLAs Without Catastrophic Forgetting
Fine-tuning vision-language models (VLMs) on robot teleoperation data to create vision-language-action (VLA) models is a promising paradigm for training generalist policies, but it suffers from a fundamental tradeoff: learning to produce actions often diminishes the VLM's foundational reasoning and multimodal understanding, hindering generalization to novel scenarios, instruction following, and semantic understanding. We argue that this catastrophic forgetting is due to a distribution mismatch between the VLM's internet-scale pretraining corpus and the robotics fine-tuning data. Inspired by this observation, we introduce VLM2VLA: a VLA training paradigm that first resolves this mismatch at the data level by representing low-level actions with natural language. This alignment makes it possible to train VLAs solely with Low-Rank Adaptation (LoRA), thereby minimally modifying the VLM backbone and averting catastrophic forgetting. As a result, the VLM can be fine-tuned on robot teleoperation data without fundamentally altering the underlying architecture and without expensive co-training on internet-scale VLM datasets. Through extensive Visual Question Answering (VQA) studies and over 800 real-world robotics experiments, we demonstrate that VLM2VLA preserves the VLM's core capabilities, enabling zero-shot generalization to novel tasks that require open-world semantic reasoning and multilingual instruction following.
DHAGrasp: Synthesizing Affordance-Aware Dual-Hand Grasps with Text Instructions
Learning to generate dual-hand grasps that respect object semantics is essential for robust hand-object interaction but remains largely underexplored due to dataset scarcity. Existing grasp datasets predominantly focus on single-hand interactions and contain only limited semantic part annotations. To address these challenges, we introduce a pipeline, SymOpt, that constructs a large-scale dual-hand grasp dataset by leveraging existing single-hand datasets and exploiting object and hand symmetries. Building on this, we propose a text-guided dual-hand grasp generator, DHAGrasp, that synthesizes Dual-Hand Affordance-aware Grasps for unseen objects. Our approach incorporates a novel dual-hand affordance representation and follows a two-stage design, which enables effective learning from a small set of segmented training objects while scaling to a much larger pool of unsegmented data. Extensive experiments demonstrate that our method produces diverse and semantically consistent grasps, outperforming strong baselines in both grasp quality and generalization to unseen objects. The project page is at https://quanzhou-li.github.io/DHAGrasp/.
DemoGrasp: Universal Dexterous Grasping from a Single Demonstration
Universal grasping with multi-fingered dexterous hands is a fundamental challenge in robotic manipulation. While recent approaches successfully learn closed-loop grasping policies using reinforcement learning (RL), the inherent difficulty of high-dimensional, long-horizon exploration necessitates complex reward and curriculum design, often resulting in suboptimal solutions across diverse objects. We propose DemoGrasp, a simple yet effective method for learning universal dexterous grasping. We start from a single successful demonstration trajectory of grasping a specific object and adapt to novel objects and poses by editing the robot actions in this trajectory: changing the wrist pose determines where to grasp, and changing the hand joint angles determines how to grasp. We formulate this trajectory editing as a single-step Markov Decision Process (MDP) and use RL to optimize a universal policy across hundreds of objects in parallel in simulation, with a simple reward consisting of a binary success term and a robot-table collision penalty. In simulation, DemoGrasp achieves a 95% success rate on DexGraspNet objects using the Shadow Hand, outperforming previous state-of-the-art methods. It also shows strong transferability, achieving an average success rate of 84.6% across diverse dexterous hand embodiments on six unseen object datasets, while being trained on only 175 objects. Through vision-based imitation learning, our policy successfully grasps 110 unseen real-world objects, including small, thin items. It generalizes to spatial, background, and lighting changes, supports both RGB and depth inputs, and extends to language-guided grasping in cluttered scenes.
Log2Plan: An Adaptive GUI Automation Framework Integrated with Task Mining Approach
GUI task automation streamlines repetitive tasks, but existing LLM or VLM-based planner-executor agents suffer from brittle generalization, high latency, and limited long-horizon coherence. Their reliance on single-shot reasoning or static plans makes them fragile under UI changes or complex tasks. Log2Plan addresses these limitations by combining a structured two-level planning framework with a task mining approach over user behavior logs, enabling robust and adaptable GUI automation. Log2Plan constructs high-level plans by mapping user commands to a structured task dictionary, enabling consistent and generalizable automation. To support personalization and reuse, it employs a task mining approach from user behavior logs that identifies user-specific patterns. These high-level plans are then grounded into low-level action sequences by interpreting real-time GUI context, ensuring robust execution across varying interfaces. We evaluated Log2Plan on 200 real-world tasks, demonstrating significant improvements in task success rate and execution time. Notably, it maintains over 60.0% success rate even on long-horizon task sequences, highlighting its robustness in complex, multi-step workflows.
Multi-stage robust nonlinear model predictive control of a lower-limb exoskeleton robot
The use of exoskeleton robots is increasing due to the rising number of musculoskeletal injuries. However, their effectiveness depends heavily on the design of control systems. Designing robust controllers is challenging because of uncertainties in human-robot systems. Among various control strategies, Model Predictive Control (MPC) is a powerful approach due to its ability to handle constraints and optimize performance. Previous studies have used linearization-based methods to implement robust MPC on exoskeletons, but these can degrade performance due to nonlinearities in the robot's dynamics. To address this gap, this paper proposes a Robust Nonlinear Model Predictive Control (RNMPC) method, called multi-stage NMPC, to control a two-degree-of-freedom exoskeleton by solving a nonlinear optimization problem. This method uses multiple scenarios to represent system uncertainties. The study focuses on minimizing human-robot interaction forces during the swing phase, particularly when the robot carries unknown loads. Simulations and experimental tests show that the proposed method significantly improves robustness, outperforming non-robust NMPC. It achieves lower tracking errors and interaction forces under various uncertainties. For instance, when a 2 kg unknown payload is combined with external disturbances, the RMS values of thigh and shank interaction forces for multi-stage NMPC are reduced by 77 and 94 percent, respectively, compared to non-robust NMPC.
comment: 12 pages, 11 figures, 2 tables, under review at the journal of "Transactions of the Canadian Society for Mechanical Engineering"
Action-aware Dynamic Pruning for Efficient Vision-Language-Action Manipulation
Robotic manipulation with Vision-Language-Action models requires efficient inference over long-horizon multi-modal context, where attention to dense visual tokens dominates computational cost. Existing methods optimize inference speed by reducing visual redundancy within VLA models, but they overlook the varying redundancy across robotic manipulation stages. We observe that the visual token redundancy is higher in coarse manipulation phase than in fine-grained operations, and is strongly correlated with the action dynamic. Motivated by this observation, we propose \textbf{A}ction-aware \textbf{D}ynamic \textbf{P}runing (\textbf{ADP}), a multi-modal pruning framework that integrates text-driven token selection with action-aware trajectory gating. Our method introduces a gating mechanism that conditions the pruning signal on recent action trajectories, using past motion windows to adaptively adjust token retention ratios in accordance with dynamics, thereby balancing computational efficiency and perceptual precision across different manipulation stages. Extensive experiments on the LIBERO suites and diverse real-world scenarios demonstrate that our method significantly reduces FLOPs and action inference latency (\textit{e.g.} $1.35 \times$ speed up on OpenVLA-OFT) while maintaining competitive success rates (\textit{e.g.} 25.8\% improvements with OpenVLA) compared to baselines, thereby providing a simple plug-in path to efficient robot policies that advances the efficiency and performance frontier of robotic manipulation. Our project website is: \href{https://vla-adp.github.io/}{ADP.com}.
Effect of Gait Design on Proprioceptive Sensing of Terrain Properties in a Quadrupedal Robot ICRA
In-situ robotic exploration is an important tool for advancing knowledge of geological processes that describe the Earth and other Planetary bodies. To inform and enhance operations for these roving laboratories, it is imperative to understand the terramechanical properties of their environments, especially for traversing on loose, deformable substrates. Recent research suggested that legged robots with direct-drive and low-gear ratio actuators can sensitively detect external forces, and therefore possess the potential to measure terrain properties with their legs during locomotion, providing unprecedented sampling speed and density while accessing terrains previously too risky to sample. This paper explores these ideas by investigating the impact of gait on proprioceptive terrain sensing accuracy, particularly comparing a sensing-oriented gait, Crawl N' Sense, with a locomotion-oriented gait, Trot-Walk. Each gait's ability to measure the strength and texture of deformable substrate is quantified as the robot locomotes over a laboratory transect consisting of a rigid surface, loose sand, and loose sand with synthetic surface crusts. Our results suggest that with both the sensing-oriented crawling gait and locomotion-oriented trot gait, the robot can measure a consistent difference in the strength (in terms of penetration resistance) between the low- and high-resistance substrates; however, the locomotion-oriented trot gait contains larger magnitude and variance in measurements. Furthermore, the slower crawl gait can detect brittle ruptures of the surface crusts with significantly higher accuracy than the faster trot gait. Our results offer new insights that inform legged robot "sensing during locomotion" gait design and planning for scouting the terrain and producing scientific measurements on other worlds to advance our understanding of their geology and formation.
comment: 7+1 pages, 5 figures, ICRA Submission This work has been submitted to the IEEE for possible publication
An Adaptive ICP LiDAR Odometry Based on Reliable Initial Pose
As a key technology for autonomous navigation and positioning in mobile robots, light detection and ranging (LiDAR) odometry is widely used in autonomous driving applications. The Iterative Closest Point (ICP)-based methods have become the core technique in LiDAR odometry due to their efficient and accurate point cloud registration capability. However, some existing ICP-based methods do not consider the reliability of the initial pose, which may cause the method to converge to a local optimum. Furthermore, the absence of an adaptive mechanism hinders the effective handling of complex dynamic environments, resulting in a significant degradation of registration accuracy. To address these issues, this paper proposes an adaptive ICP-based LiDAR odometry method that relies on a reliable initial pose. First, distributed coarse registration based on density filtering is employed to obtain the initial pose estimation. The reliable initial pose is then selected by comparing it with the motion prediction pose, reducing the initial error between the source and target point clouds. Subsequently, by combining the current and historical errors, the adaptive threshold is dynamically adjusted to accommodate the real-time changes in the dynamic environment. Finally, based on the reliable initial pose and the adaptive threshold, point-to-plane adaptive ICP registration is performed from the current frame to the local map, achieving high-precision alignment of the source and target point clouds. Extensive experiments on the public KITTI dataset demonstrate that the proposed method outperforms existing approaches and significantly enhances the accuracy of LiDAR odometry.
Lightweight Structured Multimodal Reasoning for Clinical Scene Understanding in Robotics
Healthcare robotics requires robust multimodal perception and reasoning to ensure safety in dynamic clinical environments. Current Vision-Language Models (VLMs) demonstrate strong general-purpose capabilities but remain limited in temporal reasoning, uncertainty estimation, and structured outputs needed for robotic planning. We present a lightweight agentic multimodal framework for video-based scene understanding. Combining the Qwen2.5-VL-3B-Instruct model with a SmolAgent-based orchestration layer, it supports chain-of-thought reasoning, speech-vision fusion, and dynamic tool invocation. The framework generates structured scene graphs and leverages a hybrid retrieval module for interpretable and adaptive reasoning. Evaluations on the Video-MME benchmark and a custom clinical dataset show competitive accuracy and improved robustness compared to state-of-the-art VLMs, demonstrating its potential for applications in robot-assisted surgery, patient monitoring, and decision support.
comment: 11 pages, 3 figures
One-DoF Robotic Design of Overconstrained Limbs with Energy-Efficient, Self-Collision-Free Motion
While it is expected to build robotic limbs with multiple degrees of freedom (DoF) inspired by nature, a single DoF design remains fundamental, providing benefits that include, but are not limited to, simplicity, robustness, cost-effectiveness, and efficiency. Mechanisms, especially those with multiple links and revolute joints connected in closed loops, play an enabling factor in introducing motion diversity for 1-DoF systems, which are usually constrained by self-collision during a full-cycle range of motion. This study presents a novel computational approach to designing one-degree-of-freedom (1-DoF) overconstrained robotic limbs for a desired spatial trajectory, while achieving energy-efficient, self-collision-free motion in full-cycle rotations. Firstly, we present the geometric optimization problem of linkage-based robotic limbs in a generalized formulation for self-collision-free design. Next, we formulate the spatial trajectory generation problem with the overconstrained linkages by optimizing the similarity and dynamic-related metrics. We further optimize the geometric shape of the overconstrained linkage to ensure smooth and collision-free motion driven by a single actuator. We validated our proposed method through various experiments, including personalized automata and bio-inspired hexapod robots. The resulting hexapod robot, featuring overconstrained robotic limbs, demonstrated outstanding energy efficiency during forward walking.
comment: 23 pages, 11 figures, 2 tables. Accepted by Fundamental Research. For Supplementary Videos, see https://bionicdl.ancorasir.com/?p=1668
Developing Vision-Language-Action Model from Egocentric Videos
Egocentric videos capture how humans manipulate objects and tools, providing diverse motion cues for learning object manipulation. Unlike the costly, expert-driven manual teleoperation commonly used in training Vision-Language-Action models (VLAs), egocentric videos offer a scalable alternative. However, prior studies that leverage such videos for training robot policies typically rely on auxiliary annotations, such as detailed hand-pose recordings. Consequently, it remains unclear whether VLAs can be trained directly from raw egocentric videos. In this work, we address this challenge by leveraging EgoScaler, a framework that extracts 6DoF object manipulation trajectories from egocentric videos without requiring auxiliary recordings. We apply EgoScaler to four large-scale egocentric video datasets and automatically refine noisy or incomplete trajectories, thereby constructing a new large-scale dataset for VLA pre-training. Our experiments with a state-of-the-art $\pi_0$ architecture in both simulated and real-robot environments yield three key findings: (i) pre-training on our dataset improves task success rates by over 20\% compared to training from scratch, (ii) the performance is competitive with that achieved using real-robot datasets, and (iii) combining our dataset with real-robot data yields further improvements. These results demonstrate that egocentric videos constitute a promising and scalable resource for advancing VLA research.
Hybrid Diffusion for Simultaneous Symbolic and Continuous Planning
Constructing robots to accomplish long-horizon tasks is a long-standing challenge within artificial intelligence. Approaches using generative methods, particularly Diffusion Models, have gained attention due to their ability to model continuous robotic trajectories for planning and control. However, we show that these models struggle with long-horizon tasks that involve complex decision-making and, in general, are prone to confusing different modes of behavior, leading to failure. To remedy this, we propose to augment continuous trajectory generation by simultaneously generating a high-level symbolic plan. We show that this requires a novel mix of discrete variable diffusion and continuous diffusion, which dramatically outperforms the baselines. In addition, we illustrate how this hybrid diffusion process enables flexible trajectory synthesis, allowing us to condition synthesized actions on partial and complete symbolic conditions.
comment: 10 pages, 11 figures. This work has been submitted to the IEEE for possible publication. See https://sigmundhh.com/hybrid_diffusion/ for the project website
FlowDrive: moderated flow matching with data balancing for trajectory planning
Learning-based planners are sensitive to the long-tailed distribution of driving data. Common maneuvers dominate datasets, while dangerous or rare scenarios are sparse. This imbalance can bias models toward the frequent cases and degrade performance on critical scenarios. To tackle this problem, we compare balancing strategies for sampling training data and find reweighting by trajectory pattern an effective approach. We then present FlowDrive, a flow-matching trajectory planner that learns a conditional rectified flow to map noise directly to trajectory distributions with few flow-matching steps. We further introduce moderated, in-the-loop guidance that injects small perturbation between flow steps to systematically increase trajectory diversity while remaining scene-consistent. On nuPlan and the interaction-focused interPlan benchmarks, FlowDrive achieves state-of-the-art results among learning-based planners and approaches methods with rule-based refinements. After adding moderated guidance and light post-processing (FlowDrive*), it achieves overall state-of-the-art performance across nearly all benchmark splits.
Learnable Conformal Prediction with Context-Aware Nonconformity Functions for Robotic Planning and Perception
Deep learning models in robotics often output point estimates with poorly calibrated confidences, offering no native mechanism to quantify predictive reliability under novel, noisy, or out-of-distribution inputs. Conformal prediction (CP) addresses this gap by providing distribution-free coverage guarantees, yet its reliance on fixed nonconformity scores ignores context and can yield intervals that are overly conservative or unsafe. We address this with Learnable Conformal Prediction (LCP), which replaces fixed scores with a lightweight neural function that leverages geometric, semantic, and task-specific features to produce context-aware uncertainty sets. LCP maintains CP's theoretical guarantees while reducing prediction set sizes by 18% in classification, tightening detection intervals by 52%, and improving path planning safety from 72% to 91% success with minimal overhead. Across three robotic tasks on seven benchmarks, LCP consistently outperforms Standard CP and ensemble baselines. In classification on CIFAR-100 and ImageNet, it achieves smaller set sizes (4.7-9.9% reduction) at target coverage. For object detection on COCO, BDD100K, and Cityscapes, it produces 46-54% tighter bounding boxes. In path planning through cluttered environments, it improves success to 91.5% with only 4.5% path inflation, compared to 12.2% for Standard CP. The method is lightweight (approximately 4.8% runtime overhead, 42 KB memory) and supports online adaptation, making it well suited to resource-constrained autonomous systems. Hardware evaluation shows LCP adds less than 1% memory and 15.9% inference overhead, yet sustains 39 FPS on detection tasks while being 7.4 times more energy-efficient than ensembles.
DynaNav: Dynamic Feature and Layer Selection for Efficient Visual Navigation NeurIPS 2025
Visual navigation is essential for robotics and embodied AI. However, existing foundation models, particularly those with transformer decoders, suffer from high computational overhead and lack interpretability, limiting their deployment in resource-tight scenarios. To address this, we propose DynaNav, a Dynamic Visual Navigation framework that adapts feature and layer selection based on scene complexity. It employs a trainable hard feature selector for sparse operations, enhancing efficiency and interpretability. Additionally, we integrate feature selection into an early-exit mechanism, with Bayesian Optimization determining optimal exit thresholds to reduce computational cost. Extensive experiments in real-world-based datasets and simulated environments demonstrate the effectiveness of DynaNav. Compared to ViNT, DynaNav achieves a 2.26x reduction in FLOPs, 42.3% lower inference time, and 32.8% lower memory usage, while improving navigation performance across four public datasets.
comment: Accepted as a poster in NeurIPS 2025
SAGE: Scene Graph-Aware Guidance and Execution for Long-Horizon Manipulation Tasks
Successfully solving long-horizon manipulation tasks remains a fundamental challenge. These tasks involve extended action sequences and complex object interactions, presenting a critical gap between high-level symbolic planning and low-level continuous control. To bridge this gap, two essential capabilities are required: robust long-horizon task planning and effective goal-conditioned manipulation. Existing task planning methods, including traditional and LLM-based approaches, often exhibit limited generalization or sparse semantic reasoning. Meanwhile, image-conditioned control methods struggle to adapt to unseen tasks. To tackle these problems, we propose SAGE, a novel framework for Scene Graph-Aware Guidance and Execution in Long-Horizon Manipulation Tasks. SAGE utilizes semantic scene graphs as a structural representation for scene states. A structural scene graph enables bridging task-level semantic reasoning and pixel-level visuo-motor control. This also facilitates the controllable synthesis of accurate, novel sub-goal images. SAGE consists of two key components: (1) a scene graph-based task planner that uses VLMs and LLMs to parse the environment and reason about physically-grounded scene state transition sequences, and (2) a decoupled structural image editing pipeline that controllably converts each target sub-goal graph into a corresponding image through image inpainting and composition. Extensive experiments have demonstrated that SAGE achieves state-of-the-art performance on distinct long-horizon tasks.
WAVE: Worm Gear-based Adaptive Variable Elasticity for Decoupling Actuators from External Forces
Robotic manipulators capable of regulating both compliance and stiffness offer enhanced operational safety and versatility. Here, we introduce Worm Gear-based Adaptive Variable Elasticity (WAVE), a variable stiffness actuator (VSA) that integrates a non-backdrivable worm gear. By decoupling the driving motor from external forces using this gear, WAVE enables precise force transmission to the joint, while absorbing positional discrepancies through compliance. WAVE is protected from excessive loads by converting impact forces into elastic energy stored in a spring. In addition, the actuator achieves continuous joint stiffness modulation by changing the spring's precompression length. We demonstrate these capabilities, experimentally validate the proposed stiffness model, show that motor loads approach zero at rest--even under external loading--and present applications using a manipulator with WAVE. This outcome showcases the successful decoupling of external forces. The protective attributes of this actuator allow for extended operation in contact-intensive tasks, and for robust robotic applications in challenging environments.
Improved Vehicle Maneuver Prediction using Game Theoretic Priors
Conventional maneuver prediction methods use some sort of classification model on temporal trajectory data to predict behavior of agents over a set time horizon. Despite of having the best precision and recall, these models cannot predict a lane change accurately unless they incorporate information about the entire scene. Level-k game theory can leverage the human-like hierarchical reasoning to come up with the most rational decisions each agent can make in a group. This can be leveraged to model interactions between different vehicles in presence of each other and hence compute the most rational decisions each agent would make. The result of game theoretic evaluation can be used as a "prior" or combined with a traditional motion-based classification model to achieve more accurate predictions. The proposed approach assumes that the states of the vehicles around the target lead vehicle are known. The module will output the most rational maneuver prediction of the target vehicle based on an online optimization solution. These predictions are instrumental in decision making systems like Adaptive Cruise Control (ACC) or Traxen's iQ-Cruise further improving the resulting fuel savings.
Learning Multi-Skill Legged Locomotion Using Conditional Adversarial Motion Priors
Despite growing interest in developing legged robots that emulate biological locomotion for agile navigation of complex environments, acquiring a diverse repertoire of skills remains a fundamental challenge in robotics. Existing methods can learn motion behaviors from expert data, but they often fail to acquire multiple locomotion skills through a single policy and lack smooth skill transitions. We propose a multi-skill learning framework based on Conditional Adversarial Motion Priors (CAMP), with the aim of enabling quadruped robots to efficiently acquire a diverse set of locomotion skills from expert demonstrations. Precise skill reconstruction is achieved through a novel skill discriminator and skill-conditioned reward design. The overall framework supports the active control and reuse of multiple skills, providing a practical solution for learning generalizable policies in complex environments.
The Turkish Ice Cream Robot: Examining Playful Deception in Social Human-Robot Interactions
Playful deception, a common feature in human social interactions, remains underexplored in Human-Robot Interaction (HRI). Inspired by the Turkish Ice Cream (TIC) vendor routine, we investigate how bounded, culturally familiar forms of deception influence user trust, enjoyment, and engagement during robotic handovers. We design a robotic manipulator equipped with a custom end-effector and implement five TIC-inspired trick policies that deceptively delay the handover of an ice cream-shaped object. Through a mixed-design user study with 91 participants, we evaluate the effects of playful deception and interaction duration on user experience. Results reveal that TIC-inspired deception significantly enhances enjoyment and engagement, though reduces perceived safety and trust, suggesting a structured trade-off across the multi-dimensional aspects. Our findings demonstrate that playful deception can be a valuable design strategy for interactive robots in entertainment and engagement-focused contexts, while underscoring the importance of deliberate consideration of its complex trade-offs. You can find more information, including demonstration videos, on https://hyeonseong-kim98.github.io/turkish-ice-cream-robot/ .
comment: for more videos, see https://hyeonseong-kim98.github.io/turkish-ice-cream-robot/
VLBiMan: Vision-Language Anchored One-Shot Demonstration Enables Generalizable Robotic Bimanual Manipulation
Achieving generalizable bimanual manipulation requires systems that can learn efficiently from minimal human input while adapting to real-world uncertainties and diverse embodiments. Existing approaches face a dilemma: imitation policy learning demands extensive demonstrations to cover task variations, while modular methods often lack flexibility in dynamic scenes. We introduce VLBiMan, a framework that derives reusable skills from a single human example through task-aware decomposition, preserving invariant primitives as anchors while dynamically adapting adjustable components via vision-language grounding. This adaptation mechanism resolves scene ambiguities caused by background changes, object repositioning, or visual clutter without policy retraining, leveraging semantic parsing and geometric feasibility constraints. Moreover, the system inherits human-like hybrid control capabilities, enabling mixed synchronous and asynchronous use of both arms. Extensive experiments validate VLBiMan across tool-use and multi-object tasks, demonstrating: (1) a drastic reduction in demonstration requirements compared to imitation baselines, (2) compositional generalization through atomic skill splicing for long-horizon tasks, (3) robustness to novel but semantically similar objects and external disturbances, and (4) strong cross-embodiment transfer, showing that skills learned from human demonstrations can be instantiated on different robotic platforms without retraining. By bridging human priors with vision-language anchored adaptation, our work takes a step toward practical and versatile dual-arm manipulation in unstructured settings.
comment: under review
mindmap: Spatial Memory in Deep Feature Maps for 3D Action Policies
End-to-end learning of robot control policies, structured as neural networks, has emerged as a promising approach to robotic manipulation. To complete many common tasks, relevant objects are required to pass in and out of a robot's field of view. In these settings, spatial memory - the ability to remember the spatial composition of the scene - is an important competency. However, building such mechanisms into robot learning systems remains an open research problem. We introduce mindmap (Spatial Memory in Deep Feature Maps for 3D Action Policies), a 3D diffusion policy that generates robot trajectories based on a semantic 3D reconstruction of the environment. We show in simulation experiments that our approach is effective at solving tasks where state-of-the-art approaches without memory mechanisms struggle. We release our reconstruction system, training code, and evaluation tasks to spur research in this direction.
comment: Accepted to CoRL 2025 Workshop RemembeRL
AnchDrive: Bootstrapping Diffusion Policies with Hybrid Trajectory Anchors for End-to-End Driving
End-to-end multi-modal planning has become a transformative paradigm in autonomous driving, effectively addressing behavioral multi-modality and the generalization challenge in long-tail scenarios. We propose AnchDrive, a framework for end-to-end driving that effectively bootstraps a diffusion policy to mitigate the high computational cost of traditional generative models. Rather than denoising from pure noise, AnchDrive initializes its planner with a rich set of hybrid trajectory anchors. These anchors are derived from two complementary sources: a static vocabulary of general driving priors and a set of dynamic, context-aware trajectories. The dynamic trajectories are decoded in real-time by a Transformer that processes dense and sparse perceptual features. The diffusion model then learns to refine these anchors by predicting a distribution of trajectory offsets, enabling fine-grained refinement. This anchor-based bootstrapping design allows for efficient generation of diverse, high-quality trajectories. Experiments on the NAVSIM benchmark confirm that AnchDrive sets a new state-of-the-art and shows strong generalizability
MapExRL: Human-Inspired Indoor Exploration with Predicted Environment Context and Reinforcement Learning
Path planning for robotic exploration is challenging, requiring reasoning over unknown spaces and anticipating future observations. Efficient exploration requires selecting budget-constrained paths that maximize information gain. Despite advances in autonomous exploration, existing algorithms still fall short of human performance, particularly in structured environments where predictive cues exist but are underutilized. Guided by insights from our user study, we introduce MapExRL, which improves robot exploration efficiency in structured indoor environments by enabling longer-horizon planning through a learned policy and global map predictions. Unlike many learning-based exploration methods that use motion primitives as the action space, our approach leverages frontiers for more efficient model learning and longer horizon reasoning. Our framework generates global map predictions from the observed map, which our policy utilizes, along with the prediction uncertainty, estimated sensor coverage, frontier distance, and remaining distance budget, to assess the strategic long-term value of frontiers. By leveraging multiple frontier scoring methods and additional context, our policy makes more informed decisions at each stage of the exploration. We evaluate our framework on a real-world indoor map dataset, achieving up to an 18.8% improvement over the strongest state-of-the-art baseline, with even greater gains compared to conventional frontier-based algorithms. Website: https://mapexrl.github.io
comment: 8 pages, 6 figures, ICAR 2025
Learning Personalized Driving Styles via Reinforcement Learning from Human Feedback
Generating human-like and adaptive trajectories is essential for autonomous driving in dynamic environments. While generative models have shown promise in synthesizing feasible trajectories, they often fail to capture the nuanced variability of personalized driving styles due to dataset biases and distributional shifts. To address this, we introduce TrajHF, a human feedback-driven finetuning framework for generative trajectory models, designed to align motion planning with diverse driving styles. TrajHF incorporates multi-conditional denoiser and reinforcement learning with human feedback to refine multi-modal trajectory generation beyond conventional imitation learning. This enables better alignment with human driving preferences while maintaining safety and feasibility constraints. TrajHF achieves performance comparable to the state-of-the-art on NavSim benchmark. TrajHF sets a new paradigm for personalized and adaptable trajectory generation in autonomous driving.
comment: 20 pages, 6 figures
Mobi-$π$: Mobilizing Your Robot Learning Policy
Learned visuomotor policies are capable of performing increasingly complex manipulation tasks. However, most of these policies are trained on data collected from limited robot positions and camera viewpoints. This leads to poor generalization to novel robot positions, which limits the use of these policies on mobile platforms, especially for precise tasks like pressing buttons or turning faucets. In this work, we formulate the policy mobilization problem: find a mobile robot base pose in a novel environment that is in distribution with respect to a manipulation policy trained on a limited set of camera viewpoints. Compared to retraining the policy itself to be more robust to unseen robot base pose initializations, policy mobilization decouples navigation from manipulation and thus does not require additional demonstrations. Crucially, this problem formulation complements existing efforts to improve manipulation policy robustness to novel viewpoints and remains compatible with them. We propose a novel approach for policy mobilization that bridges navigation and manipulation by optimizing the robot's base pose to align with an in-distribution base pose for a learned policy. Our approach utilizes 3D Gaussian Splatting for novel view synthesis, a score function to evaluate pose suitability, and sampling-based optimization to identify optimal robot poses. To understand policy mobilization in more depth, we also introduce the Mobi-$\pi$ framework, which includes: (1) metrics that quantify the difficulty of mobilizing a given policy, (2) a suite of simulated mobile manipulation tasks based on RoboCasa to evaluate policy mobilization, and (3) visualization tools for analysis. In both our developed simulation task suite and the real world, we show that our approach outperforms baselines, demonstrating its effectiveness for policy mobilization.
comment: CoRL 2025. Project website: https://mobipi.github.io/
Excavating in the Wild: The GOOSE-Ex Dataset for Semantic Segmentation ICRA
The successful deployment of deep learning-based techniques for autonomous systems is highly dependent on the data availability for the respective system in its deployment environment. Especially for unstructured outdoor environments, very few datasets exist for even fewer robotic platforms and scenarios. In an earlier work, we presented the German Outdoor and Offroad Dataset (GOOSE) framework along with 10000 multimodal frames from an offroad vehicle to enhance the perception capabilities in unstructured environments. In this work, we address the generalizability of the GOOSE framework. To accomplish this, we open-source the GOOSE-Ex dataset, which contains additional 5000 labeled multimodal frames from various completely different environments, recorded on a robotic excavator and a quadruped platform. We perform a comprehensive analysis of the semantic segmentation performance on different platforms and sensor modalities in unseen environments. In addition, we demonstrate how the combined datasets can be utilized for different downstream applications or competitions such as offroad navigation, object manipulation or scene completion. The dataset, its platform documentation and pre-trained state-of-the-art models for offroad perception will be made available on https://goose-dataset.de/. \
comment: Accepted for publication at 2025 IEEE International Conference on Robotics and Automation (ICRA)
GraphSCENE: On-Demand Critical Scenario Generation for Autonomous Vehicles in Simulation IROS 2025
Testing and validating Autonomous Vehicle (AV) performance in safety-critical and diverse scenarios is crucial before real-world deployment. However, manually creating such scenarios in simulation remains a significant and time-consuming challenge. This work introduces a novel method that generates dynamic temporal scene graphs corresponding to diverse traffic scenarios, on-demand, tailored to user-defined preferences, such as AV actions, sets of dynamic agents, and criticality levels. A temporal Graph Neural Network (GNN) model learns to predict relationships between ego-vehicle, agents, and static structures, guided by real-world spatiotemporal interaction patterns and constrained by an ontology that restricts predictions to semantically valid links. Our model consistently outperforms the baselines in accurately generating links corresponding to the requested scenarios. We render the predicted scenarios in simulation to further demonstrate their effectiveness as testing environments for AV agents.
comment: Accepted to the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2025)
The need for and feasibility of alternative ground robots to traverse sandy and rocky extraterrestrial terrain
Robotic spacecraft have helped expand our reach for many planetary exploration missions. Most ground mobile planetary exploration robots use wheeled or modified wheeled platforms. Although extraordinarily successful at completing intended mission goals, because of the limitations of wheeled locomotion, they have been largely limited to benign, solid terrain and avoided extreme terrain with loose soil/sand and large rocks. Unfortunately, such challenging terrain is often scientifically interesting for planetary geology. Although many animals traverse such terrain at ease, robots have not matched their performance and robustness. This is in major part due to a lack of fundamental understanding of how effective locomotion can be generated from controlled interaction with complex terrain on the same level of flight aerodynamics and underwater vehicle hydrodynamics. Early fundamental understanding of legged and limbless locomotor-ground interaction has already enabled stable and efficient bio-inspired robot locomotion on relatively flat ground with small obstacles. Recent progress in the new field of terradynamics of locomotor-terrain interaction begins to reveal the principles of bio-inspired locomotion on loose soil/sand and over large obstacles. Multi-legged and limbless platforms using terradynamics insights hold the promise for serving as robust alternative platforms for traversing extreme extraterrestrial terrain and expanding our reach in planetary exploration.
Multi-CAP: A Multi-Robot Connectivity-Aware Hierarchical Coverage Path Planning Algorithm for Unknown Environments
Efficient coordination of multiple robots for coverage of large, unknown environments is a significant challenge that involves minimizing the total coverage path length while reducing inter-robot conflicts. In this paper, we introduce a Multi-robot Connectivity-Aware Planner (Multi-CAP), a hierarchical coverage path planning algorithm that facilitates multi-robot coordination through a novel connectivity-aware approach. The algorithm constructs and dynamically maintains an adjacency graph that represents the environment as a set of connected subareas. Critically, we make the assumption that the environment, while unknown, is bounded. This allows for incremental refinement of the adjacency graph online to ensure its structure represents the physical layout of the space, both in observed and unobserved areas of the map as robots explore the environment. We frame the task of assigning subareas to robots as a Vehicle Routing Problem (VRP), a well-studied problem for finding optimal routes for a fleet of vehicles. This is used to compute disjoint tours that minimize redundant travel, assigning each robot a unique, non-conflicting set of subareas. Each robot then executes its assigned tour, independently adapting its coverage strategy within each subarea to minimize path length based on real-time sensor observations of the subarea. We demonstrate through simulations and multi-robot hardware experiments that Multi-CAP significantly outperforms state-of-the-art methods in key metrics, including coverage time, total path length, and path overlap ratio. Ablation studies further validate the critical role of our connectivity-aware graph and the global tour planner in achieving these performance gains.
Decentralized Aerial Manipulation of a Cable-Suspended Load using Multi-Agent Reinforcement Learning
This paper presents the first decentralized method to enable real-world 6-DoF manipulation of a cable-suspended load using a team of Micro-Aerial Vehicles (MAVs). Our method leverages multi-agent reinforcement learning (MARL) to train an outer-loop control policy for each MAV. Unlike state-of-the-art controllers that utilize a centralized scheme, our policy does not require global states, inter-MAV communications, nor neighboring MAV information. Instead, agents communicate implicitly through load pose observations alone, which enables high scalability and flexibility. It also significantly reduces computing costs during inference time, enabling onboard deployment of the policy. In addition, we introduce a new action space design for the MAVs using linear acceleration and body rates. This choice, combined with a robust low-level controller, enables reliable sim-to-real transfer despite significant uncertainties caused by cable tension during dynamic 3D motion. We validate our method in various real-world experiments, including full-pose control under load model uncertainties, showing setpoint tracking performance comparable to the state-of-the-art centralized method. We also demonstrate cooperation amongst agents with heterogeneous control policies, and robustness to the complete in-flight loss of one MAV. Videos of experiments: https://autonomousrobots.nl/paper_websites/aerial-manipulation-marl
Gaussian-Process-based Adaptive Tracking Control with Dynamic Active Learning for Autonomous Ground Vehicles
This article proposes an active-learning-based adaptive trajectory tracking control method for autonomous ground vehicles to compensate for modeling errors and unmodeled dynamics. The nominal vehicle model is decoupled into lateral and longitudinal subsystems, which are augmented with online Gaussian Processes (GPs), using measurement data. The estimated mean functions of the GPs are used to construct a feedback compensator, which, together with an LPV state feedback controller designed for the nominal system, gives the adaptive control structure. To assist exploration of the dynamics, the paper proposes a new, dynamic active learning method to collect the most informative samples to accelerate the training process. To analyze the performance of the overall learning tool-chain provided controller, a novel iterative, counterexample-based algorithm is proposed for calculating the induced L2 gain between the reference trajectory and the tracking error. The analysis can be executed for a set of possible realizations of the to-be-controlled system, giving robust performance certificate of the learning method under variation of the vehicle dynamics. The efficiency of the proposed control approach is shown on a high-fidelity physics simulator and in real experiments using a 1/10 scale F1TENTH electric car.
comment: Submitted to IEEE Transactions on Control Systems Technology (revised)
Flexible and Foldable: Workspace Analysis and Object Manipulation Using a Soft, Interconnected, Origami-Inspired Actuator Array
Object manipulation is a fundamental challenge in robotics, where systems must balance trade-offs among manipulation capabilities, system complexity, and throughput. Distributed manipulator systems (DMS) use the coordinated motion of actuator arrays to perform complex object manipulation tasks, seeing widespread exploration within the literature and in industry. However, existing DMS designs typically rely on high actuator densities and impose constraints on object-to-actuator scale ratios, limiting their adaptability. We present a novel DMS design utilizing an array of 3-DoF, origami-inspired robotic tiles interconnected by a compliant surface layer. Unlike conventional DMS, our approach enables manipulation not only at the actuator end effectors but also across a flexible surface connecting all actuators; creating a continuous, controllable manipulation surface. We analyse the combined workspace of such a system, derive simple motion primitives, and demonstrate its capabilities to translate simple geometric objects across an array of tiles. By leveraging the inter-tile connective material, our approach significantly reduces actuator density, increasing the area over which an object can be manipulated by x1.84 without an increase in the number of actuators. This design offers a lower cost and complexity alternative to traditional high-density arrays, and introduces new opportunities for manipulation strategies that leverage the flexibility of the interconnected surface.
Rethinking the Embodied Gap in Vision-and-Language Navigation: A Holistic Study of Physical and Visual Disparities ICCV 2025
Recent Vision-and-Language Navigation (VLN) advancements are promising, but their idealized assumptions about robot movement and control fail to reflect physically embodied deployment challenges. To bridge this gap, we introduce VLN-PE, a physically realistic VLN platform supporting humanoid, quadruped, and wheeled robots. For the first time, we systematically evaluate several ego-centric VLN methods in physical robotic settings across different technical pipelines, including classification models for single-step discrete action prediction, a diffusion model for dense waypoint prediction, and a train-free, map-based large language model (LLM) integrated with path planning. Our results reveal significant performance degradation due to limited robot observation space, environmental lighting variations, and physical challenges like collisions and falls. This also exposes locomotion constraints for legged robots in complex environments. VLN-PE is highly extensible, allowing seamless integration of new scenes beyond MP3D, thereby enabling more comprehensive VLN evaluation. Despite the weak generalization of current models in physical deployment, VLN-PE provides a new pathway for improving cross-embodiment's overall adaptability. We hope our findings and tools inspire the community to rethink VLN limitations and advance robust, practical VLN models. The code is available at https://crystalsixone.github.io/vln_pe.github.io/.
comment: Accepted by ICCV 2025
AVR: Active Vision-Driven Precise Robot Manipulation with Viewpoint and Focal Length Optimization
Robotic manipulation in complex scenes demands precise perception of task-relevant details, yet fixed or suboptimal viewpoints often impair fine-grained perception and induce occlusions, constraining imitation-learned policies. We present AVR (Active Vision-driven Robotics), a bimanual teleoperation and learning framework that unifies head-tracked viewpoint control (HMD-to-2-DoF gimbal) with motorized optical zoom to keep targets centered at an appropriate scale during data collection and deployment. In simulation, an AVR plugin augments RoboTwin demonstrations by emulating active vision (ROI-conditioned viewpoint change, aspect-ratio-preserving crops with explicit zoom ratios, and super-resolution), yielding 5-17% gains in task success across diverse manipulations. On our real-world platform, AVR improves success on most tasks, with over 25% gains compared to the static-view baseline, and extended studies further demonstrate robustness under occlusion, clutter, and lighting disturbances, as well as generalization to unseen environments and objects. These results pave the way for future robotic precision manipulation methods in the pursuit of human-level dexterity and precision.
comment: Project Page: https://AVR-robot.github.io
Learning Hierarchical Domain Models Through Environment-Grounded Interaction
Domain models enable autonomous agents to solve long-horizon tasks by producing interpretable plans. However, in open-world environments, a single general domain model cannot capture the variety of tasks, so agents must generate suitable task-specific models on the fly. Large Language Models (LLMs), with their implicit common knowledge, can generate such domains, but suffer from high error rates that limit their applicability. Hence, related work relies on extensive human feed-back or prior knowledge, which undermines autonomous, open-world deployment. In this work, we propose LODGE, a framework for autonomous domain learning from LLMs and environment grounding. LODGE builds on hierarchical abstractions and automated simulations to identify and correct inconsistencies between abstraction layers and between the model and environment. Our framework is task-agnostic, as it generates predicates, operators, and their preconditions and effects, while only assuming access to a simulator and a set of generic, executable low-level skills. Experiments on two International Planning Competition ( IPC) domains and a robotic assembly domain show that LODGE yields more accurate domain models and higher task success than existing methods, requiring remarkably few environment interactions and no human feedback or demonstrations.
Plan-R1: Safe and Feasible Trajectory Planning as Language Modeling
Safe and feasible trajectory planning is critical for real-world autonomous driving systems. However, existing learning-based planners rely heavily on expert demonstrations, which not only lack explicit safety awareness but also risk inheriting undesirable behaviors such as speeding from suboptimal human driving data. Inspired by the success of large language models, we propose Plan-R1, a two-stage trajectory planning framework that decouples principle alignment from behavior learning. In the first stage, a general trajectory predictor is pre-trained on expert data to capture diverse, human-like driving behaviors. In the second stage, the model is fine-tuned with rule-based rewards using Group Relative Policy Optimization (GRPO), explicitly aligning ego planning with principles such as safety, comfort, and traffic rule compliance. This two-stage paradigm retains human-like behaviors while enhancing safety awareness and discarding undesirable patterns from demonstrations. Furthermore, we identify a key limitation of directly applying GRPO to planning: group-wise normalization erases cross-group scale differences, causing rare, high-variance safety-violation groups to have similar advantages as abundant low-variance safe groups, thereby suppressing optimization for safety-critical objectives. To address this, we propose Variance-Decoupled GRPO (VD-GRPO), which replaces normalization with centering and fixed scaling to preserve absolute reward magnitudes, ensuring that safety-critical objectives remain dominant throughout training. Experiments on the nuPlan benchmark demonstrate that Plan-R1 significantly improves planning safety and feasibility, achieving state-of-the-art performance, particularly in realistic reactive settings. Our code is available at https://github.com/XiaolongTang23/Plan-R1.
STHN: Deep Homography Estimation for UAV Thermal Geo-localization with Satellite Imagery
Accurate geo-localization of Unmanned Aerial Vehicles (UAVs) is crucial for outdoor applications including search and rescue operations, power line inspections, and environmental monitoring. The vulnerability of Global Navigation Satellite Systems (GNSS) signals to interference and spoofing necessitates the development of additional robust localization methods for autonomous navigation. Visual Geo-localization (VG), leveraging onboard cameras and reference satellite maps, offers a promising solution for absolute localization. Specifically, Thermal Geo-localization (TG), which relies on image-based matching between thermal imagery with satellite databases, stands out by utilizing infrared cameras for effective nighttime localization. However, the efficiency and effectiveness of current TG approaches, are hindered by dense sampling on satellite maps and geometric noises in thermal query images. To overcome these challenges, we introduce STHN, a novel UAV thermal geo-localization approach that employs a coarse-to-fine deep homography estimation method. This method attains reliable thermal geo-localization within a 512-meter radius of the UAV's last known location even with a challenging 11% size ratio between thermal and satellite images, despite the presence of indistinct textures and self-similar patterns. We further show how our research significantly enhances UAV thermal geo-localization performance and robustness against geometric noises under low-visibility conditions in the wild. The code is made publicly available.
comment: 8 pages, 7 figures. Accepted for IEEE Robotics and Automation Letters
Empowering Multi-Robot Cooperation via Sequential World Models
Model-based reinforcement learning (MBRL) has shown significant potential in robotics due to its high sample efficiency and planning capability. However, extending MBRL to multi-robot cooperation remains challenging due to the complexity of joint dynamics and the reliance on synchronous communication. SeqWM employs independent, autoregressive agent-wise world models to represent joint dynamics, where each agent generates its future trajectory and plans its actions based on the predictions of its predecessors. This design lowers modeling complexity, alleviates the reliance on communication synchronization, and enables the emergence of advanced cooperative behaviors through explicit intention sharing. Experiments in challenging simulated environments (Bi-DexHands and Multi-Quad) demonstrate that SeqWM outperforms existing state-of-the-art model-based and model-free baselines in both overall performance and sample efficiency, while exhibiting advanced cooperative behaviors such as predictive adaptation, temporal alignment, and role division. Furthermore, SeqWM has been success fully deployed on physical quadruped robots, demonstrating its effectiveness in real-world multi-robot systems. Demos and code are available at: https://sites.google.com/view/seqwm-marl
DSPv2: Improved Dense Policy for Effective and Generalizable Whole-body Mobile Manipulation
Learning whole-body mobile manipulation via imitation is essential for generalizing robotic skills to diverse environments and complex tasks. However, this goal is hindered by significant challenges, particularly in effectively processing complex observation, achieving robust generalization, and generating coherent actions. To address these issues, we propose DSPv2, a novel policy architecture. DSPv2 introduces an effective encoding scheme that aligns 3D spatial features with multi-view 2D semantic features. This fusion enables the policy to achieve broad generalization while retaining the fine-grained perception necessary for precise control. Furthermore, we extend the Dense Policy paradigm to the whole-body mobile manipulation domain, demonstrating its effectiveness in generating coherent and precise actions for the whole-body robotic platform. Extensive experiments show that our method significantly outperforms existing approaches in both task performance and generalization ability. Project page is available at: https://selen-suyue.github.io/DSPv2Net/.
Multiagent Systems
Voting-Bloc Entropy: A New Metric for DAO Decentralization USENIX Security 2025
Decentralized Autonomous Organizations (DAOs) use smart contracts to foster communities working toward common goals. Existing definitions of decentralization, however -- the 'D' in DAO -- fall short of capturing the key properties characteristic of diverse and equitable participation. This work proposes a new framework for measuring DAO decentralization called Voting-Bloc Entropy (VBE, pronounced ''vibe''). VBE is based on the idea that voters with closely aligned interests act as a centralizing force and should be modeled as such. VBE formalizes this notion by measuring the similarity of participants' utility functions across a set of voting rounds. Unlike prior, ad hoc definitions of decentralization, VBE derives from first principles: We introduce a simple (yet powerful) reinforcement learning-based conceptual model for voting, that in turn implies VBE. We first show VBE's utility as a theoretical tool. We prove a number of results about the (de)centralizing effects of vote delegation, proposal bundling, bribery, etc. that are overlooked in previous notions of DAO decentralization. Our results lead to practical suggestions for enhancing DAO decentralization. We also show how VBE can be used empirically by presenting measurement studies and VBE-based governance experiments. We make the tools we developed for these results available to the community in the form of open-source artifacts in order to facilitate future study of DAO decentralization.
comment: Full version of the paper published in USENIX Security 2025
Learn the Ropes, Then Trust the Wins: Self-imitation with Progressive Exploration for Agentic Reinforcement Learning
Reinforcement learning (RL) is the dominant paradigm for sharpening strategic tool use capabilities of LLMs on long-horizon, sparsely-rewarded agent tasks, yet it faces a fundamental challenge of exploration-exploitation trade-off. Existing studies stimulate exploration through the lens of policy entropy, but such mechanical entropy maximization is prone to RL training instability due to the multi-turn distribution shifting. In this paper, we target the progressive exploration-exploitation balance under the guidance of the agent own experiences without succumbing to either entropy collapsing or runaway divergence. We propose SPEAR, a curriculum-based self-imitation learning (SIL) recipe for training agentic LLMs. It extends the vanilla SIL framework, where a replay buffer stores self-generated promising trajectories for off-policy update, by gradually steering the policy evolution within a well-balanced range of entropy across stages. Specifically, our approach incorporates a curriculum to manage the exploration process, utilizing intrinsic rewards to foster skill-level exploration and facilitating action-level exploration through SIL. At first, the auxiliary tool call reward plays a critical role in the accumulation of tool-use skills, enabling broad exposure to the unfamiliar distributions of the environment feedback with an upward entropy trend. As training progresses, self-imitation gets strengthened to exploit existing successful patterns from replayed experiences for comparative action-level exploration, accelerating solution iteration without unbounded entropy growth. To further stabilize training, we recalibrate the advantages of experiences in the replay buffer to address the potential policy drift. Reugularizations such as the clipping of tokens with high covariance between probability and advantage are introduced to the trajectory-level entropy control to curb over-confidence.
comment: 26 pages, 11 figures
Effective Policy Learning for Multi-Agent Online Coordination Beyond Submodular Objectives NeurIPS 2025
In this paper, we present two effective policy learning algorithms for multi-agent online coordination(MA-OC) problem. The first one, \texttt{MA-SPL}, not only can achieve the optimal $(1-\frac{c}{e})$-approximation guarantee for the MA-OC problem with submodular objectives but also can handle the unexplored $\alpha$-weakly DR-submodular and $(\gamma,\beta)$-weakly submodular scenarios, where $c$ is the curvature of the investigated submodular functions, $\alpha$ denotes the diminishing-return(DR) ratio and the tuple $(\gamma,\beta)$ represents the submodularity ratios. Subsequently, in order to reduce the reliance on the unknown parameters $\alpha,\gamma,\beta$ inherent in the \texttt{MA-SPL} algorithm, we further introduce the second online algorithm named \texttt{MA-MPL}. This \texttt{MA-MPL} algorithm is entirely \emph{parameter-free} and simultaneously can maintain the same approximation ratio as the first \texttt{MA-SPL} algorithm. The core of our \texttt{MA-SPL} and \texttt{MA-MPL} algorithms is a novel continuous-relaxation technique termed as \emph{policy-based continuous extension}. Compared with the well-established \emph{multi-linear extension}, a notable advantage of this new \emph{policy-based continuous extension} is its ability to provide a lossless rounding scheme for any set function, thereby enabling us to tackle the challenging weakly submodular objectives. Finally, extensive simulations are conducted to validate the effectiveness of our proposed algorithms.
comment: Accepted to NeurIPS 2025
Learning from Delayed Feedback in Games via Extra Prediction
This study raises and addresses the problem of time-delayed feedback in learning in games. Because learning in games assumes that multiple agents independently learn their strategies, a discrepancy in optimization often emerges among the agents. To overcome this discrepancy, the prediction of the future reward is incorporated into algorithms, typically known as Optimistic Follow-the-Regularized-Leader (OFTRL). However, the time delay in observing the past rewards hinders the prediction. Indeed, this study firstly proves that even a single-step delay worsens the performance of OFTRL from the aspects of regret and convergence. This study proposes the weighted OFTRL (WOFTRL), where the prediction vector of the next reward in OFTRL is weighted $n$ times. We further capture an intuition that the optimistic weight cancels out this time delay. We prove that when the optimistic weight exceeds the time delay, our WOFTRL recovers the good performances that the regret is constant ($O(1)$-regret) in general-sum normal-form games, and the strategies converge to the Nash equilibrium as a subsequence (best-iterate convergence) in poly-matrix zero-sum games. The theoretical results are supported and strengthened by our experiments.
comment: 11 pages, 3 figures (main); 9 pages (appendix)
VizGen: Data Exploration and Visualization from Natural Language via a Multi-Agent AI Architecture
Data visualization is essential for interpreting complex datasets, yet traditional tools often require technical expertise, limiting accessibility. VizGen is an AI-assisted graph generation system that empowers users to create meaningful visualizations using natural language. Leveraging advanced NLP and LLMs like Claude 3.7 Sonnet and Gemini 2.0 Flash, it translates user queries into SQL and recommends suitable graph types. Built on a multi-agent architecture, VizGen handles SQL generation, graph creation, customization, and insight extraction. Beyond visualization, it analyzes data for patterns, anomalies, and correlations, and enhances user understanding by providing explanations enriched with contextual information gathered from the internet. The system supports real-time interaction with SQL databases and allows conversational graph refinement, making data analysis intuitive and accessible. VizGen democratizes data visualization by bridging the gap between technical complexity and user-friendly design.
Impact of Collective Behaviors of Autonomous Vehicles on Urban Traffic Dynamics: A Multi-Agent Reinforcement Learning Approach
This study examines the potential impact of reinforcement learning (RL)-enabled autonomous vehicles (AV) on urban traffic flow in a mixed traffic environment. We focus on a simplified day-to-day route choice problem in a multi-agent setting. We consider a city network where human drivers travel through their chosen routes to reach their destinations in minimum travel time. Then, we convert one-third of the population into AVs, which are RL agents employing Deep Q-learning algorithm. We define a set of optimization targets, or as we call them behaviors, namely selfish, collaborative, competitive, social, altruistic, and malicious. We impose a selected behavior on AVs through their rewards. We run our simulations using our in-house developed RL framework PARCOUR. Our simulations reveal that AVs optimize their travel times by up to 5\%, with varying impacts on human drivers' travel times depending on the AV behavior. In all cases where AVs adopt a self-serving behavior, they achieve shorter travel times than human drivers. Our findings highlight the complexity differences in learning tasks of each target behavior. We demonstrate that the multi-agent RL setting is applicable for collective routing on traffic networks, though their impact on coexisting parties greatly varies with the behaviors adopted.
comment: Work presented at the European Workshop on Reinforcement Learning (EWRL 2024)
Log2Plan: An Adaptive GUI Automation Framework Integrated with Task Mining Approach
GUI task automation streamlines repetitive tasks, but existing LLM or VLM-based planner-executor agents suffer from brittle generalization, high latency, and limited long-horizon coherence. Their reliance on single-shot reasoning or static plans makes them fragile under UI changes or complex tasks. Log2Plan addresses these limitations by combining a structured two-level planning framework with a task mining approach over user behavior logs, enabling robust and adaptable GUI automation. Log2Plan constructs high-level plans by mapping user commands to a structured task dictionary, enabling consistent and generalizable automation. To support personalization and reuse, it employs a task mining approach from user behavior logs that identifies user-specific patterns. These high-level plans are then grounded into low-level action sequences by interpreting real-time GUI context, ensuring robust execution across varying interfaces. We evaluated Log2Plan on 200 real-world tasks, demonstrating significant improvements in task success rate and execution time. Notably, it maintains over 60.0% success rate even on long-horizon task sequences, highlighting its robustness in complex, multi-step workflows.
Multi-Agent Path Finding via Offline RL and LLM Collaboration
Multi-Agent Path Finding (MAPF) poses a significant and challenging problem critical for applications in robotics and logistics, particularly due to its combinatorial complexity and the partial observability inherent in realistic environments. Decentralized reinforcement learning methods commonly encounter two substantial difficulties: first, they often yield self-centered behaviors among agents, resulting in frequent collisions, and second, their reliance on complex communication modules leads to prolonged training times, sometimes spanning weeks. To address these challenges, we propose an efficient decentralized planning framework based on the Decision Transformer (DT), uniquely leveraging offline reinforcement learning to substantially reduce training durations from weeks to mere hours. Crucially, our approach effectively handles long-horizon credit assignment and significantly improves performance in scenarios with sparse and delayed rewards. Furthermore, to overcome adaptability limitations inherent in standard RL methods under dynamic environmental changes, we integrate a large language model (GPT-4o) to dynamically guide agent policies. Extensive experiments in both static and dynamically changing environments demonstrate that our DT-based approach, augmented briefly by GPT-4o, significantly enhances adaptability and performance.
CoBel-World: Harnessing LLM Reasoning to Build a Collaborative Belief World for Optimizing Embodied Multi-Agent Collaboration
Effective real-world multi-agent collaboration requires not only accurate planning but also the ability to reason about collaborators' intents -- a crucial capability for avoiding miscoordination and redundant communication under partial observable environments. Due to their strong planning and reasoning capabilities, large language models (LLMs) have emerged as promising autonomous agents for collaborative task solving. However, existing collaboration frameworks for LLMs overlook their reasoning potential for dynamic intent inference, and thus produce inconsistent plans and redundant communication, reducing collaboration efficiency. To bridge this gap, we propose CoBel-World, a novel framework that equips LLM agents with a collaborative belief world -- an internal representation jointly modeling the physical environment and collaborators' mental states. CoBel-World enables agents to parse open-world task knowledge into structured beliefs via a symbolic belief language, and perform zero-shot Bayesian-style belief updates through LLM reasoning. This allows agents to proactively detect potential miscoordination (e.g., conflicting plans) and communicate adaptively. Evaluated on challenging embodied benchmarks (i.e., TDW-MAT and C-WAH), CoBel-World significantly reduces communication costs by 22-60% and improves task completion efficiency by 4-28% compared to the strongest baseline. Our results show that explicit, intent-aware belief modeling is essential for efficient and human-like collaboration in LLM-based multi-agent systems.
Reimagining Agent-based Modeling with Large Language Model Agents via Shachi
The study of emergent behaviors in large language model (LLM)-driven multi-agent systems is a critical research challenge, yet progress is limited by a lack of principled methodologies for controlled experimentation. To address this, we introduce Shachi, a formal methodology and modular framework that decomposes an agent's policy into core cognitive components: Configuration for intrinsic traits, Memory for contextual persistence, and Tools for expanded capabilities, all orchestrated by an LLM reasoning engine. This principled architecture moves beyond brittle, ad-hoc agent designs and enables the systematic analysis of how specific architectural choices influence collective behavior. We validate our methodology on a comprehensive 10-task benchmark and demonstrate its power through novel scientific inquiries. Critically, we establish the external validity of our approach by modeling a real-world U.S. tariff shock, showing that agent behaviors align with observed market reactions only when their cognitive architecture is appropriately configured with memory and tools. Our work provides a rigorous, open-source foundation for building and evaluating LLM agents, aimed at fostering more cumulative and scientifically grounded research.
Following the TRACE: A Structured Path to Empathetic Response Generation with Multi-Agent Models
Empathetic response generation is a crucial task for creating more human-like and supportive conversational agents. However, existing methods face a core trade-off between the analytical depth of specialized models and the generative fluency of Large Language Models (LLMs). To address this, we propose TRACE, Task-decomposed Reasoning for Affective Communication and Empathy, a novel framework that models empathy as a structured cognitive process by decomposing the task into a pipeline for analysis and synthesis. By building a comprehensive understanding before generation, TRACE unites deep analysis with expressive generation. Experimental results show that our framework significantly outperforms strong baselines in both automatic and LLM-based evaluations, confirming that our structured decomposition is a promising paradigm for creating more capable and interpretable empathetic agents. Our code is available at https://anonymous.4open.science/r/TRACE-18EF/README.md.
Axiomatic Choice and the Decision-Evaluation Paradox
We introduce a framework for modeling decisions with axioms that are statements about decisions, e.g., ethical constraints. Using our framework we define a taxonomy of decision axioms based on their structural properties and demonstrate a tension between the use of axioms to make decisions and the use of axioms to evaluate decisions which we call the Decision-Evaluation Paradox. We argue that the Decision-Evaluation Paradox arises with realistic axiom structures, and the paradox illuminates why one must be exceptionally careful when training models on decision data or applying axioms to make and evaluate decisions.
RobustFlow: Towards Robust Agentic Workflow Generation
The automated generation of agentic workflows is a promising frontier for enabling large language models (LLMs) to solve complex tasks. However, our investigation reveals that the robustness of agentic workflow remains a critical, unaddressed challenge. Current methods often generate wildly inconsistent workflows when provided with instructions that are semantically identical but differently phrased. This brittleness severely undermines their reliability and trustworthiness for real-world applications. To quantitatively diagnose this instability, we propose metrics based on nodal and topological similarity to evaluate workflow consistency against common semantic variations such as paraphrasing and noise injection. Subsequently, we further propose a novel training framework, RobustFlow, that leverages preference optimization to teach models invariance to instruction variations. By training on sets of synonymous task descriptions, RobustFlow boosts workflow robustness scores to 70\% - 90\%, which is a substantial improvement over existing approaches. The code is publicly available at https://github.com/DEFENSE-SEU/RobustFlow.
Preference-Guided Learning for Sparse-Reward Multi-Agent Reinforcement Learning
We study the problem of online multi-agent reinforcement learning (MARL) in environments with sparse rewards, where reward feedback is not provided at each interaction but only revealed at the end of a trajectory. This setting, though realistic, presents a fundamental challenge: the lack of intermediate rewards hinders standard MARL algorithms from effectively guiding policy learning. To address this issue, we propose a novel framework that integrates online inverse preference learning with multi-agent on-policy optimization into a unified architecture. At its core, our approach introduces an implicit multi-agent reward learning model, built upon a preference-based value-decomposition network, which produces both global and local reward signals. These signals are further used to construct dual advantage streams, enabling differentiated learning targets for the centralized critic and decentralized actors. In addition, we demonstrate how large language models (LLMs) can be leveraged to provide preference labels that enhance the quality of the learned reward model. Empirical evaluations on state-of-the-art benchmarks, including MAMuJoCo and SMACv2, show that our method achieves superior performance compared to existing baselines, highlighting its effectiveness in addressing sparse-reward challenges in online MARL.
Visual Multi-Agent System: Mitigating Hallucination Snowballing via Visual Flow
Multi-Agent System (MAS) powered by Visual Language Models (VLMs) enables challenging tasks but suffers from a novel failure term, multi-agent visual hallucination snowballing, where hallucinations are seeded in a single agent and amplified by following ones due to the over-reliance on textual flow to relay visual information. Through turn-, layer-, and token-wise attention analyses, we provide detailed insights into the essence of hallucination snowballing regarding the reduction of visual attention allocation. It leads us to identify a subset of vision tokens with a unimodal attention peak in middle layers that best preserve visual evidence but gradually diminish in deeper agent turns, resulting in the visual hallucination snowballing in MAS. Thus, we propose ViF, a lightweight, plug-and-play mitigation paradigm that relays inter-agent messages with Visual Flow powered by the selected visual relay tokens and applies attention reallocation to amplify this pattern. The experiment results demonstrate that our method markedly reduces hallucination snowballing, consistently improving the performance across eight benchmarks based on four common MAS structures and ten base models. The source code will be available at: https://github.com/YU-deep/ViF.git.
AMANDA: Agentic Medical Knowledge Augmentation for Data-Efficient Medical Visual Question Answering EMNLP
Medical Multimodal Large Language Models (Med-MLLMs) have shown great promise in medical visual question answering (Med-VQA). However, when deployed in low-resource settings where abundant labeled data are unavailable, existing Med-MLLMs commonly fail due to their medical reasoning capability bottlenecks: (i) the intrinsic reasoning bottleneck that ignores the details from the medical image; (ii) the extrinsic reasoning bottleneck that fails to incorporate specialized medical knowledge. To address those limitations, we propose AMANDA, a training-free agentic framework that performs medical knowledge augmentation via LLM agents. Specifically, our intrinsic medical knowledge augmentation focuses on coarse-to-fine question decomposition for comprehensive diagnosis, while extrinsic medical knowledge augmentation grounds the reasoning process via biomedical knowledge graph retrieval. Extensive experiments across eight Med-VQA benchmarks demonstrate substantial improvements in both zero-shot and few-shot Med-VQA settings. The code is available at https://github.com/REAL-Lab-NU/AMANDA.
comment: EMNLP Findings
Triple-BERT: Do We Really Need MARL for Order Dispatch on Ride-Sharing Platforms?
On-demand ride-sharing platforms, such as Uber and Lyft, face the intricate real-time challenge of bundling and matching passengers-each with distinct origins and destinations-to available vehicles, all while navigating significant system uncertainties. Due to the extensive observation space arising from the large number of drivers and orders, order dispatching, though fundamentally a centralized task, is often addressed using Multi-Agent Reinforcement Learning (MARL). However, independent MARL methods fail to capture global information and exhibit poor cooperation among workers, while Centralized Training Decentralized Execution (CTDE) MARL methods suffer from the curse of dimensionality. To overcome these challenges, we propose Triple-BERT, a centralized Single Agent Reinforcement Learning (MARL) method designed specifically for large-scale order dispatching on ride-sharing platforms. Built on a variant TD3, our approach addresses the vast action space through an action decomposition strategy that breaks down the joint action probability into individual driver action probabilities. To handle the extensive observation space, we introduce a novel BERT-based network, where parameter reuse mitigates parameter growth as the number of drivers and orders increases, and the attention mechanism effectively captures the complex relationships among the large pool of driver and orders. We validate our method using a real-world ride-hailing dataset from Manhattan. Triple-BERT achieves approximately an 11.95% improvement over current state-of-the-art methods, with a 4.26% increase in served orders and a 22.25% reduction in pickup times. Our code, trained model parameters, and processed data are publicly available at the repository https://github.com/RS2002/Triple-BERT .
Constructive Conflict-Driven Multi-Agent Reinforcement Learning for Strategic Diversity IJCAI 2025
In recent years, diversity has emerged as a useful mechanism to enhance the efficiency of multi-agent reinforcement learning (MARL). However, existing methods predominantly focus on designing policies based on individual agent characteristics, often neglecting the interplay and mutual influence among agents during policy formation. To address this gap, we propose Competitive Diversity through Constructive Conflict (CoDiCon), a novel approach that incorporates competitive incentives into cooperative scenarios to encourage policy exchange and foster strategic diversity among agents. Drawing inspiration from sociological research, which highlights the benefits of moderate competition and constructive conflict in group decision-making, we design an intrinsic reward mechanism using ranking features to introduce competitive motivations. A centralized intrinsic reward module generates and distributes varying reward values to agents, ensuring an effective balance between competition and cooperation. By optimizing the parameterized centralized reward module to maximize environmental rewards, we reformulate the constrained bilevel optimization problem to align with the original task objectives. We evaluate our algorithm against state-of-the-art methods in the SMAC and GRF environments. Experimental results demonstrate that CoDiCon achieves superior performance, with competitive intrinsic rewards effectively promoting diverse and adaptive strategies among cooperative agents.
comment: Accepted by IJCAI 2025
Position: Simulating Society Requires Simulating Thought NeurIPS 2025
Simulating society with large language models (LLMs), we argue, requires more than generating plausible behavior; it demands cognitively grounded reasoning that is structured, revisable, and traceable. LLM-based agents are increasingly used to emulate individual and group behavior, primarily through prompting and supervised fine-tuning. Yet they often lack internal coherence, causal reasoning, and belief traceability, making them unreliable for simulating how people reason, deliberate, and respond to interventions. To address this, we present a conceptual modeling paradigm, Generative Minds (GenMinds), which draws from cognitive science to support structured belief representations in generative agents. To evaluate such agents, we introduce the RECAP (REconstructing CAusal Paths) framework, a benchmark designed to assess reasoning fidelity via causal traceability, demographic grounding, and intervention consistency. These contributions advance a broader shift: from surface-level mimicry to generative agents that simulate thought -- not just language -- for social simulations.
comment: To appear in NeurIPS 2025 (Position Paper Track)
Reducing Leximin Fairness to Utilitarian Optimization AAAI-2025
Two prominent objectives in social choice are utilitarian - maximizing the sum of agents' utilities, and leximin - maximizing the smallest agent's utility, then the second-smallest, etc. Utilitarianism is typically computationally easier to attain but is generally viewed as less fair. This paper presents a general reduction scheme that, given a utilitarian solver, produces a distribution over states (deterministic outcomes) that is leximin in expectation. Importantly, the scheme is robust in the sense that, given an approximate utilitarian solver, it produces a lottery that is approximately-leximin (in expectation) - with the same approximation factor. We apply our scheme to several social choice problems: stochastic allocations of indivisible goods, giveaway lotteries, and fair lotteries for participatory budgeting.
comment: Accepted to AAAI-2025
Decentralized Aerial Manipulation of a Cable-Suspended Load using Multi-Agent Reinforcement Learning
This paper presents the first decentralized method to enable real-world 6-DoF manipulation of a cable-suspended load using a team of Micro-Aerial Vehicles (MAVs). Our method leverages multi-agent reinforcement learning (MARL) to train an outer-loop control policy for each MAV. Unlike state-of-the-art controllers that utilize a centralized scheme, our policy does not require global states, inter-MAV communications, nor neighboring MAV information. Instead, agents communicate implicitly through load pose observations alone, which enables high scalability and flexibility. It also significantly reduces computing costs during inference time, enabling onboard deployment of the policy. In addition, we introduce a new action space design for the MAVs using linear acceleration and body rates. This choice, combined with a robust low-level controller, enables reliable sim-to-real transfer despite significant uncertainties caused by cable tension during dynamic 3D motion. We validate our method in various real-world experiments, including full-pose control under load model uncertainties, showing setpoint tracking performance comparable to the state-of-the-art centralized method. We also demonstrate cooperation amongst agents with heterogeneous control policies, and robustness to the complete in-flight loss of one MAV. Videos of experiments: https://autonomousrobots.nl/paper_websites/aerial-manipulation-marl
Neural Orchestration for Multi-Agent Systems: A Deep Learning Framework for Optimal Agent Selection in Multi-Domain Task Environments
Multi-agent systems (MAS) are foundational in simulating complex real-world scenarios involving autonomous, interacting entities. However, traditional MAS architectures often suffer from rigid coordination mechanisms and difficulty adapting to dynamic tasks. We propose MetaOrch, a neural orchestration framework for optimal agent selection in multi-domain task environments. Our system implements a supervised learning approach that models task context, agent histories, and expected response quality to select the most appropriate agent for each task. A novel fuzzy evaluation module scores agent responses along completeness, relevance, and confidence dimensions, generating soft supervision labels for training the orchestrator. Unlike previous methods that hard-code agent-task mappings, MetaOrch dynamically predicts the most suitable agent while estimating selection confidence. Experiments in simulated environments with heterogeneous agents demonstrate that our approach achieves 86.3% selection accuracy, significantly outperforming baseline strategies including random selection and round-robin scheduling. The modular architecture emphasizes extensibility, allowing agents to be registered, updated, and queried independently. Results suggest that neural orchestration offers a powerful approach to enhancing the autonomy, interpretability, and adaptability of multi-agent systems across diverse task domains.
comment: Accepted for Publication at PReMI 2025 - 11th International Conference on Pattern Recognition and Machine Intelligence
Towards Agentic OS: An LLM Agent Framework for Linux Schedulers
Operating system schedulers suffer from a fundamental semantic gap, where kernel policies fail to understand application-specific needs, leading to suboptimal performance. We introduce SchedCP, the first framework that enables fully autonomous Large Language Model (LLM) agents to safely and efficiently optimize Linux schedulers without human involvement. Our core insight is that the challenge is not merely to apply a better LLM, but to architect a decoupled control plane that separates the AI's role of semantic reasoning ("what to optimize") from the system's role of execution ("how to observe and act"). Implemented as Model Context Protocol(MCP) server, SchedCP provides a stable interface with three key services: a Workload Analysis Engine, an evolving Scheduler Policy Repository, and an Execution Verifier that validates all AI-generated code and configure before deployment with static and dynamic analysis. We demonstrate this architecture's power with sched-agent, a multi-agent system that autonomously analyzes workloads, synthesizes custom eBPF scheduling policies, and deploys them via the sched\_ext infrastructure. Our evaluation shows that SchedCP achieves up to an 1.79x performance improvement, and a 13x cost reduction compared to naive agentic approaches, all while maintaining high success rate. By bridging the semantic gap, SchedCP democratizes expert-level system optimization and represents a step towards creating truly self-optimizing, application-aware operating systems. The code is open-sourced in https://github.com/eunomia-bpf/schedcp
From Grunts to Lexicons: Emergent Language from Cooperative Foraging
Language is a powerful communicative and cognitive tool. It enables humans to express thoughts, share intentions, and reason about complex phenomena. Despite our fluency in using and understanding language, the question of how it arises and evolves over time remains unsolved. A leading hypothesis in linguistics and anthropology posits that language evolved to meet the ecological and social demands of early human cooperation. Language did not arise in isolation, but through shared survival goals. Inspired by this view, we investigate the emergence of language in multi-agent Foraging Games. These environments are designed to reflect the cognitive and ecological constraints believed to have influenced the evolution of communication. Agents operate in a shared grid world with only partial knowledge about other agents and the environment, and must coordinate to complete games like picking up high-value targets or executing temporally ordered actions. Using end-to-end deep reinforcement learning, agents learn both actions and communication strategies from scratch. We find that agents develop communication protocols with hallmark features of natural language: arbitrariness, interchangeability, displacement, cultural transmission, and compositionality. We quantify each property and analyze how different factors, such as population size, social dynamics, and temporal dependencies, shape specific aspects of the emergent language. Our framework serves as a platform for studying how language can evolve from partial observability, temporal reasoning, and cooperative goals in embodied multi-agent settings. We will release all data, code, and models publicly.
Beyond Shallow Behavior: Task-Efficient Value-Based Multi-Task Offline MARL via Skill Discovery
As a data-driven approach, offline MARL learns superior policies solely from offline datasets, ideal for domains rich in historical data but with high interaction costs and risks. However, most existing methods are task-specific, requiring retraining for new tasks, leading to redundancy and inefficiency. To address this issue, we propose a task-efficient value-based multi-task offline MARL algorithm, Skill-Discovery Conservative Q-Learning (SD-CQL). Unlike existing methods decoding actions from skills via behavior cloning, SD-CQL discovers skills in a latent space by reconstructing the next observation, evaluates fixed and variable actions separately, and uses conservative Q-learning with local value calibration to select the optimal action for each skill. It eliminates the need for local-global alignment and enables strong multi-task generalization from limited, small-scale source tasks. Substantial experiments on StarCraft II demonstrate the superior generalization performance and task-efficiency of SD-CQL. It achieves the best performance on $\textbf{13}$ out of $14$ task sets, with up to $\textbf{68.9%}$ improvement on individual task sets.
Systems and Control (CS)
A Preliminary Assessment of Shipboard Power System Architectures for LVDC Integration
The adoption of low-voltage direct current sections within grid architectures is emerging as a promising design option in the naval sector. This paper presents a preliminary comparative assessment of three different grid topologies, using an existing MVAC-LVAC shipboard power system as a reference: a conventional MVAC-LVAC radial distribution with an additional LVDC section, a full LVDC radial distribution and a zonal LVDC distribution. Each architecture includes typical elements such as synchronous generators, propulsion motors, energy storage system units, extra propulsive loads, and pulse power loads. The analysis exploits five key performance indicators: weight, volume, technology readiness level, average system interruption duration index, and pulsed power loads interruption index.
comment: Presented at the 10th IEEE Workshop on the Electronic Grid (eGrid 2025)
Safe-by-Design: Approximate Nonlinear Model Predictive Control with Real Time Feasibility
This paper establishes relationships between continuous-time, receding horizon, nonlinear model predictive control (MPC) and control Lyapunov and control barrier functions (CLF/CBF). We show that, if the cost function "behaves well" for points in the terminal set, then the optimal value function and the feasible set, respectively, define a compatible CLF/CBF pair on the MPC's region of attraction. We then proceed to prove that any approximation of the value function and the feasible set also define a CLF/CBF pair, as long as those approximations satisfy the same "well behavedness" condition; and that a feasible state feedback can be computed by solving an infinitesimal version of the MPC problem. This methodology permits the formulation of continuous-time small-sized quadratic programs for feedback and enables approximate solutions of the nonlinear model predictive controller with theoretical safety and convergence guarantee. Finally, we demonstrate the effectiveness of the proposed approach when compared to other constrained control techniques through numerical experiments for nonlinear constrained spacecraft control.
comment: This work has been submitted to the IEEE Transactions on Automatic Control for possible publication
Effect of Gait Design on Proprioceptive Sensing of Terrain Properties in a Quadrupedal Robot ICRA
In-situ robotic exploration is an important tool for advancing knowledge of geological processes that describe the Earth and other Planetary bodies. To inform and enhance operations for these roving laboratories, it is imperative to understand the terramechanical properties of their environments, especially for traversing on loose, deformable substrates. Recent research suggested that legged robots with direct-drive and low-gear ratio actuators can sensitively detect external forces, and therefore possess the potential to measure terrain properties with their legs during locomotion, providing unprecedented sampling speed and density while accessing terrains previously too risky to sample. This paper explores these ideas by investigating the impact of gait on proprioceptive terrain sensing accuracy, particularly comparing a sensing-oriented gait, Crawl N' Sense, with a locomotion-oriented gait, Trot-Walk. Each gait's ability to measure the strength and texture of deformable substrate is quantified as the robot locomotes over a laboratory transect consisting of a rigid surface, loose sand, and loose sand with synthetic surface crusts. Our results suggest that with both the sensing-oriented crawling gait and locomotion-oriented trot gait, the robot can measure a consistent difference in the strength (in terms of penetration resistance) between the low- and high-resistance substrates; however, the locomotion-oriented trot gait contains larger magnitude and variance in measurements. Furthermore, the slower crawl gait can detect brittle ruptures of the surface crusts with significantly higher accuracy than the faster trot gait. Our results offer new insights that inform legged robot "sensing during locomotion" gait design and planning for scouting the terrain and producing scientific measurements on other worlds to advance our understanding of their geology and formation.
comment: 7+1 pages, 5 figures, ICRA Submission This work has been submitted to the IEEE for possible publication
A Parallel Ultra-Low Power Silent Speech Interface based on a Wearable, Fully-dry EMG Neckband
We present a wearable, fully-dry, and ultra-low power EMG system for silent speech recognition, integrated into a textile neckband to enable comfortable, non-intrusive use. The system features 14 fully-differential EMG channels and is based on the BioGAP-Ultra platform for ultra-low power (22 mW) biosignal acquisition and wireless transmission. We evaluate its performance on eight speech commands under both vocalized and silent articulation, achieving average classification accuracies of 87$\pm$3% and 68$\pm$3% respectively, with a 5-fold CV approach. To mimic everyday-life conditions, we introduce session-to-session variability by repositioning the neckband between sessions, achieving leave-one-session-out accuracies of 64$\pm$18% and 54$\pm$7% for the vocalized and silent experiments, respectively. These results highlight the robustness of the proposed approach and the promise of energy-efficient silent-speech decoding.
comment: 4 pages, 4 figures
Distributed Time-Varying Optimization via Unbiased Extremum Seeking
This paper proposes a novel distributed optimization framework that addresses time-varying optimization problems without requiring explicit derivative information of the objective functions. Traditional distributed methods often rely on derivative computations, limiting their applicability when only real-time objective function measurements are available. Leveraging unbiased extremum seeking, we develop continuous-time algorithms that utilize local measurements and neighbor-shared data to collaboratively track time-varying optima. Key advancements include compatibility with directed communication graphs, customizable convergence rates (asymptotic, exponential, or prescribed-time), and the ability to handle dynamically evolving objectives. By integrating chirpy probing signals with time-varying frequencies, our unified framework achieves accelerated convergence while maintaining stability under mild assumptions. Theoretical guarantees are established through Lie bracket averaging and Lyapunov-based analysis, with linear matrix inequality conditions ensuring rigorous convergence. Numerical simulations validate the effectiveness of the algorithms.
comment: 13pages, extended version
Optimized Control of Duplex Networks
Many real-world complex systems can be modeled as multiplex networks, where each layer represents a distinct set of interactions among the same entities. Controlling such systems-steering them toward desired states using external inputs-is crucial across many domains. However, existing network control theory largely focuses on single-layer networks, and applying separate controls to each layer of a multiplex system often leads to redundant sets of driver nodes, increasing cost and complexity. To address this challenge, we formulate the Universal Minimum Union Driver Set (MinUDS) problem for duplex networks. The goal is to find the smallest set of driver nodes that can simultaneously control both layers. We propose a novel algorithm, Shortest Cross-Layer Augmenting Path Search (CLAP-S). This method introduces the concept of a Cross-Layer Augmenting Path (CLAP) and efficiently explores the combinatorial space of control configurations. CLAP-S iteratively realigns each layer's Minimum Driver Set (MDS) to maximize their overlap. We prove the algorithm's global optimality and demonstrate its efficiency on both synthetic networks and real-world multiplex systems. The results show that CLAP-S consistently outperforms baseline approaches by significantly reducing the number of required driver nodes and cutting computational time by an order of magnitude. This work provides a powerful, general-purpose tool for optimizing control strategies in multi-layer networks, enabling more economical interventions in diverse fields.
Reinforcement Learning Based Traffic Signal Design to Minimize Queue Lengths
Efficient traffic signal control (TSC) is crucial for reducing congestion, travel delays, pollution, and for ensuring road safety. Traditional approaches, such as fixed signal control and actuated control, often struggle to handle dynamic traffic patterns. In this study, we propose a novel adaptive TSC framework that leverages Reinforcement Learning (RL), using the Proximal Policy Optimization (PPO) algorithm, to minimize total queue lengths across all signal phases. The challenge of efficiently representing highly stochastic traffic conditions for an RL controller is addressed through multiple state representations, including an expanded state space, an autoencoder representation, and a K-Planes-inspired representation. The proposed algorithm has been implemented using the Simulation of Urban Mobility (SUMO) traffic simulator and demonstrates superior performance over both traditional methods and other conventional RL-based approaches in reducing queue lengths. The best performing configuration achieves an approximately 29% reduction in average queue lengths compared to the traditional Webster method. Furthermore, comparative evaluation of alternative reward formulations demonstrates the effectiveness of the proposed queue-based approach, showcasing the potential for scalable and adaptive urban traffic management.
On Suboptimal Safety-Critical Tracking Controller Design
This paper proposes a novel framework for safety-critical optimal trajectory tracking in nonlinear systems based on the state-dependent Riccati equation (SDRE) methodology. By embedding barrier states into the system dynamics, the proposed strategy simultaneously ensures safety and tracking requirements, even in scenarios where these objectives may be inherently conflicting. A discounted pseudo-quadratic cost function is formulated to achieve a suboptimal trade-off between tracking accuracy, control effort, and safety objective. We present two distinct controller designs: one utilizing a single barrier state to enforce overall safety constraints, and another employing multiple barrier states to individually tuning the system's conservatism with respect to each safety constraint, providing enhanced flexibility in tuning the system's conservatism toward individual constraints. We establish sufficient conditions to ensure the solvability of the associated Riccati equations. The proposed safe controller is well-suited for real-time implementation in practical systems, given its reasonable computational requirements and compatibility with widely available embedded microprocessors. This is supported by simulation studies involving a mechanical system and a mobile robot collision avoidance scenario, where the safe SDRE controller consistently maintained safety while achieving trajectory tracking objectives in challenging conditions. Additionally, experimental results on a cable-driven parallel robot further demonstrate the practical applicability and effectiveness of the proposed method in real-world control tasks.
comment: 14 pages, 7 figures. This work has been submitted to Automatica for possible publication
Physically Plausible Multi-System Trajectory Generation and Symmetry Discovery
From metronomes to celestial bodies, mechanics underpins how the world evolves in time and space. With consideration of this, a number of recent neural network models leverage inductive biases from classical mechanics to encourage model interpretability and ensure forecasted states are physical. However, in general, these models are designed to capture the dynamics of a single system with fixed physical parameters, from state-space measurements of a known configuration space. In this paper we introduce Symplectic Phase Space GAN (SPS-GAN) which can capture the dynamics of multiple systems, and generalize to unseen physical parameters from. Moreover, SPS-GAN does not require prior knowledge of the system configuration space. In fact, SPS-GAN can discover the configuration space structure of the system from arbitrary measurement types (e.g., state-space measurements, video frames). To achieve physically plausible generation, we introduce a novel architecture which embeds a Hamiltonian neural network recurrent module in a conditional GAN backbone. To discover the structure of the configuration space, we optimize the conditional time-series GAN objective with an additional physically motivated term to encourages a sparse representation of the configuration space. We demonstrate the utility of SPS-GAN for trajectory prediction, video generation and symmetry discovery. Our approach captures multiple systems and achieves performance on par with supervised models designed for single systems.
Safe Task Space Synchronization with Time-Delayed Information
In this paper, an adaptive controller is designed for the synchronization of the trajectory of a robot with unknown kinematics and dynamics to that of the current human trajectory in the task space using the delayed human trajectory information. The communication time delay may be a result of various factors that arise in human-robot collaboration tasks, such as sensor processing or fusion to estimate trajectory/intent, network delays, or computational limitations. The developed adaptive controller uses Barrier Lyapunov Function (BLF) to constrain the Cartesian coordinates of the robot to ensure safety, an ICL-based adaptive law to account for the unknown kinematics, and a gradient-based adaptive law to estimate unknown dynamics. Barrier Lyapunov-Krasovskii (LK) functionals are used for the stability analysis to show that the synchronization and parameter estimation errors remain semi-globally uniformly ultimately bounded (SGUUB). The simulation results based on a human-robot synchronization scenario with time delay are provided to demonstrate the effectiveness of the designed synchronization controller with safety constraints.
Hierarchical Control Design for Space Robots with Application to In-Orbit Servicing Missions
In-Orbit Servicing and Active Debris Removal require advanced robotic capabilities for capturing and detumbling uncooperative targets. This work presents a hierarchical control framework for autonomous robotic capture of tumbling objects in space. A simulation environment is developed, incorporating sloshing dynamics of the chaser, a rarely studied effect in space robotics. The proposed controller combines an inner Lyapunov-based robust control loop for multi-body dynamics with an outer loop addressing an extended inverse kinematics problem. Simulation results show improved robustness and adaptability compared to existing control schemes.
Stochastic Security Constrained AC Optimal Power Flow Using General Polynomial Chaos Expansion
Addressing the uncertainty introduced by increasing renewable integration is crucial for secure power system operation, yet capturing it while preserving the full nonlinear physics of the grid remains a significant challenge. This paper presents a stochastic security constrained optimal power flow model with chance constraints supporting nonlinear AC power flow equations and non Gaussian uncertainties. We use general polynomial chaos expansion to model arbitrary uncertainties of finite variance, enabling accurate moment computations and robust prediction of system states across diverse operating scenarios. The chance constraints probabilistically limit inequality violations, providing a more flexible representation of controllable variables and the consequent power system operation. Case studies validate the proposed models effectiveness in satisfying operational constraints and capturing uncertainty with high fidelity. Compared to the deterministic formulation, it also uncovers a wider set of unsecure contingencies, highlighting improved uncertainty capture and operational insight.
comment: 2025 IEEE International Conference on Energy Technologies for Future Grids (6 pages)
Generative Modeling and Decision Fusion for Unknown Event Detection and Classification Using Synchrophasor Data
Reliable detection and classification of power system events are critical for maintaining grid stability and situational awareness. Existing approaches often depend on limited labeled datasets, which restricts their ability to generalize to rare or unseen disturbances. This paper proposes a novel framework that integrates generative modeling, sliding-window temporal processing, and decision fusion to achieve robust event detection and classification using synchrophasor data. A variational autoencoder-generative adversarial network is employed to model normal operating conditions, where both reconstruction error and discriminator error are extracted as anomaly indicators. Two complementary decision strategies are developed: a threshold-based rule for computational efficiency and a convex hull-based method for robustness under complex error distributions. These features are organized into spatiotemporal detection and classification matrices through a sliding-window mechanism, and an identification and decision fusion stage integrates the outputs across PMUs. This design enables the framework to identify known events while systematically classifying previously unseen disturbances into a new category, addressing a key limitation of supervised classifiers. Experimental results demonstrate state-of-the-art accuracy, surpassing machine learning, deep learning, and envelope-based baselines. The ability to recognize unknown events further highlights the adaptability and practical value of the proposed approach for wide-area event analysis in modern power systems.
comment: 10 pages
Red Teaming Quantum-Resistant Cryptographic Standards: A Penetration Testing Framework Integrating AI and Quantum Security
This study presents a structured approach to evaluating vulnerabilities within quantum cryptographic protocols, focusing on the BB84 quantum key distribution method and National Institute of Standards and Technology (NIST) approved quantum-resistant algorithms. By integrating AI-driven red teaming, automated penetration testing, and real-time anomaly detection, the research develops a framework for assessing and mitigating security risks in quantum networks. The findings demonstrate that AI can be effectively used to simulate adversarial attacks, probe weaknesses in cryptographic implementations, and refine security mechanisms through iterative feedback. The use of automated exploit simulations and protocol fuzzing provides a scalable means of identifying latent vulnerabilities, while adversarial machine learning techniques highlight novel attack surfaces within AI-enhanced cryptographic processes. This study offers a comprehensive methodology for strengthening quantum security and provides a foundation for integrating AI-driven cybersecurity practices into the evolving quantum landscape.
Finite Sample Analyses for Continuous-time Linear Systems: System Identification and Online Control
Real world evolves in continuous time but computations are done from finite samples. Therefore, we study algorithms using finite observations in continuous-time linear dynamical systems. We first study the system identification problem, and propose a first non-asymptotic error analysis with finite observations. Our algorithm identifies system parameters without needing integrated observations over certain time intervals, making it more practical for real-world applications. Further we propose a lower bound result that shows our estimator is provably optimal up to constant factors. Moreover, we apply the above algorithm to online control regret analysis for continuous-time linear system. Our system identification method allows us to explore more efficiently, enabling the swift detection of ineffective policies. We achieve a regret of $\mathcal{O}(\sqrt{T})$ over a single $T$-time horizon in a controllable system, requiring only $\mathcal{O}(T)$ observations of the system.
Neo-Grounded Theory: A Methodological Innovation Integrating High-Dimensional Vector Clustering and Multi-Agent Collaboration for Qualitative Research
Purpose: Neo Grounded Theory (NGT) integrates vector clustering with multi agent systems to resolve qualitative research's scale depth paradox, enabling analysis of massive datasets in hours while preserving interpretive rigor. Methods: We compared NGT against manual coding and ChatGPT-assisted analysis using 40,000 character Chinese interview transcripts. NGT employs 1536-dimensional embeddings, hierarchical clustering, and parallel agent-based coding. Two experiments tested pure automation versus human guided refinement. Findings: NGT achieved 168-fold speed improvement (3 hours vs 3 weeks), superior quality (0.904 vs 0.883), and 96% cost reduction. Human AI collaboration proved essential: automation alone produced abstract frameworks while human guidance yielded actionable dual pathway theories. The system discovered patterns invisible to manual coding, including identity bifurcation phenomena. Contributions: NGT demonstrates computational objectivity and human interpretation are complementary. Vector representations provide reproducible semantic measurement while preserving meaning's interpretive dimensions. Researchers shift from mechanical coding to theoretical guidance, with AI handling pattern recognition while humans provide creative insight. Implications: Cost reduction from \$50,000 to \$500 democratizes qualitative research, enabling communities to study themselves. Real-time analysis makes qualitative insights contemporaneous with events. The framework shows computational methods can strengthen rather than compromise qualitative research's humanistic commitments. Keywords: Grounded theory; Vector embeddings; Multi agent systems; Human AI collaboration; Computational qualitative analysis
comment: 44 pages, 11 figures
The need for and feasibility of alternative ground robots to traverse sandy and rocky extraterrestrial terrain
Robotic spacecraft have helped expand our reach for many planetary exploration missions. Most ground mobile planetary exploration robots use wheeled or modified wheeled platforms. Although extraordinarily successful at completing intended mission goals, because of the limitations of wheeled locomotion, they have been largely limited to benign, solid terrain and avoided extreme terrain with loose soil/sand and large rocks. Unfortunately, such challenging terrain is often scientifically interesting for planetary geology. Although many animals traverse such terrain at ease, robots have not matched their performance and robustness. This is in major part due to a lack of fundamental understanding of how effective locomotion can be generated from controlled interaction with complex terrain on the same level of flight aerodynamics and underwater vehicle hydrodynamics. Early fundamental understanding of legged and limbless locomotor-ground interaction has already enabled stable and efficient bio-inspired robot locomotion on relatively flat ground with small obstacles. Recent progress in the new field of terradynamics of locomotor-terrain interaction begins to reveal the principles of bio-inspired locomotion on loose soil/sand and over large obstacles. Multi-legged and limbless platforms using terradynamics insights hold the promise for serving as robust alternative platforms for traversing extreme extraterrestrial terrain and expanding our reach in planetary exploration.
Gaussian-Process-based Adaptive Tracking Control with Dynamic Active Learning for Autonomous Ground Vehicles
This article proposes an active-learning-based adaptive trajectory tracking control method for autonomous ground vehicles to compensate for modeling errors and unmodeled dynamics. The nominal vehicle model is decoupled into lateral and longitudinal subsystems, which are augmented with online Gaussian Processes (GPs), using measurement data. The estimated mean functions of the GPs are used to construct a feedback compensator, which, together with an LPV state feedback controller designed for the nominal system, gives the adaptive control structure. To assist exploration of the dynamics, the paper proposes a new, dynamic active learning method to collect the most informative samples to accelerate the training process. To analyze the performance of the overall learning tool-chain provided controller, a novel iterative, counterexample-based algorithm is proposed for calculating the induced L2 gain between the reference trajectory and the tracking error. The analysis can be executed for a set of possible realizations of the to-be-controlled system, giving robust performance certificate of the learning method under variation of the vehicle dynamics. The efficiency of the proposed control approach is shown on a high-fidelity physics simulator and in real experiments using a 1/10 scale F1TENTH electric car.
comment: Submitted to IEEE Transactions on Control Systems Technology (revised)
Divide, Conquer and Verify: Improving Symbolic Execution Performance
Symbolic Execution is a formal method that can be used to verify the behavior of computer programs and detect software vulnerabilities. Compared to other testing methods such as fuzzing, Symbolic Execution has the advantage of providing formal guarantees about the program. However, despite advances in performance in recent years, Symbolic Execution is too slow to be applied to real-world software. This is primarily caused by the \emph{path explosion problem} as well as by the computational complexity of SMT solving. In this paper, we present a divide-and-conquer approach for symbolic execution by executing individual slices and later combining the side effects. This way, the overall problem size is kept small, reducing the impact of computational complexity on large problems.
An approach to the LQG/LTR design problem with specifications for finite-dimensional SISO control systems
This is an expository paper which discusses an approach to the LQG/LTR design problem for finite-dimensional SISO control systems. The approach is based on the utilisation of weighting augmentation for incorporating design specifications into the framework of the LTR technique for LQG compensator design. The LQG compensator is to simultaneously meet given analytical low- and high-frequency design specifications expressed in terms of desirable sensitivity and controller noise sensitivity functions. The paper is aimed at nonspecialists and, in particular, practitioners in finite-dimensional LQG theory interested in the design of feedback compensators for closed-loop performance and robustness shaping of SISO control systems in realistic situations. The proposed approach is illustrated by a detailed numerical example: the torque control of a geared DC motor with an elastically mounted output shaft.
comment: typos corrected
Optimal Behavior Planning for Implicit Communication using a Probabilistic Vehicle-Pedestrian Interaction Model
In interactions between automated vehicles (AVs) and crossing pedestrians, modeling implicit vehicle communication is crucial. In this work, we present a combined prediction and planning approach that allows to consider the influence of the planned vehicle behavior on a pedestrian and predict a pedestrian's reaction. We plan the behavior by solving two consecutive optimal control problems (OCPs) analytically, using variational calculus. We perform a validation step that assesses whether the planned vehicle behavior is adequate to trigger a certain pedestrian reaction, which accounts for the closed-loop characteristics of prediction and planning influencing each other. In this step, we model the influence of the planned vehicle behavior on the pedestrian using a probabilistic behavior acceptance model that returns an estimate for the crossing probability. The probabilistic modeling of the pedestrian reaction facilitates considering the pedestrian's costs, thereby improving cooperative behavior planning. We demonstrate the performance of the proposed approach in simulated vehicle-pedestrian interactions with varying initial settings and highlight the decision making capabilities of the planning approach.
comment: 8 pages, 5 figures, conference article
Constraint Hypergraphs as a Unifying Framework for Digital Twins
Digital twins, used to represent physical systems, have been lauded as tools for understanding reality. Complex system behavior is typically captured in domain-specific models crafted by subject experts. Contemporary methods for employing models in a digital twin require prescriptive interfaces, resulting in twins that are difficult to connect, redeploy, and modify. The limited interoperability of these twins has prompted calls for a universal framework enabling observability across model aggregations. Here we show how a new mathematical formalism called a constraint hypergraph addresses these challenges in interoperability. A digital twin is shown to be the second of two coupled systems where both adhere to the same constraint hypergraph, permitting the properties of the first to be observable from the second. Interoperability is given by deconstructing models into a structure enabling autonomous, white-box simulation of system properties. The resulting digital twins can interact immediately with both human and autonomous agents. This is demonstrated in a case study of a microgrid, showing how both measured and simulated data from the aggregated twins can be provided regardless of the operating environment. By connecting models, constraint hypergraphs supply scientists and modelers robust means to capture, communicate, and combine digital twins across all fields of study. We expect this framework to expand the use of digital twins, enriching scientific insights and collaborations by providing a structure for characterizing complex systems.
comment: 12 pages, 7 figures
Vibrational Stabilization of Complex Network Systems
Many natural and man-made network systems need to maintain certain patterns, such as working at equilibria or limit cycles, to function properly. Thus, the ability to stabilize such patterns is crucial. Most of the existing studies on stabilization assume that network systems states can be measured online so that feedback control strategies can be used. However, in many real-world scenarios, systems states, e.g., neuronal activity in the brain, are often difficult to measure. In this paper, we take this situation into account and study the stabilization problem of linear network systems with an open-loop control strategy (vibrational control). We derive a graph-theoretic sufficient condition for structural vibrational stabilizability, under which network systems can always be stabilized. We further provide an approach to select the locations in the network for control placement and design corresponding vibrational inputs to stabilize systems that satisfy this condition. Finally, we provide some numerical results that demonstrate the validity of our theoretical findings.
Systems and Control (EESS)
A Preliminary Assessment of Shipboard Power System Architectures for LVDC Integration
The adoption of low-voltage direct current sections within grid architectures is emerging as a promising design option in the naval sector. This paper presents a preliminary comparative assessment of three different grid topologies, using an existing MVAC-LVAC shipboard power system as a reference: a conventional MVAC-LVAC radial distribution with an additional LVDC section, a full LVDC radial distribution and a zonal LVDC distribution. Each architecture includes typical elements such as synchronous generators, propulsion motors, energy storage system units, extra propulsive loads, and pulse power loads. The analysis exploits five key performance indicators: weight, volume, technology readiness level, average system interruption duration index, and pulsed power loads interruption index.
comment: Presented at the 10th IEEE Workshop on the Electronic Grid (eGrid 2025)
Safe-by-Design: Approximate Nonlinear Model Predictive Control with Real Time Feasibility
This paper establishes relationships between continuous-time, receding horizon, nonlinear model predictive control (MPC) and control Lyapunov and control barrier functions (CLF/CBF). We show that, if the cost function "behaves well" for points in the terminal set, then the optimal value function and the feasible set, respectively, define a compatible CLF/CBF pair on the MPC's region of attraction. We then proceed to prove that any approximation of the value function and the feasible set also define a CLF/CBF pair, as long as those approximations satisfy the same "well behavedness" condition; and that a feasible state feedback can be computed by solving an infinitesimal version of the MPC problem. This methodology permits the formulation of continuous-time small-sized quadratic programs for feedback and enables approximate solutions of the nonlinear model predictive controller with theoretical safety and convergence guarantee. Finally, we demonstrate the effectiveness of the proposed approach when compared to other constrained control techniques through numerical experiments for nonlinear constrained spacecraft control.
comment: This work has been submitted to the IEEE Transactions on Automatic Control for possible publication
Effect of Gait Design on Proprioceptive Sensing of Terrain Properties in a Quadrupedal Robot ICRA
In-situ robotic exploration is an important tool for advancing knowledge of geological processes that describe the Earth and other Planetary bodies. To inform and enhance operations for these roving laboratories, it is imperative to understand the terramechanical properties of their environments, especially for traversing on loose, deformable substrates. Recent research suggested that legged robots with direct-drive and low-gear ratio actuators can sensitively detect external forces, and therefore possess the potential to measure terrain properties with their legs during locomotion, providing unprecedented sampling speed and density while accessing terrains previously too risky to sample. This paper explores these ideas by investigating the impact of gait on proprioceptive terrain sensing accuracy, particularly comparing a sensing-oriented gait, Crawl N' Sense, with a locomotion-oriented gait, Trot-Walk. Each gait's ability to measure the strength and texture of deformable substrate is quantified as the robot locomotes over a laboratory transect consisting of a rigid surface, loose sand, and loose sand with synthetic surface crusts. Our results suggest that with both the sensing-oriented crawling gait and locomotion-oriented trot gait, the robot can measure a consistent difference in the strength (in terms of penetration resistance) between the low- and high-resistance substrates; however, the locomotion-oriented trot gait contains larger magnitude and variance in measurements. Furthermore, the slower crawl gait can detect brittle ruptures of the surface crusts with significantly higher accuracy than the faster trot gait. Our results offer new insights that inform legged robot "sensing during locomotion" gait design and planning for scouting the terrain and producing scientific measurements on other worlds to advance our understanding of their geology and formation.
comment: 7+1 pages, 5 figures, ICRA Submission This work has been submitted to the IEEE for possible publication
A Parallel Ultra-Low Power Silent Speech Interface based on a Wearable, Fully-dry EMG Neckband
We present a wearable, fully-dry, and ultra-low power EMG system for silent speech recognition, integrated into a textile neckband to enable comfortable, non-intrusive use. The system features 14 fully-differential EMG channels and is based on the BioGAP-Ultra platform for ultra-low power (22 mW) biosignal acquisition and wireless transmission. We evaluate its performance on eight speech commands under both vocalized and silent articulation, achieving average classification accuracies of 87$\pm$3% and 68$\pm$3% respectively, with a 5-fold CV approach. To mimic everyday-life conditions, we introduce session-to-session variability by repositioning the neckband between sessions, achieving leave-one-session-out accuracies of 64$\pm$18% and 54$\pm$7% for the vocalized and silent experiments, respectively. These results highlight the robustness of the proposed approach and the promise of energy-efficient silent-speech decoding.
comment: 4 pages, 4 figures
Distributed Time-Varying Optimization via Unbiased Extremum Seeking
This paper proposes a novel distributed optimization framework that addresses time-varying optimization problems without requiring explicit derivative information of the objective functions. Traditional distributed methods often rely on derivative computations, limiting their applicability when only real-time objective function measurements are available. Leveraging unbiased extremum seeking, we develop continuous-time algorithms that utilize local measurements and neighbor-shared data to collaboratively track time-varying optima. Key advancements include compatibility with directed communication graphs, customizable convergence rates (asymptotic, exponential, or prescribed-time), and the ability to handle dynamically evolving objectives. By integrating chirpy probing signals with time-varying frequencies, our unified framework achieves accelerated convergence while maintaining stability under mild assumptions. Theoretical guarantees are established through Lie bracket averaging and Lyapunov-based analysis, with linear matrix inequality conditions ensuring rigorous convergence. Numerical simulations validate the effectiveness of the algorithms.
comment: 13pages, extended version
Optimized Control of Duplex Networks
Many real-world complex systems can be modeled as multiplex networks, where each layer represents a distinct set of interactions among the same entities. Controlling such systems-steering them toward desired states using external inputs-is crucial across many domains. However, existing network control theory largely focuses on single-layer networks, and applying separate controls to each layer of a multiplex system often leads to redundant sets of driver nodes, increasing cost and complexity. To address this challenge, we formulate the Universal Minimum Union Driver Set (MinUDS) problem for duplex networks. The goal is to find the smallest set of driver nodes that can simultaneously control both layers. We propose a novel algorithm, Shortest Cross-Layer Augmenting Path Search (CLAP-S). This method introduces the concept of a Cross-Layer Augmenting Path (CLAP) and efficiently explores the combinatorial space of control configurations. CLAP-S iteratively realigns each layer's Minimum Driver Set (MDS) to maximize their overlap. We prove the algorithm's global optimality and demonstrate its efficiency on both synthetic networks and real-world multiplex systems. The results show that CLAP-S consistently outperforms baseline approaches by significantly reducing the number of required driver nodes and cutting computational time by an order of magnitude. This work provides a powerful, general-purpose tool for optimizing control strategies in multi-layer networks, enabling more economical interventions in diverse fields.
Reinforcement Learning Based Traffic Signal Design to Minimize Queue Lengths
Efficient traffic signal control (TSC) is crucial for reducing congestion, travel delays, pollution, and for ensuring road safety. Traditional approaches, such as fixed signal control and actuated control, often struggle to handle dynamic traffic patterns. In this study, we propose a novel adaptive TSC framework that leverages Reinforcement Learning (RL), using the Proximal Policy Optimization (PPO) algorithm, to minimize total queue lengths across all signal phases. The challenge of efficiently representing highly stochastic traffic conditions for an RL controller is addressed through multiple state representations, including an expanded state space, an autoencoder representation, and a K-Planes-inspired representation. The proposed algorithm has been implemented using the Simulation of Urban Mobility (SUMO) traffic simulator and demonstrates superior performance over both traditional methods and other conventional RL-based approaches in reducing queue lengths. The best performing configuration achieves an approximately 29% reduction in average queue lengths compared to the traditional Webster method. Furthermore, comparative evaluation of alternative reward formulations demonstrates the effectiveness of the proposed queue-based approach, showcasing the potential for scalable and adaptive urban traffic management.
On Suboptimal Safety-Critical Tracking Controller Design
This paper proposes a novel framework for safety-critical optimal trajectory tracking in nonlinear systems based on the state-dependent Riccati equation (SDRE) methodology. By embedding barrier states into the system dynamics, the proposed strategy simultaneously ensures safety and tracking requirements, even in scenarios where these objectives may be inherently conflicting. A discounted pseudo-quadratic cost function is formulated to achieve a suboptimal trade-off between tracking accuracy, control effort, and safety objective. We present two distinct controller designs: one utilizing a single barrier state to enforce overall safety constraints, and another employing multiple barrier states to individually tuning the system's conservatism with respect to each safety constraint, providing enhanced flexibility in tuning the system's conservatism toward individual constraints. We establish sufficient conditions to ensure the solvability of the associated Riccati equations. The proposed safe controller is well-suited for real-time implementation in practical systems, given its reasonable computational requirements and compatibility with widely available embedded microprocessors. This is supported by simulation studies involving a mechanical system and a mobile robot collision avoidance scenario, where the safe SDRE controller consistently maintained safety while achieving trajectory tracking objectives in challenging conditions. Additionally, experimental results on a cable-driven parallel robot further demonstrate the practical applicability and effectiveness of the proposed method in real-world control tasks.
comment: 14 pages, 7 figures. This work has been submitted to Automatica for possible publication
Physically Plausible Multi-System Trajectory Generation and Symmetry Discovery
From metronomes to celestial bodies, mechanics underpins how the world evolves in time and space. With consideration of this, a number of recent neural network models leverage inductive biases from classical mechanics to encourage model interpretability and ensure forecasted states are physical. However, in general, these models are designed to capture the dynamics of a single system with fixed physical parameters, from state-space measurements of a known configuration space. In this paper we introduce Symplectic Phase Space GAN (SPS-GAN) which can capture the dynamics of multiple systems, and generalize to unseen physical parameters from. Moreover, SPS-GAN does not require prior knowledge of the system configuration space. In fact, SPS-GAN can discover the configuration space structure of the system from arbitrary measurement types (e.g., state-space measurements, video frames). To achieve physically plausible generation, we introduce a novel architecture which embeds a Hamiltonian neural network recurrent module in a conditional GAN backbone. To discover the structure of the configuration space, we optimize the conditional time-series GAN objective with an additional physically motivated term to encourages a sparse representation of the configuration space. We demonstrate the utility of SPS-GAN for trajectory prediction, video generation and symmetry discovery. Our approach captures multiple systems and achieves performance on par with supervised models designed for single systems.
Safe Task Space Synchronization with Time-Delayed Information
In this paper, an adaptive controller is designed for the synchronization of the trajectory of a robot with unknown kinematics and dynamics to that of the current human trajectory in the task space using the delayed human trajectory information. The communication time delay may be a result of various factors that arise in human-robot collaboration tasks, such as sensor processing or fusion to estimate trajectory/intent, network delays, or computational limitations. The developed adaptive controller uses Barrier Lyapunov Function (BLF) to constrain the Cartesian coordinates of the robot to ensure safety, an ICL-based adaptive law to account for the unknown kinematics, and a gradient-based adaptive law to estimate unknown dynamics. Barrier Lyapunov-Krasovskii (LK) functionals are used for the stability analysis to show that the synchronization and parameter estimation errors remain semi-globally uniformly ultimately bounded (SGUUB). The simulation results based on a human-robot synchronization scenario with time delay are provided to demonstrate the effectiveness of the designed synchronization controller with safety constraints.
Hierarchical Control Design for Space Robots with Application to In-Orbit Servicing Missions
In-Orbit Servicing and Active Debris Removal require advanced robotic capabilities for capturing and detumbling uncooperative targets. This work presents a hierarchical control framework for autonomous robotic capture of tumbling objects in space. A simulation environment is developed, incorporating sloshing dynamics of the chaser, a rarely studied effect in space robotics. The proposed controller combines an inner Lyapunov-based robust control loop for multi-body dynamics with an outer loop addressing an extended inverse kinematics problem. Simulation results show improved robustness and adaptability compared to existing control schemes.
Stochastic Security Constrained AC Optimal Power Flow Using General Polynomial Chaos Expansion
Addressing the uncertainty introduced by increasing renewable integration is crucial for secure power system operation, yet capturing it while preserving the full nonlinear physics of the grid remains a significant challenge. This paper presents a stochastic security constrained optimal power flow model with chance constraints supporting nonlinear AC power flow equations and non Gaussian uncertainties. We use general polynomial chaos expansion to model arbitrary uncertainties of finite variance, enabling accurate moment computations and robust prediction of system states across diverse operating scenarios. The chance constraints probabilistically limit inequality violations, providing a more flexible representation of controllable variables and the consequent power system operation. Case studies validate the proposed models effectiveness in satisfying operational constraints and capturing uncertainty with high fidelity. Compared to the deterministic formulation, it also uncovers a wider set of unsecure contingencies, highlighting improved uncertainty capture and operational insight.
comment: 2025 IEEE International Conference on Energy Technologies for Future Grids (6 pages)
Generative Modeling and Decision Fusion for Unknown Event Detection and Classification Using Synchrophasor Data
Reliable detection and classification of power system events are critical for maintaining grid stability and situational awareness. Existing approaches often depend on limited labeled datasets, which restricts their ability to generalize to rare or unseen disturbances. This paper proposes a novel framework that integrates generative modeling, sliding-window temporal processing, and decision fusion to achieve robust event detection and classification using synchrophasor data. A variational autoencoder-generative adversarial network is employed to model normal operating conditions, where both reconstruction error and discriminator error are extracted as anomaly indicators. Two complementary decision strategies are developed: a threshold-based rule for computational efficiency and a convex hull-based method for robustness under complex error distributions. These features are organized into spatiotemporal detection and classification matrices through a sliding-window mechanism, and an identification and decision fusion stage integrates the outputs across PMUs. This design enables the framework to identify known events while systematically classifying previously unseen disturbances into a new category, addressing a key limitation of supervised classifiers. Experimental results demonstrate state-of-the-art accuracy, surpassing machine learning, deep learning, and envelope-based baselines. The ability to recognize unknown events further highlights the adaptability and practical value of the proposed approach for wide-area event analysis in modern power systems.
comment: 10 pages
Red Teaming Quantum-Resistant Cryptographic Standards: A Penetration Testing Framework Integrating AI and Quantum Security
This study presents a structured approach to evaluating vulnerabilities within quantum cryptographic protocols, focusing on the BB84 quantum key distribution method and National Institute of Standards and Technology (NIST) approved quantum-resistant algorithms. By integrating AI-driven red teaming, automated penetration testing, and real-time anomaly detection, the research develops a framework for assessing and mitigating security risks in quantum networks. The findings demonstrate that AI can be effectively used to simulate adversarial attacks, probe weaknesses in cryptographic implementations, and refine security mechanisms through iterative feedback. The use of automated exploit simulations and protocol fuzzing provides a scalable means of identifying latent vulnerabilities, while adversarial machine learning techniques highlight novel attack surfaces within AI-enhanced cryptographic processes. This study offers a comprehensive methodology for strengthening quantum security and provides a foundation for integrating AI-driven cybersecurity practices into the evolving quantum landscape.
Finite Sample Analyses for Continuous-time Linear Systems: System Identification and Online Control
Real world evolves in continuous time but computations are done from finite samples. Therefore, we study algorithms using finite observations in continuous-time linear dynamical systems. We first study the system identification problem, and propose a first non-asymptotic error analysis with finite observations. Our algorithm identifies system parameters without needing integrated observations over certain time intervals, making it more practical for real-world applications. Further we propose a lower bound result that shows our estimator is provably optimal up to constant factors. Moreover, we apply the above algorithm to online control regret analysis for continuous-time linear system. Our system identification method allows us to explore more efficiently, enabling the swift detection of ineffective policies. We achieve a regret of $\mathcal{O}(\sqrt{T})$ over a single $T$-time horizon in a controllable system, requiring only $\mathcal{O}(T)$ observations of the system.
Neo-Grounded Theory: A Methodological Innovation Integrating High-Dimensional Vector Clustering and Multi-Agent Collaboration for Qualitative Research
Purpose: Neo Grounded Theory (NGT) integrates vector clustering with multi agent systems to resolve qualitative research's scale depth paradox, enabling analysis of massive datasets in hours while preserving interpretive rigor. Methods: We compared NGT against manual coding and ChatGPT-assisted analysis using 40,000 character Chinese interview transcripts. NGT employs 1536-dimensional embeddings, hierarchical clustering, and parallel agent-based coding. Two experiments tested pure automation versus human guided refinement. Findings: NGT achieved 168-fold speed improvement (3 hours vs 3 weeks), superior quality (0.904 vs 0.883), and 96% cost reduction. Human AI collaboration proved essential: automation alone produced abstract frameworks while human guidance yielded actionable dual pathway theories. The system discovered patterns invisible to manual coding, including identity bifurcation phenomena. Contributions: NGT demonstrates computational objectivity and human interpretation are complementary. Vector representations provide reproducible semantic measurement while preserving meaning's interpretive dimensions. Researchers shift from mechanical coding to theoretical guidance, with AI handling pattern recognition while humans provide creative insight. Implications: Cost reduction from \$50,000 to \$500 democratizes qualitative research, enabling communities to study themselves. Real-time analysis makes qualitative insights contemporaneous with events. The framework shows computational methods can strengthen rather than compromise qualitative research's humanistic commitments. Keywords: Grounded theory; Vector embeddings; Multi agent systems; Human AI collaboration; Computational qualitative analysis
comment: 44 pages, 11 figures
The need for and feasibility of alternative ground robots to traverse sandy and rocky extraterrestrial terrain
Robotic spacecraft have helped expand our reach for many planetary exploration missions. Most ground mobile planetary exploration robots use wheeled or modified wheeled platforms. Although extraordinarily successful at completing intended mission goals, because of the limitations of wheeled locomotion, they have been largely limited to benign, solid terrain and avoided extreme terrain with loose soil/sand and large rocks. Unfortunately, such challenging terrain is often scientifically interesting for planetary geology. Although many animals traverse such terrain at ease, robots have not matched their performance and robustness. This is in major part due to a lack of fundamental understanding of how effective locomotion can be generated from controlled interaction with complex terrain on the same level of flight aerodynamics and underwater vehicle hydrodynamics. Early fundamental understanding of legged and limbless locomotor-ground interaction has already enabled stable and efficient bio-inspired robot locomotion on relatively flat ground with small obstacles. Recent progress in the new field of terradynamics of locomotor-terrain interaction begins to reveal the principles of bio-inspired locomotion on loose soil/sand and over large obstacles. Multi-legged and limbless platforms using terradynamics insights hold the promise for serving as robust alternative platforms for traversing extreme extraterrestrial terrain and expanding our reach in planetary exploration.
Gaussian-Process-based Adaptive Tracking Control with Dynamic Active Learning for Autonomous Ground Vehicles
This article proposes an active-learning-based adaptive trajectory tracking control method for autonomous ground vehicles to compensate for modeling errors and unmodeled dynamics. The nominal vehicle model is decoupled into lateral and longitudinal subsystems, which are augmented with online Gaussian Processes (GPs), using measurement data. The estimated mean functions of the GPs are used to construct a feedback compensator, which, together with an LPV state feedback controller designed for the nominal system, gives the adaptive control structure. To assist exploration of the dynamics, the paper proposes a new, dynamic active learning method to collect the most informative samples to accelerate the training process. To analyze the performance of the overall learning tool-chain provided controller, a novel iterative, counterexample-based algorithm is proposed for calculating the induced L2 gain between the reference trajectory and the tracking error. The analysis can be executed for a set of possible realizations of the to-be-controlled system, giving robust performance certificate of the learning method under variation of the vehicle dynamics. The efficiency of the proposed control approach is shown on a high-fidelity physics simulator and in real experiments using a 1/10 scale F1TENTH electric car.
comment: Submitted to IEEE Transactions on Control Systems Technology (revised)
Divide, Conquer and Verify: Improving Symbolic Execution Performance
Symbolic Execution is a formal method that can be used to verify the behavior of computer programs and detect software vulnerabilities. Compared to other testing methods such as fuzzing, Symbolic Execution has the advantage of providing formal guarantees about the program. However, despite advances in performance in recent years, Symbolic Execution is too slow to be applied to real-world software. This is primarily caused by the \emph{path explosion problem} as well as by the computational complexity of SMT solving. In this paper, we present a divide-and-conquer approach for symbolic execution by executing individual slices and later combining the side effects. This way, the overall problem size is kept small, reducing the impact of computational complexity on large problems.
An approach to the LQG/LTR design problem with specifications for finite-dimensional SISO control systems
This is an expository paper which discusses an approach to the LQG/LTR design problem for finite-dimensional SISO control systems. The approach is based on the utilisation of weighting augmentation for incorporating design specifications into the framework of the LTR technique for LQG compensator design. The LQG compensator is to simultaneously meet given analytical low- and high-frequency design specifications expressed in terms of desirable sensitivity and controller noise sensitivity functions. The paper is aimed at nonspecialists and, in particular, practitioners in finite-dimensional LQG theory interested in the design of feedback compensators for closed-loop performance and robustness shaping of SISO control systems in realistic situations. The proposed approach is illustrated by a detailed numerical example: the torque control of a geared DC motor with an elastically mounted output shaft.
comment: typos corrected
Optimal Behavior Planning for Implicit Communication using a Probabilistic Vehicle-Pedestrian Interaction Model
In interactions between automated vehicles (AVs) and crossing pedestrians, modeling implicit vehicle communication is crucial. In this work, we present a combined prediction and planning approach that allows to consider the influence of the planned vehicle behavior on a pedestrian and predict a pedestrian's reaction. We plan the behavior by solving two consecutive optimal control problems (OCPs) analytically, using variational calculus. We perform a validation step that assesses whether the planned vehicle behavior is adequate to trigger a certain pedestrian reaction, which accounts for the closed-loop characteristics of prediction and planning influencing each other. In this step, we model the influence of the planned vehicle behavior on the pedestrian using a probabilistic behavior acceptance model that returns an estimate for the crossing probability. The probabilistic modeling of the pedestrian reaction facilitates considering the pedestrian's costs, thereby improving cooperative behavior planning. We demonstrate the performance of the proposed approach in simulated vehicle-pedestrian interactions with varying initial settings and highlight the decision making capabilities of the planning approach.
comment: 8 pages, 5 figures, conference article
Constraint Hypergraphs as a Unifying Framework for Digital Twins
Digital twins, used to represent physical systems, have been lauded as tools for understanding reality. Complex system behavior is typically captured in domain-specific models crafted by subject experts. Contemporary methods for employing models in a digital twin require prescriptive interfaces, resulting in twins that are difficult to connect, redeploy, and modify. The limited interoperability of these twins has prompted calls for a universal framework enabling observability across model aggregations. Here we show how a new mathematical formalism called a constraint hypergraph addresses these challenges in interoperability. A digital twin is shown to be the second of two coupled systems where both adhere to the same constraint hypergraph, permitting the properties of the first to be observable from the second. Interoperability is given by deconstructing models into a structure enabling autonomous, white-box simulation of system properties. The resulting digital twins can interact immediately with both human and autonomous agents. This is demonstrated in a case study of a microgrid, showing how both measured and simulated data from the aggregated twins can be provided regardless of the operating environment. By connecting models, constraint hypergraphs supply scientists and modelers robust means to capture, communicate, and combine digital twins across all fields of study. We expect this framework to expand the use of digital twins, enriching scientific insights and collaborations by providing a structure for characterizing complex systems.
comment: 12 pages, 7 figures
Vibrational Stabilization of Complex Network Systems
Many natural and man-made network systems need to maintain certain patterns, such as working at equilibria or limit cycles, to function properly. Thus, the ability to stabilize such patterns is crucial. Most of the existing studies on stabilization assume that network systems states can be measured online so that feedback control strategies can be used. However, in many real-world scenarios, systems states, e.g., neuronal activity in the brain, are often difficult to measure. In this paper, we take this situation into account and study the stabilization problem of linear network systems with an open-loop control strategy (vibrational control). We derive a graph-theoretic sufficient condition for structural vibrational stabilizability, under which network systems can always be stabilized. We further provide an approach to select the locations in the network for control placement and design corresponding vibrational inputs to stabilize systems that satisfy this condition. Finally, we provide some numerical results that demonstrate the validity of our theoretical findings.
Robotics
Taxonomy-aware Dynamic Motion Generation on Hyperbolic Manifolds
Human-like motion generation for robots often draws inspiration from biomechanical studies, which often categorize complex human motions into hierarchical taxonomies. While these taxonomies provide rich structural information about how movements relate to one another, this information is frequently overlooked in motion generation models, leading to a disconnect between the generated motions and their underlying hierarchical structure. This paper introduces the \ac{gphdm}, a novel approach that learns latent representations preserving both the hierarchical structure of motions and their temporal dynamics to ensure physical consistency. Our model achieves this by extending the dynamics prior of the Gaussian Process Dynamical Model (GPDM) to the hyperbolic manifold and integrating it with taxonomy-aware inductive biases. Building on this geometry- and taxonomy-aware frameworks, we propose three novel mechanisms for generating motions that are both taxonomically-structured and physically-consistent: two probabilistic recursive approaches and a method based on pullback-metric geodesics. Experiments on generating realistic motion sequences on the hand grasping taxonomy show that the proposed GPHDM faithfully encodes the underlying taxonomy and temporal dynamics, and generates novel physically-consistent trajectories.
comment: 8 pages, 6 figures, 1 table
\LARGE GMP$^{3}$: Learning-Driven, Bellman-Guided Trajectory Planning for UAVs in Real-Time on SE(3)
We propose $\text{GMP}^{3}$, a multiphase global path planning framework that generates dynamically feasible three-dimensional trajectories for unmanned aerial vehicles (UAVs) operating in cluttered environments. The framework extends traditional path planning from Euclidean position spaces to the Lie group $\mathrm{SE}(3)$, allowing joint learning of translational motion and rotational dynamics. A modified Bellman-based operator is introduced to support reinforcement learning (RL) policy updates while leveraging prior trajectory information for improved convergence. $\text{GMP}^{3}$ is designed as a distributed framework in which agents influence each other and share policy information along the trajectory: each agent refines its assigned segment and shares with its neighbors via a consensus-based scheme, enabling cooperative policy updates and convergence toward a path shaped globally even under kinematic constraints. We also propose DroneManager, a modular ground control software that interfaces the planner with real UAV platforms via the MAVLink protocol, supporting real-time deployment and feedback. Simulation studies and indoor flight experiments validate the effectiveness of the proposed method in constrained 3D environments, demonstrating reliable obstacle avoidance and smooth, feasible trajectories across both position and orientation. The open-source implementation is available at https://github.com/Domattee/DroneManager
BiNoMaP: Learning Category-Level Bimanual Non-Prehensile Manipulation Primitives
Non-prehensile manipulation, encompassing ungraspable actions such as pushing, poking, and pivoting, represents a critical yet underexplored domain in robotics due to its contact-rich and analytically intractable nature. In this work, we revisit this problem from two novel perspectives. First, we move beyond the usual single-arm setup and the strong assumption of favorable external dexterity such as walls, ramps, or edges. Instead, we advocate a generalizable dual-arm configuration and establish a suite of Bimanual Non-prehensile Manipulation Primitives (BiNoMaP). Second, we depart from the prevailing RL-based paradigm and propose a three-stage, RL-free framework to learn non-prehensile skills. Specifically, we begin by extracting bimanual hand motion trajectories from video demonstrations. Due to visual inaccuracies and morphological gaps, these coarse trajectories are difficult to transfer directly to robotic end-effectors. To address this, we propose a geometry-aware post-optimization algorithm that refines raw motions into executable manipulation primitives that conform to specific motion patterns. Beyond instance-level reproduction, we further enable category-level generalization by parameterizing the learned primitives with object-relevant geometric attributes, particularly size, resulting in adaptable and general parameterized manipulation primitives. We validate BiNoMaP across a range of representative bimanual tasks and diverse object categories, demonstrating its effectiveness, efficiency, versatility, and superior generalization capability.
comment: under review
RetoVLA: Reusing Register Tokens for Spatial Reasoning in Vision-Language-Action Models
Recent Vision-Language-Action (VLA) models demonstrate remarkable generalization in robotics but are restricted by their substantial size and computational cost, limiting real-world deployment. However, conventional lightweighting methods often sacrifice critical capabilities, particularly spatial reasoning. This creates a trade-off between efficiency and performance. To address this challenge, our work reuses Register Tokens, which were introduced for artifact removal in Vision Transformers but subsequently discarded. We suppose that these tokens contain essential spatial information and propose RetoVLA, a novel architecture that reuses them directly by injecting them into the Action Expert. RetoVLA maintains a lightweight structure while leveraging this repurposed spatial context to enhance reasoning. We demonstrate RetoVLA's effectiveness through a series of comprehensive experiments. On our custom-built 7-DOF robot arm, the model achieves a 17.1%p absolute improvement in success rates for complex manipulation tasks. Our results confirm that reusing Register Tokens directly enhances spatial reasoning, demonstrating that what was previously discarded as an artifact is in fact a valuable, unexplored resource for robotic intelligence. A video demonstration is available at: https://youtu.be/2CseBR-snZg
FSGlove: An Inertial-Based Hand Tracking System with Shape-Aware Calibration IROS 2025
Accurate hand motion capture (MoCap) is vital for applications in robotics, virtual reality, and biomechanics, yet existing systems face limitations in capturing high-degree-of-freedom (DoF) joint kinematics and personalized hand shape. Commercial gloves offer up to 21 DoFs, which are insufficient for complex manipulations while neglecting shape variations that are critical for contact-rich tasks. We present FSGlove, an inertial-based system that simultaneously tracks up to 48 DoFs and reconstructs personalized hand shapes via DiffHCal, a novel calibration method. Each finger joint and the dorsum are equipped with IMUs, enabling high-resolution motion sensing. DiffHCal integrates with the parametric MANO model through differentiable optimization, resolving joint kinematics, shape parameters, and sensor misalignment during a single streamlined calibration. The system achieves state-of-the-art accuracy, with joint angle errors of less than 2.7 degree, and outperforms commercial alternatives in shape reconstruction and contact fidelity. FSGlove's open-source hardware and software design ensures compatibility with current VR and robotics ecosystems, while its ability to capture subtle motions (e.g., fingertip rubbing) bridges the gap between human dexterity and robotic imitation. Evaluated against Nokov optical MoCap, FSGlove advances hand tracking by unifying the kinematic and contact fidelity. Hardware design, software, and more results are available at: https://sites.google.com/view/fsglove.
comment: Presented at IROS 2025, details are available at https://fsglove.robotflow.ai
SEEC: Stable End-Effector Control with Model-Enhanced Residual Learning for Humanoid Loco-Manipulation
Arm end-effector stabilization is essential for humanoid loco-manipulation tasks, yet it remains challenging due to the high degrees of freedom and inherent dynamic instability of bipedal robot structures. Previous model-based controllers achieve precise end-effector control but rely on precise dynamics modeling and estimation, which often struggle to capture real-world factors (e.g., friction and backlash) and thus degrade in practice. On the other hand, learning-based methods can better mitigate these factors via exploration and domain randomization, and have shown potential in real-world use. However, they often overfit to training conditions, requiring retraining with the entire body, and still struggle to adapt to unseen scenarios. To address these challenges, we propose a novel stable end-effector control (SEEC) framework with model-enhanced residual learning that learns to achieve precise and robust end-effector compensation for lower-body induced disturbances through model-guided reinforcement learning (RL) with a perturbation generator. This design allows the upper-body policy to achieve accurate end-effector stabilization as well as adapt to unseen locomotion controllers with no additional training. We validate our framework in different simulators and transfer trained policies to the Booster T1 humanoid robot. Experiments demonstrate that our method consistently outperforms baselines and robustly handles diverse and demanding loco-manipulation tasks.
comment: 9 pages, 5 figures
Next-Generation Aerial Robots -- Omniorientational Strategies: Dynamic Modeling, Control, and Comparative Analysis
Conventional multi-rotors are under-actuated systems, hindering them from independently controlling attitude from position. In this study, we present several distinct configurations that incorporate additional control inputs for manipulating the angles of the propeller axes. This addresses the mentioned limitations, making the systems "omniorientational". We comprehensively derived detailed dynamic models for all introduced configurations and validated by a methodology using Simscape Multibody simulations. Two controllers are designed: a sliding mode controller for robust handling of disturbances and a novel PID-based controller with gravity compensation integrating linear and non-linear allocators, designed for computational efficiency. A custom control allocation strategy is implemented to manage the input-non-affine nature of these systems, seeking to maximize battery life by minimizing the "Power Consumption Factor" defined in this study. Moreover, the controllers effectively managed harsh disturbances and uncertainties. Simulations compare and analyze the proposed configurations and controllers, majorly considering their power consumption. Furthermore, we conduct a qualitative comparison to evaluate the impact of different types of uncertainties on the control system, highlighting areas for potential model or hardware improvements. The analysis in this study provides a roadmap for future researchers to design omniorientational drones based on their design objectives, offering practical insights into configuration selection and controller design. This research aligns with the project SAC-1, one of the objectives of Sharif AgRoLab.
Human-like Navigation in a World Built for Humans
When navigating in a man-made environment they haven't visited before--like an office building--humans employ behaviors such as reading signs and asking others for directions. These behaviors help humans reach their destinations efficiently by reducing the need to search through large areas. Existing robot navigation systems lack the ability to execute such behaviors and are thus highly inefficient at navigating within large environments. We present ReasonNav, a modular navigation system which integrates these human-like navigation skills by leveraging the reasoning capabilities of a vision-language model (VLM). We design compact input and output abstractions based on navigation landmarks, allowing the VLM to focus on language understanding and reasoning. We evaluate ReasonNav on real and simulated navigation tasks and show that the agent successfully employs higher-order reasoning to navigate efficiently in large, complex buildings.
comment: CoRL 2025. Project website: https://reasonnav.github.io/
DAGDiff: Guiding Dual-Arm Grasp Diffusion to Stable and Collision-Free Grasps
Reliable dual-arm grasping is essential for manipulating large and complex objects but remains a challenging problem due to stability, collision, and generalization requirements. Prior methods typically decompose the task into two independent grasp proposals, relying on region priors or heuristics that limit generalization and provide no principled guarantee of stability. We propose DAGDiff, an end-to-end framework that directly denoises to grasp pairs in the SE(3) x SE(3) space. Our key insight is that stability and collision can be enforced more effectively by guiding the diffusion process with classifier signals, rather than relying on explicit region detection or object priors. To this end, DAGDiff integrates geometry-, stability-, and collision-aware guidance terms that steer the generative process toward grasps that are physically valid and force-closure compliant. We comprehensively evaluate DAGDiff through analytical force-closure checks, collision analysis, and large-scale physics-based simulations, showing consistent improvements over previous work on these metrics. Finally, we demonstrate that our framework generates dual-arm grasps directly on real-world point clouds of previously unseen objects, which are executed on a heterogeneous dual-arm setup where two manipulators reliably grasp and lift them.
Automotive-ENV: Benchmarking Multimodal Agents in Vehicle Interface Systems
Multimodal agents have demonstrated strong performance in general GUI interactions, but their application in automotive systems has been largely unexplored. In-vehicle GUIs present distinct challenges: drivers' limited attention, strict safety requirements, and complex location-based interaction patterns. To address these challenges, we introduce Automotive-ENV, the first high-fidelity benchmark and interaction environment tailored for vehicle GUIs. This platform defines 185 parameterized tasks spanning explicit control, implicit intent understanding, and safety-aware tasks, and provides structured multimodal observations with precise programmatic checks for reproducible evaluation. Building on this benchmark, we propose ASURADA, a geo-aware multimodal agent that integrates GPS-informed context to dynamically adjust actions based on location, environmental conditions, and regional driving norms. Experiments show that geo-aware information significantly improves success on safety-aware tasks, highlighting the importance of location-based context in automotive environments. We will release Automotive-ENV, complete with all tasks and benchmarking tools, to further the development of safe and adaptive in-vehicle agents.
comment: 10 pages, 5 figures,
Rich State Observations Empower Reinforcement Learning to Surpass PID: A Drone Ball Balancing Study IROS
This paper addresses a drone ball-balancing task, in which a drone stabilizes a ball atop a movable beam through cable-based interaction. We propose a hierarchical control framework that decouples high-level balancing policy from low-level drone control, and train a reinforcement learning (RL) policy to handle the high-level decision-making. Simulation results show that the RL policy achieves superior performance compared to carefully tuned PID controllers within the same hierarchical structure. Through systematic comparative analysis, we demonstrate that RL's advantage stems not from improved parameter tuning or inherent nonlinear mapping capabilities, but from its ability to effectively utilize richer state observations. These findings underscore the critical role of comprehensive state representation in learning-based systems and suggest that enhanced sensing could be instrumental in improving controller performance.
comment: Accepted for presentation at the Advancements in Aerial Physical Interaction Workshop of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2025
Cross-Modal Instructions for Robot Motion Generation
Teaching robots novel behaviors typically requires motion demonstrations via teleoperation or kinaesthetic teaching, that is, physically guiding the robot. While recent work has explored using human sketches to specify desired behaviors, data collection remains cumbersome, and demonstration datasets are difficult to scale. In this paper, we introduce an alternative paradigm, Learning from Cross-Modal Instructions, where robots are shaped by demonstrations in the form of rough annotations, which can contain free-form text labels, and are used in lieu of physical motion. We introduce the CrossInstruct framework, which integrates cross-modal instructions as examples into the context input to a foundational vision-language model (VLM). The VLM then iteratively queries a smaller, fine-tuned model, and synthesizes the desired motion over multiple 2D views. These are then subsequently fused into a coherent distribution over 3D motion trajectories in the robot's workspace. By incorporating the reasoning of the large VLM with a fine-grained pointing model, CrossInstruct produces executable robot behaviors that generalize beyond the environment of in the limited set of instruction examples. We then introduce a downstream reinforcement learning pipeline that leverages CrossInstruct outputs to efficiently learn policies to complete fine-grained tasks. We rigorously evaluate CrossInstruct on benchmark simulation tasks and real hardware, demonstrating effectiveness without additional fine-tuning and providing a strong initialization for policies subsequently refined via reinforcement learning.
Flight Dynamics to Sensing Modalities: Exploiting Drone Ground Effect for Accurate Edge Detection
Drone-based rapid and accurate environmental edge detection is highly advantageous for tasks such as disaster relief and autonomous navigation. Current methods, using radars or cameras, raise deployment costs and burden lightweight drones with high computational demands. In this paper, we propose AirTouch, a system that transforms the ground effect from a stability "foe" in traditional flight control views, into a "friend" for accurate and efficient edge detection. Our key insight is that analyzing drone basic attitude sensor readings and flight commands allows us to detect ground effect changes. Such changes typically indicate the drone flying over a boundary of two materials, making this information valuable for edge detection. We approach this insight through theoretical analysis, algorithm design, and implementation, fully leveraging the ground effect as a new sensing modality without compromising drone flight stability, thereby achieving accurate and efficient scene edge detection. We also compare this new sensing modality with vision-based methods to clarify its exclusive advantages in resource efficiency and detection capability. Extensive evaluations demonstrate that our system achieves a high detection accuracy with mean detection distance errors of 0.051m, outperforming the baseline method performance by 86%. With such detection performance, our system requires only 43 mW power consumption, contributing to this new sensing modality for low-cost and highly efficient edge detection.
Normalizing Flows are Capable Visuomotor Policy Learning Models
The field of general purpose robotics has recently embraced powerful probabilistic models, such as diffusion models, to model and learn complex behaviors. However, these models often come with significant trade-offs, namely high computational costs for inference and a fundamental inability to quantify output uncertainty. We argue that a model's trustworthiness, a critical factor for reliable, general-purpose robotics, is inherently linked to its ability to provide confidence measures. In this work, we introduce Normalizing Flows Policy, a novel visuomotor policy learning model based on Normalizing Flows. We show that Normalizing Flows are a natural and powerful alternative to diffusion models, providing both a statistically sound measure of confidence and a highly efficient inference process. Through comprehensive experiments across four distinct simulated robotic tasks, we demonstrate that Normalizing Flows Policy achieves performance comparable to, and often surpassing, Diffusion Policy, and it does so not only with improved sample efficiency but also with up to 30 times faster inference. Additionally, our ablation study validates several key architectural and training techniques that enable Normalizing Flows to perform well in this domain.
MPC-based Deep Reinforcement Learning Method for Space Robotic Control with Fuel Sloshing Mitigation IROS
This paper presents an integrated Reinforcement Learning (RL) and Model Predictive Control (MPC) framework for autonomous satellite docking with a partially filled fuel tank. Traditional docking control faces challenges due to fuel sloshing in microgravity, which induces unpredictable forces affecting stability. To address this, we integrate Proximal Policy Optimization (PPO) and Soft Actor-Critic (SAC) RL algorithms with MPC, leveraging MPC's predictive capabilities to accelerate RL training and improve control robustness. The proposed approach is validated through Zero-G Lab of SnT experiments for planar stabilization and high-fidelity numerical simulations for 6-DOF docking with fuel sloshing dynamics. Simulation results demonstrate that SAC-MPC achieves superior docking accuracy, higher success rates, and lower control effort, outperforming standalone RL and PPO-MPC methods. This study advances fuel-efficient and disturbance-resilient satellite docking, enhancing the feasibility of on-orbit refueling and servicing missions.
comment: Pre-print version submitted to IEEE IROS
KeyWorld: Key Frame Reasoning Enables Effective and Efficient World Models
Robotic world models are a promising paradigm for forecasting future environment states, yet their inference speed and the physical plausibility of generated trajectories remain critical bottlenecks, limiting their real-world applications. This stems from the redundancy of the prevailing frame-to-frame generation approach, where the model conducts costly computation on similar frames, as well as neglecting the semantic importance of key transitions. To address this inefficiency, we propose KeyWorld, a framework that improves text-conditioned robotic world models by concentrating transformers computation on a few semantic key frames while employing a lightweight convolutional model to fill the intermediate frames. Specifically, KeyWorld first identifies significant transitions by iteratively simplifying the robot's motion trajectories, obtaining the ground truth key frames. Then, a DiT model is trained to reason and generate these physically meaningful key frames from textual task descriptions. Finally, a lightweight interpolator efficiently reconstructs the full video by inpainting all intermediate frames. Evaluations on the LIBERO benchmark demonstrate that KeyWorld achieves a 5.68$\times$ acceleration compared to the frame-to-frame generation baseline, and focusing on the motion-aware key frames further contributes to the physical validity of the generated videos, especially on complex tasks. Our approach highlights a practical path toward deploying world models in real-time robotic control and other domains requiring both efficient and effective world models. Code is released at https://anonymous.4open.science/r/Keyworld-E43D.
Multi-Robot Vision-Based Task and Motion Planning for EV Battery Disassembly and Sorting
Electric-vehicle (EV) battery disassembly requires precise multi-robot coordination, short and reliable motions, and robust collision safety in cluttered, dynamic scenes. We propose a four-layer task-and-motion planning (TAMP) framework that couples symbolic task planning and cost- and accessibility-aware allocation with a TP-GMM-guided motion planner learned from demonstrations. Stereo vision with YOLOv8 provides real-time component localization, while OctoMap-based 3D mapping and FCL(Flexible Collision Library) checks in MoveIt unify predictive digital-twin collision checking with reactive, vision-based avoidance. Validated on two UR10e robots across cable, busbar, service plug, and three leaf-cell removals, the approach yields substantially more compact and safer motions than a default RRTConnect baseline under identical perception and task assignments: average end-effector path length drops by $-63.3\%$ and makespan by $-8.1\%$; per-arm swept volumes shrink (R1: $0.583\rightarrow0.139\,\mathrm{m}^3$; R2: $0.696\rightarrow0.252\,\mathrm{m}^3$), and mutual overlap decreases by $47\%$ ($0.064\rightarrow0.034\,\mathrm{m}^3$). These results highlight improved autonomy, precision, and safety for multi-robot EV battery disassembly in unstructured, dynamic environments.
AnywhereVLA: Language-Conditioned Exploration and Mobile Manipulation
We address natural language pick-and-place in unseen, unpredictable indoor environments with AnywhereVLA, a modular framework for mobile manipulation. A user text prompt serves as an entry point and is parsed into a structured task graph that conditions classical SLAM with LiDAR and cameras, metric semantic mapping, and a task-aware frontier exploration policy. An approach planner then selects visibility and reachability aware pre grasp base poses. For interaction, a compact SmolVLA manipulation head is fine tuned on platform pick and place trajectories for the SO-101 by TheRobotStudio, grounding local visual context and sub-goals into grasp and place proposals. The full system runs fully onboard on consumer-level hardware, with Jetson Orin NX for perception and VLA and an Intel NUC for SLAM, exploration, and control, sustaining real-time operation. We evaluated AnywhereVLA in a multi-room lab under static scenes and normal human motion. In this setting, the system achieves a $46\%$ overall task success rate while maintaining throughput on embedded compute. By combining a classical stack with a fine-tuned VLA manipulation, the system inherits the reliability of geometry-based navigation with the agility and task generalization of language-conditioned manipulation.
BactoBot: A Low-Cost, Bacteria-Inspired Soft Underwater Robot for Marine Exploration
Traditional rigid underwater vehicles pose risks to delicate marine ecosystems. This paper presents BactoBot, a low-cost, soft underwater robot designed for safe and gentle marine exploration. Inspired by bacterial flagellar propulsion, BactoBot features 12 flexible, silicone-based arms arranged on a 3D-printed dodecahedral frame. The design provides inherent compliance, redundancy, and the potential for omnidirectional movement. The prototype was fabricated using accessible DIY methods, including food-grade silicone molding, 3D printing, and off-the-shelf microcontrollers. Waterproofing and buoyancy calibration protocols were developed, and the robot was successfully tested in a controlled water tank, demonstrating forward motion and turning. The results validate the feasibility of replicating complex biological locomotion at low cost. The project lays a foundation for environmentally conscious robotic tools, particularly for marine science in resource-constrained settings, and identifies pathways toward autonomous operation and field deployment.
comment: 8 pages, 4 figures. Project repository available at https://github.com/rubaiyattasnim/BactoBot
Autoregressive End-to-End Planning with Time-Invariant Spatial Alignment and Multi-Objective Policy Refinement
The inherent sequential modeling capabilities of autoregressive models make them a formidable baseline for end-to-end planning in autonomous driving. Nevertheless, their performance is constrained by a spatio-temporal misalignment, as the planner must condition future actions on past sensory data. This creates an inconsistent worldview, limiting the upper bound of performance for an otherwise powerful approach. To address this, we propose a Time-Invariant Spatial Alignment (TISA) module that learns to project initial environmental features into a consistent ego-centric frame for each future time step, effectively correcting the agent's worldview without explicit future scene prediction. In addition, we employ a kinematic action prediction head (i.e., acceleration and yaw rate) to ensure physically feasible trajectories. Finally, we introduce a multi-objective post-training stage using Direct Preference Optimization (DPO) to move beyond pure imitation. Our approach provides targeted feedback on specific driving behaviors, offering a more fine-grained learning signal than the single, overall objective used in standard DPO. Our model achieves a state-of-the-art 89.8 PDMS on the NAVSIM dataset among autoregressive models. The video document is available at https://tisa-dpo-e2e.github.io/.
Efficient Differentiable Contact Model with Long-range Influence
With the maturation of differentiable physics, its role in various downstream applications: such as model predictive control, robotic design optimization, and neural PDE solvers, has become increasingly important. However, the derivative information provided by differentiable simulators can exhibit abrupt changes or vanish altogether, impeding the convergence of gradient-based optimizers. In this work, we demonstrate that such erratic gradient behavior is closely tied to the design of contact models. We further introduce a set of properties that a contact model must satisfy to ensure well-behaved gradient information. Lastly, we present a practical contact model for differentiable rigid-body simulators that satisfies all of these properties while maintaining computational efficiency. Our experiments show that, even from simple initializations, our contact model can discover complex, contact-rich control signals, enabling the successful execution of a range of downstream locomotion and manipulation tasks.
Finding 3D Positions of Distant Objects from Noisy Camera Movement and Semantic Segmentation Sequences
3D object localisation based on a sequence of camera measurements is essential for safety-critical surveillance tasks, such as drone-based wildfire monitoring. Localisation of objects detected with a camera can typically be solved with dense depth estimation or 3D scene reconstruction. However, in the context of distant objects or tasks limited by the amount of available computational resources, neither solution is feasible. In this paper, we show that the task can be solved using particle filters for both single and multiple target scenarios. The method was studied using a 3D simulation and a drone-based image segmentation sequence with global navigation satellite system (GNSS)-based camera pose estimates. The results showed that a particle filter can be used to solve practical localisation tasks based on camera poses and image segments in these situations where other solutions fail. The particle filter is independent of the detection method, making it flexible for new tasks. The study also demonstrates that drone-based wildfire monitoring can be conducted using the proposed method paired with a pre-existing image segmentation model.
MTRDrive: Memory-Tool Synergistic Reasoning for Robust Autonomous Driving in Corner Cases
Vision-Language Models(VLMs) have demonstrated significant potential for end-to-end autonomous driving, yet a substantial gap remains between their current capabilities and the reliability necessary for real-world deployment. A critical challenge is their fragility, characterized by hallucinations and poor generalization in out-of-distribution (OOD) scenarios. To bridge this gap, we introduce MTRDrive, a novel framework that integrates procedural driving experiences with a dynamic toolkit to enhance generalization and proactive decision-making. MTRDrive addresses these limitations through a closed-loop system that combines a memory-based experience retrieval mechanism with dynamic toolkits. This synergy enables the model to interact more effectively with its environment, improving both reasoning and decision-making capabilities with the help of our memory-tool synergistic reasoning. Additionally, we introduce a new benchmark based on complex Roadwork construction scenarios to rigorously evaluate zero-shot generalization. Extensive experiments demonstrate the superior effectiveness of our approach. On the public NAVSIM benchmark, our 3B-parameter MTRDrive model achieves an exceptional PDMS of 88.3 without chain-of-thought and sets a state-of-the-art performance bar on high-level planning, with a driving metric score of 79.8\% and a planning accuracy of 82.6\%. Rigorous zero-shot evaluation on the new Roadwork-VLM benchmark shows a strong ability to reason robustly in unseen scenarios, achieving a driving metric score of 80.2\%. These results highlight MTRDrive's potential to advance autonomous driving toward safer and more reliable systems.
comment: 8 pages
ImaginationPolicy: Towards Generalizable, Precise and Reliable End-to-End Policy for Robotic Manipulation
End-to-end robot manipulation policies offer significant potential for enabling embodied agents to understand and interact with the world. Unlike traditional modular pipelines, end-to-end learning mitigates key limitations such as information loss between modules and feature misalignment caused by isolated optimization targets. Despite these advantages, existing end-to-end neural networks for robotic manipulation--including those based on large VLM/VLA models--remain insufficiently performant for large-scale practical deployment. In this paper, we take a step towards an end-to-end manipulation policy that is generalizable, accurate and reliable. To achieve this goal, we propose a novel Chain of Moving Oriented Keypoints (CoMOK) formulation for robotic manipulation. Our formulation is used as the action representation of a neural policy, which can be trained in an end-to-end fashion. Such an action representation is general, as it extends the standard end-effector pose action representation and supports a diverse set of manipulation tasks in a unified manner. The oriented keypoint in our method enables natural generalization to objects with different shapes and sizes, while achieving sub-centimeter accuracy. Moreover, our formulation can easily handle multi-stage tasks, multi-modal robot behaviors, and deformable objects. Extensive simulated and hardware experiments demonstrate the effectiveness of our method.
comment: First two authors contribute equally. Project page: https://sites.google.com/view/imaginationpolicy
SemSight: Probabilistic Bird's-Eye-View Prediction of Multi-Level Scene Semantics for Navigation
In target-driven navigation and autonomous exploration, reasonable prediction of unknown regions is crucial for efficient navigation and environment understanding. Existing methods mostly focus on single objects or geometric occupancy maps, lacking the ability to model room-level semantic structures. We propose SemSight, a probabilistic bird's-eye-view prediction model for multi-level scene semantics. The model jointly infers structural layouts, global scene context, and target area distributions, completing semantic maps of unexplored areas while estimating probability maps for target categories. To train SemSight, we simulate frontier-driven exploration on 2,000 indoor layout graphs, constructing a diverse dataset of 40,000 sequential egocentric observations paired with complete semantic maps. We adopt an encoder-decoder network as the core architecture and introduce a mask-constrained supervision strategy. This strategy applies a binary mask of unexplored areas so that supervision focuses only on unknown regions, forcing the model to infer semantic structures from the observed context. Experimental results show that SemSight improves prediction performance for key functional categories in unexplored regions and outperforms non-mask-supervised approaches on metrics such as Structural Consistency (SC) and Region Recognition Accuracy (PA). It also enhances navigation efficiency in closed-loop simulations, reducing the number of search steps when guiding robots toward target areas.
Leveraging Temporally Extended Behavior Sharing for Multi-task Reinforcement Learning IROS
Multi-task reinforcement learning (MTRL) offers a promising approach to improve sample efficiency and generalization by training agents across multiple tasks, enabling knowledge sharing between them. However, applying MTRL to robotics remains challenging due to the high cost of collecting diverse task data. To address this, we propose MT-L\'evy, a novel exploration strategy that enhances sample efficiency in MTRL environments by combining behavior sharing across tasks with temporally extended exploration inspired by L\'evy flight. MT-L\'evy leverages policies trained on related tasks to guide exploration towards key states, while dynamically adjusting exploration levels based on task success ratios. This approach enables more efficient state-space coverage, even in complex robotics environments. Empirical results demonstrate that MT-L\'evy significantly improves exploration and sample efficiency, supported by quantitative and qualitative analyses. Ablation studies further highlight the contribution of each component, showing that combining behavior sharing with adaptive exploration strategies can significantly improve the practicality of MTRL in robotics applications.
comment: Accepted for publication in the proceedings of the 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
MASt3R-Fusion: Integrating Feed-Forward Visual Model with IMU, GNSS for High-Functionality SLAM
Visual SLAM is a cornerstone technique in robotics, autonomous driving and extended reality (XR), yet classical systems often struggle with low-texture environments, scale ambiguity, and degraded performance under challenging visual conditions. Recent advancements in feed-forward neural network-based pointmap regression have demonstrated the potential to recover high-fidelity 3D scene geometry directly from images, leveraging learned spatial priors to overcome limitations of traditional multi-view geometry methods. However, the widely validated advantages of probabilistic multi-sensor information fusion are often discarded in these pipelines. In this work, we propose MASt3R-Fusion,a multi-sensor-assisted visual SLAM framework that tightly integrates feed-forward pointmap regression with complementary sensor information, including inertial measurements and GNSS data. The system introduces Sim(3)-based visualalignment constraints (in the Hessian form) into a universal metric-scale SE(3) factor graph for effective information fusion. A hierarchical factor graph design is developed, which allows both real-time sliding-window optimization and global optimization with aggressive loop closures, enabling real-time pose tracking, metric-scale structure perception and globally consistent mapping. We evaluate our approach on both public benchmarks and self-collected datasets, demonstrating substantial improvements in accuracy and robustness over existing visual-centered multi-sensor SLAM systems. The code will be released open-source to support reproducibility and further research (https://github.com/GREAT-WHU/MASt3R-Fusion).
Meta-Memory: Retrieving and Integrating Semantic-Spatial Memories for Robot Spatial Reasoning
Navigating complex environments requires robots to effectively store observations as memories and leverage them to answer human queries about spatial locations, which is a critical yet underexplored research challenge. While prior work has made progress in constructing robotic memory, few have addressed the principled mechanisms needed for efficient memory retrieval and integration. To bridge this gap, we propose Meta-Memory, a large language model (LLM)-driven agent that constructs a high-density memory representation of the environment. The key innovation of Meta-Memory lies in its capacity to retrieve and integrate relevant memories through joint reasoning over semantic and spatial modalities in response to natural language location queries, thereby empowering robots with robust and accurate spatial reasoning capabilities. To evaluate its performance, we introduce SpaceLocQA, a large-scale dataset encompassing diverse real-world spatial question-answering scenarios. Experimental results show that Meta-Memory significantly outperforms state-of-the-art methods on both the SpaceLocQA and the public NaVQA benchmarks. Furthermore, we successfully deployed Meta-Memory on real-world robotic platforms, demonstrating its practical utility in complex environments. Project page: https://itsbaymax.github.io/meta-memory.github.io/ .
SLAM-Free Visual Navigation with Hierarchical Vision-Language Perception and Coarse-to-Fine Semantic Topological Planning
Conventional SLAM pipelines for legged robot navigation are fragile under rapid motion, calibration demands, and sensor drift, while offering limited semantic reasoning for task-driven exploration. To deal with these issues, we propose a vision-only, SLAM-free navigation framework that replaces dense geometry with semantic reasoning and lightweight topological representations. A hierarchical vision-language perception module fuses scene-level context with object-level cues for robust semantic inference. And a semantic-probabilistic topological map supports coarse-to-fine planning: LLM-based global reasoning for subgoal selection and vision-based local planning for obstacle avoidance. Integrated with reinforcement-learning locomotion controllers, the framework is deployable across diverse legged robot platforms. Experiments in simulation and real-world settings demonstrate consistent improvements in semantic accuracy, planning quality, and navigation success, while ablation studies further showcase the necessity of both hierarchical perception and fine local planning. This work introduces a new paradigm for SLAM-free, vision-language-driven navigation, shifting robotic exploration from geometry-centric mapping to semantics-driven decision making.
RobotDancing: Residual-Action Reinforcement Learning Enables Robust Long-Horizon Humanoid Motion Tracking
Long-horizon, high-dynamic motion tracking on humanoids remains brittle because absolute joint commands cannot compensate model-plant mismatch, leading to error accumulation. We propose RobotDancing, a simple, scalable framework that predicts residual joint targets to explicitly correct dynamics discrepancies. The pipeline is end-to-end--training, sim-to-sim validation, and zero-shot sim-to-real--and uses a single-stage reinforcement learning (RL) setup with a unified observation, reward, and hyperparameter configuration. We evaluate primarily on Unitree G1 with retargeted LAFAN1 dance sequences and validate transfer on H1/H1-2. RobotDancing can track multi-minute, high-energy behaviors (jumps, spins, cartwheels) and deploys zero-shot to hardware with high motion tracking quality.
Digital Twin-Guided Robot Path Planning: A Beta-Bernoulli Fusion with Large Language Model as a Sensor
Integrating natural language (NL) prompts into robotic mission planning has attracted significant interest in recent years. In the construction domain, Building Information Models (BIM) encapsulate rich NL descriptions of the environment. We present a novel framework that fuses NL directives with BIM-derived semantic maps via a Beta-Bernoulli Bayesian fusion by interpreting the LLM as a sensor: each obstacle's design-time repulsive coefficient is treated as a Beta(alpha, beta) random variable and LLM-returned danger scores are incorporated as pseudo-counts to update alpha and beta. The resulting posterior mean yields a continuous, context-aware repulsive gain that augments a Euclidean-distance-based potential field for cost heuristics. By adjusting gains based on sentiment and context inferred from user prompts, our method guides robots along safer, more context-aware paths. This provides a numerically stable method that can chain multiple natural commands and prompts from construction workers and foreman to enable planning while giving flexibility to be integrated in any learned or classical AI framework. Simulation results demonstrate that this Beta-Bernoulli fusion yields both qualitative and quantitative improvements in path robustness and validity.
Building Information Models to Robot-Ready Site Digital Twins (BIM2RDT): An Agentic AI Safety-First Framework
The adoption of cyber-physical systems and jobsite intelligence that connects design models, real-time site sensing, and autonomous field operations can dramatically enhance digital management in the construction industry. This paper introduces BIM2RDT (Building Information Models to Robot-Ready Site Digital Twins), an agentic artificial intelligence (AI) framework designed to transform static Building Information Modeling (BIM) into dynamic, robot-ready digital twins (DTs) that prioritize safety during execution. The framework bridges the gap between pre-existing BIM data and real-time site conditions by integrating three key data streams: geometric and semantic information from BIM models, activity data from IoT sensor networks, and visual-spatial data collected by robots during site traversal. The methodology introduces Semantic-Gravity ICP (SG-ICP), a point cloud registration algorithm that leverages large language model (LLM) reasoning. Unlike traditional methods, SG-ICP utilizes an LLM to infer object-specific, plausible orientation priors based on BIM semantics, improving alignment accuracy by avoiding convergence on local minima. This creates a feedback loop where robot-collected data updates the DT, which in turn optimizes paths for missions. The framework employs YOLOE object detection and Shi-Tomasi corner detection to identify and track construction elements while using BIM geometry as a priori maps. The framework also integrates real-time Hand-Arm Vibration (HAV) monitoring, mapping sensor-detected safety events to the digital twin using IFC standards for intervention. Experiments demonstrate SG-ICP's superiority over standard ICP, achieving RMSE reductions of 64.3%--88.3% in alignment across scenarios with occluded features, ensuring plausible orientations. HAV integration triggers warnings upon exceeding exposure limits, enhancing compliance with ISO 5349-1.
Joint Flow Trajectory Optimization For Feasible Robot Motion Generation from Video Demonstrations
Learning from human video demonstrations offers a scalable alternative to teleoperation or kinesthetic teaching, but poses challenges for robot manipulators due to embodiment differences and joint feasibility constraints. We address this problem by proposing the Joint Flow Trajectory Optimization (JFTO) framework for grasp pose generation and object trajectory imitation under the video-based Learning-from-Demonstration (LfD) paradigm. Rather than directly imitating human hand motions, our method treats demonstrations as object-centric guides, balancing three objectives: (i) selecting a feasible grasp pose, (ii) generating object trajectories consistent with demonstrated motions, and (iii) ensuring collision-free execution within robot kinematics. To capture the multimodal nature of demonstrations, we extend flow matching to $\SE(3)$ for probabilistic modeling of object trajectories, enabling density-aware imitation that avoids mode collapse. The resulting optimization integrates grasp similarity, trajectory likelihood, and collision penalties into a unified differentiable objective. We validate our approach in both simulation and real-world experiments across diverse real-world manipulation tasks.
RuN: Residual Policy for Natural Humanoid Locomotion
Enabling humanoid robots to achieve natural and dynamic locomotion across a wide range of speeds, including smooth transitions from walking to running, presents a significant challenge. Existing deep reinforcement learning methods typically require the policy to directly track a reference motion, forcing a single policy to simultaneously learn motion imitation, velocity tracking, and stability maintenance. To address this, we introduce RuN, a novel decoupled residual learning framework. RuN decomposes the control task by pairing a pre-trained Conditional Motion Generator, which provides a kinematically natural motion prior, with a reinforcement learning policy that learns a lightweight residual correction to handle dynamical interactions. Experiments in simulation and reality on the Unitree G1 humanoid robot demonstrate that RuN achieves stable, natural gaits and smooth walk-run transitions across a broad velocity range (0-2.5 m/s), outperforming state-of-the-art methods in both training efficiency and final performance.
Incorporating Human-Inspired Ankle Characteristics in a Forced-Oscillation-Based Reduced-Order Model for Walking
This paper extends the forced-oscillation-based reduced-order model of walking to a model with ankles and feet. A human-inspired paradigm was designed for the ankle dynamics, which results in improved gait characteristics compared to the point-foot model. In addition, it was shown that while the proposed model can stabilize against large errors in initial conditions through combination of foot placement and ankle strategies, the model is able to stabilize against small perturbations without relying on the foot placement control and solely through the designed proprioceptive ankle scheme. This novel property, which is also observed in humans, can help in better understanding of anthropomorphic walking and its stabilization mechanisms.
RAM-NAS: Resource-aware Multiobjective Neural Architecture Search Method for Robot Vision Tasks IROS 2024
Neural architecture search (NAS) has shown great promise in automatically designing lightweight models. However, conventional approaches are insufficient in training the supernet and pay little attention to actual robot hardware resources. To meet such challenges, we propose RAM-NAS, a resource-aware multi-objective NAS method that focuses on improving the supernet pretrain and resource-awareness on robot hardware devices. We introduce the concept of subnets mutual distillation, which refers to mutually distilling all subnets sampled by the sandwich rule. Additionally, we utilize the Decoupled Knowledge Distillation (DKD) loss to enhance logits distillation performance. To expedite the search process with consideration for hardware resources, we used data from three types of robotic edge hardware to train Latency Surrogate predictors. These predictors facilitated the estimation of hardware inference latency during the search phase, enabling a unified multi-objective evolutionary search to balance model accuracy and latency trade-offs. Our discovered model family, RAM-NAS models, can achieve top-1 accuracy ranging from 76.7% to 81.4% on ImageNet. In addition, the resource-aware multi-objective NAS we employ significantly reduces the model's inference latency on edge hardware for robots. We conducted experiments on downstream tasks to verify the scalability of our methods. The inference time for detection and segmentation is reduced on all three hardware types compared to MobileNetv3-based methods. Our work fills the gap in NAS for robot hardware resource-aware.
comment: Joint first authors: Shouren Mao and Minghao Qin. Published in IEEE/RSJ IROS 2024. This arXiv version adds a joint first-authorship note to correct an omission in the IEEE Xplore version. No technical changes. Please cite the IEEE version
Efficient Construction of Implicit Surface Models From a Single Image for Motion Generation
Implicit representations have been widely applied in robotics for obstacle avoidance and path planning. In this paper, we explore the problem of constructing an implicit distance representation from a single image. Past methods for implicit surface reconstruction, such as \emph{NeuS} and its variants generally require a large set of multi-view images as input, and require long training times. In this work, we propose Fast Image-to-Neural Surface (FINS), a lightweight framework that can reconstruct high-fidelity surfaces and SDF fields based on a single or a small set of images. FINS integrates a multi-resolution hash grid encoder with lightweight geometry and color heads, making the training via an approximate second-order optimizer highly efficient and capable of converging within a few seconds. Additionally, we achieve the construction of a neural surface requiring only a single RGB image, by leveraging pre-trained foundation models to estimate the geometry inherent in the image. Our experiments demonstrate that under the same conditions, our method outperforms state-of-the-art baselines in both convergence speed and accuracy on surface reconstruction and SDF field estimation. Moreover, we demonstrate the applicability of FINS for robot surface following tasks and show its scalability to a variety of benchmark datasets.
Equi-RO: A 4D mmWave Radar Odometry via Equivariant Networks
Autonomous vehicles and robots rely on accurate odometry estimation in GPS-denied environments. While LiDARs and cameras struggle under extreme weather, 4D mmWave radar emerges as a robust alternative with all-weather operability and velocity measurement. In this paper, we introduce Equi-RO, an equivariant network-based framework for 4D radar odometry. Our algorithm pre-processes Doppler velocity into invariant node and edge features in the graph, and employs separate networks for equivariant and invariant feature processing. A graph-based architecture enhances feature aggregation in sparse radar data, improving inter-frame correspondence. Experiments on the open-source dataset and self-collected dataset show Equi-RO outperforms state-of-the-art algorithms in accuracy and robustness. Overall, our method achieves 10.7% and 20.0% relative improvements in translation and rotation accuracy, respectively, compared to the best baseline on the open-source dataset.
EEG-Driven AR-Robot System for Zero-Touch Grasping Manipulation ICRA 2026
Reliable brain-computer interface (BCI) control of robots provides an intuitive and accessible means of human-robot interaction, particularly valuable for individuals with motor impairments. However, existing BCI-Robot systems face major limitations: electroencephalography (EEG) signals are noisy and unstable, target selection is often predefined and inflexible, and most studies remain restricted to simulation without closed-loop validation. These issues hinder real-world deployment in assistive scenarios. To address them, we propose a closed-loop BCI-AR-Robot system that integrates motor imagery (MI)-based EEG decoding, augmented reality (AR) neurofeedback, and robotic grasping for zero-touch operation. A 14-channel EEG headset enabled individualized MI calibration, a smartphone-based AR interface supported multi-target navigation with direction-congruent feedback to enhance stability, and the robotic arm combined decision outputs with vision-based pose estimation for autonomous grasping. Experiments are conducted to validate the framework: MI training achieved 93.1 percent accuracy with an average information transfer rate (ITR) of 14.8 bit/min; AR neurofeedback significantly improved sustained control (SCI = 0.210) and achieved the highest ITR (21.3 bit/min) compared with static, sham, and no-AR baselines; and closed-loop grasping achieved a 97.2 percent success rate with good efficiency and strong user-reported control. These results show that AR feedback substantially stabilizes EEG-based control and that the proposed framework enables robust zero-touch grasping, advancing assistive robotic applications and future modes of human-robot interaction.
comment: 8 pages, 14 figures, submitted to ICRA 2026
Cyber Racing Coach: A Haptic Shared Control Framework for Teaching Advanced Driving Skills
This study introduces a haptic shared control framework designed to teach human drivers advanced driving skills. In this context, shared control refers to a driving mode where the human driver collaborates with an autonomous driving system to control the steering of a vehicle simultaneously. Advanced driving skills are those necessary to safely push the vehicle to its handling limits in high-performance driving such as racing and emergency obstacle avoidance. Previous research has demonstrated the performance and safety benefits of shared control schemes using both subjective and objective evaluations. However, these schemes have not been assessed for their impact on skill acquisition on complex and demanding tasks. Prior research on long-term skill acquisition either applies haptic shared control to simple tasks or employs other feedback methods like visual and auditory aids. To bridge this gap, this study creates a cyber racing coach framework based on the haptic shared control paradigm and evaluates its performance in helping human drivers acquire high-performance driving skills. The framework introduces (1) an autonomous driving system that is capable of cooperating with humans in a highly performant driving scenario; and (2) a haptic shared control mechanism along with a fading scheme to gradually reduce the steering assistance from autonomy based on the human driver's performance during training. Two benchmarks are considered: self-learning (no assistance) and full assistance during training. Results from a human subject study indicate that the proposed framework helps human drivers develop superior racing skills compared to the benchmarks, resulting in better performance and consistency.
Wonder Wins Ways: Curiosity-Driven Exploration through Multi-Agent Contextual Calibration
Autonomous exploration in complex multi-agent reinforcement learning (MARL) with sparse rewards critically depends on providing agents with effective intrinsic motivation. While artificial curiosity offers a powerful self-supervised signal, it often confuses environmental stochasticity with meaningful novelty. Moreover, existing curiosity mechanisms exhibit a uniform novelty bias, treating all unexpected observations equally. However, peer behavior novelty, which encode latent task dynamics, are often overlooked, resulting in suboptimal exploration in decentralized, communication-free MARL settings. To this end, inspired by how human children adaptively calibrate their own exploratory behaviors via observing peers, we propose a novel approach to enhance multi-agent exploration. We introduce CERMIC, a principled framework that empowers agents to robustly filter noisy surprise signals and guide exploration by dynamically calibrating their intrinsic curiosity with inferred multi-agent context. Additionally, CERMIC generates theoretically-grounded intrinsic rewards, encouraging agents to explore state transitions with high information gain. We evaluate CERMIC on benchmark suites including VMAS, Meltingpot, and SMACv2. Empirical results demonstrate that exploration with CERMIC significantly outperforms SoTA algorithms in sparse-reward environments.
Suction Leap-Hand: Suction Cups on a Multi-fingered Hand Enable Embodied Dexterity and In-Hand Teleoperation
Dexterous in-hand manipulation remains a foundational challenge in robotics, with progress often constrained by the prevailing paradigm of imitating the human hand. This anthropomorphic approach creates two critical barriers: 1) it limits robotic capabilities to tasks humans can already perform, and 2) it makes data collection for learning-based methods exceedingly difficult. Both challenges are caused by traditional force-closure which requires coordinating complex, multi-point contacts based on friction, normal force, and gravity to grasp an object. This makes teleoperated demonstrations unstable and amplifies the sim-to-real gap for reinforcement learning. In this work, we propose a paradigm shift: moving away from replicating human mechanics toward the design of novel robotic embodiments. We introduce the \textbf{S}uction \textbf{Leap}-Hand (SLeap Hand), a multi-fingered hand featuring integrated fingertip suction cups that realize a new form of suction-enabled dexterity. By replacing complex force-closure grasps with stable, single-point adhesion, our design fundamentally simplifies in-hand teleoperation and facilitates the collection of high-quality demonstration data. More importantly, this suction-based embodiment unlocks a new class of dexterous skills that are difficult or even impossible for the human hand, such as one-handed paper cutting and in-hand writing. Our work demonstrates that by moving beyond anthropomorphic constraints, novel embodiments can not only lower the barrier for collecting robust manipulation data but also enable the stable, single-handed completion of tasks that would typically require two human hands. Our webpage is https://sites.google.com/view/sleaphand.
comment: An IEEE conference paper currently under review
Learning Terrain-Specialized Policies for Adaptive Locomotion in Challenging Environments
Legged robots must exhibit robust and agile locomotion across diverse, unstructured terrains, a challenge exacerbated under blind locomotion settings where terrain information is unavailable. This work introduces a hierarchical reinforcement learning framework that leverages terrain-specialized policies and curriculum learning to enhance agility and tracking performance in complex environments. We validated our method on simulation, where our approach outperforms a generalist policy by up to 16% in success rate and achieves lower tracking errors as the velocity target increases, particularly on low-friction and discontinuous terrains, demonstrating superior adaptability and robustness across mixed-terrain scenarios.
comment: Accepted to the 22nd International Conference on Advanced Robotics (ICAR 2025). 7 pages
Towards Versatile Humanoid Table Tennis: Unified Reinforcement Learning with Prediction Augmentation
Humanoid table tennis (TT) demands rapid perception, proactive whole-body motion, and agile footwork under strict timing -- capabilities that remain difficult for unified controllers. We propose a reinforcement learning framework that maps ball-position observations directly to whole-body joint commands for both arm striking and leg locomotion, strengthened by predictive signals and dense, physics-guided rewards. A lightweight learned predictor, fed with recent ball positions, estimates future ball states and augments the policy's observations for proactive decision-making. During training, a physics-based predictor supplies precise future states to construct dense, informative rewards that lead to effective exploration. The resulting policy attains strong performance across varied serve ranges (hit rate $\geq$ 96% and success rate $\geq$ 92%) in simulations. Ablation studies confirm that both the learned predictor and the predictive reward design are critical for end-to-end learning. Deployed zero-shot on a physical Booster T1 humanoid with 23 revolute joints, the policy produces coordinated lateral and forward-backward footwork with accurate, fast returns, suggesting a practical path toward versatile, competitive humanoid TT.
Generating Stable Placements via Physics-guided Diffusion Models
Stably placing an object in a multi-object scene is a fundamental challenge in robotic manipulation, as placements must be penetration-free, establish precise surface contact, and result in a force equilibrium. To assess stability, existing methods rely on running a simulation engine or resort to heuristic, appearance-based assessments. In contrast, our approach integrates stability directly into the sampling process of a diffusion model. To this end, we query an offline sampling-based planner to gather multi-modal placement labels and train a diffusion model to generate stable placements. The diffusion model is conditioned on scene and object point clouds, and serves as a geometry-aware prior. We leverage the compositional nature of score-based generative models to combine this learned prior with a stability-aware loss, thereby increasing the likelihood of sampling from regions of high stability. Importantly, this strategy requires no additional re-training or fine-tuning, and can be directly applied to off-the-shelf models. We evaluate our method on four benchmark scenes where stability can be accurately computed. Our physics-guided models achieve placements that are 56% more robust to forceful perturbations while reducing runtime by 47% compared to a state-of-the-art geometric method.
comment: Submitted to the IEEE International Conference on Robotics and Automation 2026, Vienna, Austria, June 1-5, 2026
Real-Time Indoor Object SLAM with LLM-Enhanced Priors
Object-level Simultaneous Localization and Mapping (SLAM), which incorporates semantic information for high-level scene understanding, faces challenges of under-constrained optimization due to sparse observations. Prior work has introduced additional constraints using commonsense knowledge, but obtaining such priors has traditionally been labor-intensive and lacks generalizability across diverse object categories. We address this limitation by leveraging large language models (LLMs) to provide commonsense knowledge of object geometric attributes, specifically size and orientation, as prior factors in a graph-based SLAM framework. These priors are particularly beneficial during the initial phase when object observations are limited. We implement a complete pipeline integrating these priors, achieving robust data association on sparse object-level features and enabling real-time object SLAM. Our system, evaluated on the TUM RGB-D and 3RScan datasets, improves mapping accuracy by 36.8\% over the latest baseline. Additionally, we present real-world experiments in the supplementary video, demonstrating its real-time performance.
Autonomous UAV-Quadruped Docking in Complex Terrains via Active Posture Alignment and Constraint-Aware Control
Autonomous docking between Unmanned Aerial Vehicles (UAVs) and ground robots is essential for heterogeneous systems, yet most existing approaches target wheeled platforms whose limited mobility constrains exploration in complex terrains. Quadruped robots offer superior adaptability but undergo frequent posture variations, making it difficult to provide a stable landing surface for UAVs. To address these challenges, we propose an autonomous UAV-quadruped docking framework for GPS-denied environments. On the quadruped side, a Hybrid Internal Model with Horizontal Alignment (HIM-HA), learned via deep reinforcement learning, actively stabilizes the torso to provide a level platform. On the UAV side, a three-phase strategy is adopted, consisting of long-range acquisition with a median-filtered YOLOv8 detector, close-range tracking with a constraint-aware controller that integrates a Nonsingular Fast Terminal Sliding Mode Controller (NFTSMC) and a logarithmic Barrier Function (BF) to guarantee finite-time error convergence under field-of-view (FOV) constraints, and terminal descent guided by a Safety Period (SP) mechanism that jointly verifies tracking accuracy and platform stability. The proposed framework is validated in both simulation and real-world scenarios, successfully achieving docking on outdoor staircases higher than 17 cm and rough slopes steeper than 30 degrees. Supplementary materials and videos are available at: https://uav-quadruped-docking.github.io.
PL-VIWO2: A Lightweight, Fast and Robust Visual-Inertial-Wheel Odometry Using Points and Lines
Vision-based odometry has been widely adopted in autonomous driving owing to its low cost and lightweight setup; however, its performance often degrades in complex outdoor urban environments. To address these challenges, we propose PL-VIWO2, a filter-based visual-inertial-wheel odometry system that integrates an IMU, wheel encoder, and camera (supporting both monocular and stereo) for long-term robust state estimation. The main contributions are: (i) a novel line feature processing framework that exploits the geometric relationship between 2D feature points and lines, enabling fast and robust line tracking and triangulation while ensuring real-time performance; (ii) an SE(2)-constrained SE(3) wheel pre-integration method that leverages the planar motion characteristics of ground vehicles for accurate wheel updates; and (iii) an efficient motion consistency check (MCC) that filters out dynamic features by jointly using IMU and wheel measurements. Extensive experiments on Monte Carlo simulations and public autonomous driving datasets demonstrate that PL-VIWO2 outperforms state-of-the-art methods in terms of accuracy, efficiency, and robustness.
comment: 16 pages
Plan2Evolve: LLM Self-Evolution for Improved Planning Capability via Automated Domain Generation
Large Language Models (LLMs) have recently shown strong potential in robotic task planning, particularly through automatic planning domain generation that integrates symbolic search. Prior approaches, however, have largely treated these domains as search utilities, with limited attention to their potential as scalable sources of reasoning data. At the same time, progress in reasoning LLMs has been driven by chain-of-thought (CoT) supervision, whose application in robotics remains dependent on costly, human-curated datasets. We propose Plan2Evolve, an LLM self-evolving framework in which the base model generates planning domains that serve as engines for producing symbolic problem-plan pairs as reasoning traces. These pairs are then transformed into extended CoT trajectories by the same model through natural-language explanations, thereby explicitly aligning symbolic planning structures with natural language reasoning. The resulting data extend beyond the model's intrinsic planning capacity, enabling model fine-tuning that yields a planning-enhanced LLM with improved planning success, stronger cross-task generalization, and reduced inference costs.
comment: 25 pages, 7 figures
DroneFL: Federated Learning for Multi-UAV Visual Target Tracking
Multi-robot target tracking is a fundamental problem that requires coordinated monitoring of dynamic entities in applications such as precision agriculture, environmental monitoring, disaster response, and security surveillance. While Federated Learning (FL) has the potential to enhance learning across multiple robots without centralized data aggregation, its use in multi-Unmanned Aerial Vehicle (UAV) target tracking remains largely underexplored. Key challenges include limited onboard computational resources, significant data heterogeneity in FL due to varying targets and the fields of view, and the need for tight coupling between trajectory prediction and multi-robot planning. In this paper, we introduce DroneFL, the first federated learning framework specifically designed for efficient multi-UAV target tracking. We design a lightweight local model to predict target trajectories from sensor inputs, using a frozen YOLO backbone and a shallow transformer for efficient onboard training. The updated models are periodically aggregated in the cloud for global knowledge sharing. To alleviate the data heterogeneity that hinders FL convergence, DroneFL introduces a position-invariant model architecture with altitude-based adaptive instance normalization. Finally, we fuse predictions from multiple UAVs in the cloud and generate optimal trajectories that balance target prediction accuracy and overall tracking performance. Our results show that DroneFL reduces prediction error by 6%-83% and tracking distance by 0.4%-4.6% compared to a distributed non-FL framework. In terms of efficiency, DroneFL runs in real time on a Raspberry Pi 5 and has on average just 1.56 KBps data rate to the cloud.
Wall Inspector: Quadrotor Control in Wall-proximity Through Model Compensation
The safe operation of quadrotors in near-wall urban or indoor environments (e.g., inspection and search-and-rescue missions) is challenged by unmodeled aerodynamic effects arising from wall-proximity. It generates complex vortices that induce destabilizing suction forces, potentially leading to hazardous vibrations or collisions. This paper presents a comprehensive solution featuring (1) a physics-based suction force model that explicitly characterizes the dependency on both rotor speed and wall distance, and (2) a suction-compensated model predictive control (SC-MPC) framework designed to ensure accurate and stable trajectory tracking during wall-proximity operations. The proposed SC-MPC framework incorporates an enhanced dynamics model that accounts for suction force effects, formulated as a factor graph optimization problem integrating system dynamics constraints, trajectory tracking objectives, control input smoothness requirements, and actuator physical limitations. The suction force model parameters are systematically identified through extensive experimental measurements across varying operational conditions. Experimental validation demonstrates SC-MPC's superior performance, achieving 2.1 cm root mean squared error (RMSE) in X-axis and 2.0 cm RMSE in Y-axis position control - representing 74% and 79% improvements over cascaded proportional-integral-derivative (PID) control, and 60% and 53% improvements over standard MPC respectively. The corresponding mean absolute error (MAE) metrics (1.2 cm X-axis, 1.4 cm Y-axis) similarly outperform both baselines. The evaluation platform employs a ducted quadrotor design that provides collision protection while maintaining aerodynamic efficiency. To facilitate reproducibility and community adoption, we have open-sourced our complete implementation, available at https://anonymous.4open.science/r/SC-MPC-6A61.
Residual Vector Quantization For Communication-Efficient Multi-Agent Perception
Multi-agent collaborative perception (CP) improves scene understanding by sharing information across connected agents such as autonomous vehicles, unmanned aerial vehicles, and robots. Communication bandwidth, however, constrains scalability. We present ReVQom, a learned feature codec that preserves spatial identity while compressing intermediate features. ReVQom is an end-to-end method that compresses feature dimensions via a simple bottleneck network followed by multi-stage residual vector quantization (RVQ). This allows only per-pixel code indices to be transmitted, reducing payloads from 8192 bits per pixel (bpp) of uncompressed 32-bit float features to 6-30 bpp per agent with minimal accuracy loss. On DAIR-V2X real-world CP dataset, ReVQom achieves 273x compression at 30 bpp to 1365x compression at 6 bpp. At 18 bpp (455x), ReVQom matches or outperforms raw-feature CP, and at 6-12 bpp it enables ultra-low-bandwidth operation with graceful degradation. ReVQom allows efficient and accurate multi-agent collaborative perception with a step toward practical V2X deployment.
comment: 5 pages
Developing a Mono-Actuated Compliant GeoGami Robot
This paper presents the design of a new soft-rigid robotic platform, "GeoGami". We leverage origami surface capabilities to achieve shape contraction and to support locomotion with underactuated forms. A key challenge is that origami surfaces have high degrees of freedom and typically require many actuators; we address repeatability by integrating surface compliance. We propose a mono-actuated GeoGami mobile platform that combines origami surface compliance with a geometric compliant skeleton, enabling the robot to transform and locomote using a single actuator. We demonstrate the robot, develop a stiffness model, and describe the central gearbox mechanism. We also analyze alternative cable-driven actuation methods for the skeleton to enable surface transformation. Finally, we evaluate the GeoGami platform for capabilities, including shape transformation and rolling. This platform opens new capabilities for robots that change shape to access different environments and that use shape transformation for locomotion.
comment: 8 pages, 12 figures, under-review
HL-IK: A Lightweight Implementation of Human-Like Inverse Kinematics in Humanoid Arms
Traditional IK methods for redundant humanoid manipulators emphasize end-effector (EE) tracking, frequently producing configurations that are valid mechanically but not human-like. We present Human-Like Inverse Kinematics (HL-IK), a lightweight IK framework that preserves EE tracking while shaping whole-arm configurations to appear human-like, without full-body sensing at runtime. The key idea is a learned elbow prior: using large-scale human motion data retargeted to the robot, we train a FiLM-modulated spatio-temporal attention network (FiSTA) to predict the next-step elbow pose from the EE target and a short history of EE-elbow states.This prediction is incorporated as a small residual alongside EE and smoothness terms in a standard Levenberg-Marquardt optimizer, making HL-IK a drop-in addition to numerical IK stacks. Over 183k simulation steps, HL-IK reduces arm-similarity position and direction error by 30.6% and 35.4% on average, and by 42.2% and 47.4% on the most challenging trajectories. Hardware teleoperation on a robot distinct from simulation further confirms the gains in anthropomorphism. HL-IK is simple to integrate, adaptable across platforms via our pipeline, and adds minimal computation, enabling human-like motions for humanoid robots. Project page: https://hl-ik.github.io/
Hyperspectral Adapter for Semantic Segmentation with Vision Foundation Models
Hyperspectral imaging (HSI) captures spatial information along with dense spectral measurements across numerous narrow wavelength bands. This rich spectral content has the potential to facilitate robust robotic perception, particularly in environments with complex material compositions, varying illumination, or other visually challenging conditions. However, current HSI semantic segmentation methods underperform due to their reliance on architectures and learning frameworks optimized for RGB inputs. In this work, we propose a novel hyperspectral adapter that leverages pretrained vision foundation models to effectively learn from hyperspectral data. Our architecture incorporates a spectral transformer and a spectrum-aware spatial prior module to extract rich spatial-spectral features. Additionally, we introduce a modality-aware interaction block that facilitates effective integration of hyperspectral representations and frozen vision Transformer features through dedicated extraction and injection mechanisms. Extensive evaluations on three benchmark autonomous driving datasets demonstrate that our architecture achieves state-of-the-art semantic segmentation performance while directly using HSI inputs, outperforming both vision-based and hyperspectral segmentation methods. We make the code available at https://hsi-adapter.cs.uni-freiburg.de.
GUIDE: A Diffusion-Based Autonomous Robot Exploration Framework Using Global Graph Inference
Autonomous exploration in structured and complex indoor environments remains a challenging task, as existing methods often struggle to appropriately model unobserved space and plan globally efficient paths. To address these limitations, we propose GUIDE, a novel exploration framework that synergistically combines global graph inference with diffusion-based decision-making. We introduce a region-evaluation global graph representation that integrates both observed environmental data and predictions of unexplored areas, enhanced by a region-level evaluation mechanism to prioritize reliable structural inferences while discounting uncertain predictions. Building upon this enriched representation, a diffusion policy network generates stable, foresighted action sequences with significantly reduced denoising steps. Extensive simulations and real-world deployments demonstrate that GUIDE consistently outperforms state-of-the-art methods, achieving up to 18.3% faster coverage completion and a 34.9% reduction in redundant movements.
Online Language Splatting
To enable AI agents to interact seamlessly with both humans and 3D environments, they must not only perceive the 3D world accurately but also align human language with 3D spatial representations. While prior work has made significant progress by integrating language features into geometrically detailed 3D scene representations using 3D Gaussian Splatting (GS), these approaches rely on computationally intensive offline preprocessing of language features for each input image, limiting adaptability to new environments. In this work, we introduce Online Language Splatting, the first framework to achieve online, near real-time, open-vocabulary language mapping within a 3DGS-SLAM system without requiring pre-generated language features. The key challenge lies in efficiently fusing high-dimensional language features into 3D representations while balancing the computation speed, memory usage, rendering quality and open-vocabulary capability. To this end, we innovatively design: (1) a high-resolution CLIP embedding module capable of generating detailed language feature maps in 18ms per frame, (2) a two-stage online auto-encoder that compresses 768-dimensional CLIP features to 15 dimensions while preserving open-vocabulary capabilities, and (3) a color-language disentangled optimization approach to improve rendering quality. Experimental results show that our online method not only surpasses the state-of-the-art offline methods in accuracy but also achieves more than 40x efficiency boost, demonstrating the potential for dynamic and interactive AI applications.
HUNT: High-Speed UAV Navigation and Tracking in Unstructured Environments via Instantaneous Relative Frames
Search and rescue operations require unmanned aerial vehicles to both traverse unknown unstructured environments at high speed and track targets once detected. Achieving both capabilities under degraded sensing and without global localization remains an open challenge. Recent works on relative navigation have shown robust tracking by anchoring planning and control to a visible detected object, but cannot address navigation when no target is in the field of view. We present HUNT (High-speed UAV Navigation and Tracking), a real-time framework that unifies traversal, acquisition, and tracking within a single relative formulation. HUNT defines navigation objectives directly from onboard instantaneous observables such as attitude, altitude, and velocity, enabling reactive high-speed flight during search. Once a target is detected, the same perception-control pipeline transitions seamlessly to tracking. Outdoor experiments in dense forests, container compounds, and search-and-rescue operations with vehicles and mannequins demonstrate robust autonomy where global methods fail.
Occlusion-Aware Consistent Model Predictive Control for Robot Navigation in Occluded Obstacle-Dense Environments
Ensuring safety and motion consistency for robot navigation in occluded, obstacle-dense environments is a critical challenge. In this context, this study presents an occlusion-aware Consistent Model Predictive Control (CMPC) strategy. To account for the occluded obstacles, it incorporates adjustable risk regions that represent their potential future locations. Subsequently, dynamic risk boundary constraints are developed online to ensure safety.The CMPC then constructs multiple locally optimal trajectory branches (each tailored to different risk regions) to strike a balance between safety and performance. A shared consensus segment is generated to ensure smooth transitions between branches without significant velocity fluctuations, further preserving motion consistency. To facilitate high computational efficiency and ensure coordination across local trajectories, we use the alternating direction method of multipliers (ADMM) to decompose the CMPC into manageable sub-problems for parallel solving. The proposed strategy is validated through simulations and real-world experiments on an Ackermann-steering robot platform. The results demonstrate the effectiveness of the proposed CMPC strategy through comparisons with baseline approaches in occluded, obstacle-dense environments.
V2V-GoT: Vehicle-to-Vehicle Cooperative Autonomous Driving with Multimodal Large Language Models and Graph-of-Thoughts
Current state-of-the-art autonomous vehicles could face safety-critical situations when their local sensors are occluded by large nearby objects on the road. Vehicle-to-vehicle (V2V) cooperative autonomous driving has been proposed as a means of addressing this problem, and one recently introduced framework for cooperative autonomous driving has further adopted an approach that incorporates a Multimodal Large Language Model (MLLM) to integrate cooperative perception and planning processes. However, despite the potential benefit of applying graph-of-thoughts reasoning to the MLLM, this idea has not been considered by previous cooperative autonomous driving research. In this paper, we propose a novel graph-of-thoughts framework specifically designed for MLLM-based cooperative autonomous driving. Our graph-of-thoughts includes our proposed novel ideas of occlusion-aware perception and planning-aware prediction. We curate the V2V-GoT-QA dataset and develop the V2V-GoT model for training and testing the cooperative driving graph-of-thoughts. Our experimental results show that our method outperforms other baselines in cooperative perception, prediction, and planning tasks. Our project website: https://eddyhkchiu.github.io/v2vgot.github.io/ .
comment: Our project website: https://eddyhkchiu.github.io/v2vgot.github.io/
Pure Vision Language Action (VLA) Models: A Comprehensive Survey
The emergence of Vision Language Action (VLA) models marks a paradigm shift from traditional policy-based control to generalized robotics, reframing Vision Language Models (VLMs) from passive sequence generators into active agents for manipulation and decision-making in complex, dynamic environments. This survey delves into advanced VLA methods, aiming to provide a clear taxonomy and a systematic, comprehensive review of existing research. It presents a comprehensive analysis of VLA applications across different scenarios and classifies VLA approaches into several paradigms: autoregression-based, diffusion-based, reinforcement-based, hybrid, and specialized methods; while examining their motivations, core strategies, and implementations in detail. In addition, foundational datasets, benchmarks, and simulation platforms are introduced. Building on the current VLA landscape, the review further proposes perspectives on key challenges and future directions to advance research in VLA models and generalizable robotics. By synthesizing insights from over three hundred recent studies, this survey maps the contours of this rapidly evolving field and highlights the opportunities and challenges that will shape the development of scalable, general-purpose VLA methods.
Growing with Your Embodied Agent: A Human-in-the-Loop Lifelong Code Generation Framework for Long-Horizon Manipulation Skills
Large language models (LLMs)-based code generation for robotic manipulation has recently shown promise by directly translating human instructions into executable code, but existing methods remain noisy, constrained by fixed primitives and limited context windows, and struggle with long-horizon tasks. While closed-loop feedback has been explored, corrected knowledge is often stored in improper formats, restricting generalization and causing catastrophic forgetting, which highlights the need for learning reusable skills. Moreover, approaches that rely solely on LLM guidance frequently fail in extremely long-horizon scenarios due to LLMs' limited reasoning capability in the robotic domain, where such issues are often straightforward for humans to identify. To address these challenges, we propose a human-in-the-loop framework that encodes corrections into reusable skills, supported by external memory and Retrieval-Augmented Generation with a hint mechanism for dynamic reuse. Experiments on Ravens, Franka Kitchen, and MetaWorld, as well as real-world settings, show that our framework achieves a 0.93 success rate (up to 27% higher than baselines) and a 42% efficiency improvement in correction rounds. It can robustly solve extremely long-horizon tasks such as "build a house", which requires planning over 20 primitives.
comment: update fig 1, typo correction - v2
GVDepth: Zero-Shot Monocular Depth Estimation for Ground Vehicles based on Probabilistic Cue Fusion ICCV 2025
Generalizing metric monocular depth estimation presents a significant challenge due to its ill-posed nature, while the entanglement between camera parameters and depth amplifies issues further, hindering multi-dataset training and zero-shot accuracy. This challenge is particularly evident in autonomous vehicles and mobile robotics, where data is collected with fixed camera setups, limiting the geometric diversity. Yet, this context also presents an opportunity: the fixed relationship between the camera and the ground plane imposes additional perspective geometry constraints, enabling depth regression via vertical image positions of objects. However, this cue is highly susceptible to overfitting, thus we propose a novel canonical representation that maintains consistency across varied camera setups, effectively disentangling depth from specific parameters and enhancing generalization across datasets. We also propose a novel architecture that adaptively and probabilistically fuses depths estimated via object size and vertical image position cues. A comprehensive evaluation demonstrates the effectiveness of the proposed approach on five autonomous driving datasets, achieving accurate metric depth estimation for varying resolutions, aspect ratios and camera setups. Notably, we achieve comparable accuracy to existing zero-shot methods, despite training on a single dataset with a single-camera setup. Project website: https://unizgfer-lamor.github.io/gvdepth/
comment: ICCV 2025
Label-Efficient Grasp Joint Prediction with Point-JEPA IROS 2025
We study whether 3D self-supervised pretraining with Point--JEPA enables label-efficient grasp joint-angle prediction. Meshes are sampled to point clouds and tokenized; a ShapeNet-pretrained Point--JEPA encoder feeds a $K{=}5$ multi-hypothesis head trained with winner-takes-all and evaluated by top--logit selection. On a multi-finger hand dataset with strict object-level splits, Point--JEPA improves top--logit RMSE and Coverage@15$^{\circ}$ in low-label regimes (e.g., 26% lower RMSE at 25% data) and reaches parity at full supervision, suggesting JEPA-style pretraining is a practical lever for data-efficient grasp learning.
comment: 4 pages, 5 figures. Submitted to IROS 2025 Workshop
Constrained Decoding for Robotics Foundation Models
Recent advances in the development of robotic foundation models have led to promising end-to-end and general-purpose capabilities in robotic systems. These models are pretrained on vast datasets of robot trajectories to process multi-modal inputs and directly output a sequence of action that the system then executes in the real world. Although this approach is attractive from the perspective of improved generalization across diverse tasks, these models are still data-driven and, therefore, lack explicit notions of behavioral correctness and safety constraints. We address these limitations by introducing a constrained decoding framework for robotics foundation models that enforces logical constraints on action trajectories in dynamical systems. Our method ensures that generated actions provably satisfy signal temporal logic (STL) specifications at runtime without retraining, while remaining agnostic of the underlying foundation model. We perform comprehensive evaluation of our approach across state-of-the-art navigation foundation models and we show that our decoding-time interventions are useful not only for filtering unsafe actions but also for conditional action-generation. Videos available on our website: https://constrained-robot-fms.github.io
Aegis: Automated Error Generation and Identification for Multi-Agent Systems
As Multi-Agent Systems (MAS) become increasingly autonomous and complex, understanding their error modes is critical for ensuring their reliability and safety. However, research in this area has been severely hampered by the lack of large-scale, diverse datasets with precise, ground-truth error labels. To address this bottleneck, we introduce \textbf{AEGIS}, a novel framework for \textbf{A}utomated \textbf{E}rror \textbf{G}eneration and \textbf{I}dentification for Multi-Agent \textbf{S}ystems. By systematically injecting controllable and traceable errors into initially successful trajectories, we create a rich dataset of realistic failures. This is achieved using a context-aware, LLM-based adaptive manipulator that performs sophisticated attacks like prompt injection and response corruption to induce specific, predefined error modes. We demonstrate the value of our dataset by exploring three distinct learning paradigms for the error identification task: Supervised Fine-Tuning, Reinforcement Learning, and Contrastive Learning. Our comprehensive experiments show that models trained on AEGIS data achieve substantial improvements across all three learning paradigms. Notably, several of our fine-tuned models demonstrate performance competitive with or superior to proprietary systems an order of magnitude larger, validating our automated data generation framework as a crucial resource for developing more robust and interpretable multi-agent systems. Our project website is available at https://kfq20.github.io/AEGIS-Website.
Contact-Aware Safety in Soft Robots Using High-Order Control Barrier and Lyapunov Functions
Robots operating alongside people, particularly in sensitive scenarios such as aiding the elderly with daily tasks or collaborating with workers in manufacturing, must guarantee safety and cultivate user trust. Continuum soft manipulators promise safety through material compliance, but as designs evolve for greater precision, payload capacity, and speed, and increasingly incorporate rigid elements, their injury risk resurfaces. In this letter, we introduce a comprehensive High-Order Control Barrier Function (HOCBF) + High-Order Control Lyapunov Function (HOCLF) framework that enforces strict contact force limits across the entire soft-robot body during environmental interactions. Our approach combines a differentiable Piecewise Cosserat-Segment (PCS) dynamics model with a convex-polygon distance approximation metric, named Differentiable Conservative Separating Axis Theorem (DCSAT), based on the soft robot geometry to enable real-time, whole-body collision detection, resolution, and enforcement of the safety constraints. By embedding HOCBFs into our optimization routine, we guarantee safety, allowing, for instance, safe navigation in operational space under HOCLF-driven motion objectives. Extensive planar simulations demonstrate that our method maintains safety-bounded contacts while achieving precise shape and task-space regulation. This work thus lays a foundation for the deployment of soft robots in human-centric environments with provable safety and performance.
comment: 8 pages
Security of Deep Reinforcement Learning for Autonomous Driving: A Survey
Reinforcement learning (RL) enables agents to learn optimal behaviors through interaction with their environment and has been increasingly deployed in safety-critical applications, including autonomous driving. Despite its promise, RL is susceptible to attacks designed either to compromise policy learning or to induce erroneous decisions by trained agents. Although the literature on RL security has grown rapidly and several surveys exist, existing categorizations often fall short in guiding the selection of appropriate defenses for specific systems. In this work, we present a comprehensive survey of 86 recent studies on RL security, addressing these limitations by systematically categorizing attacks and defenses according to defined threat models and single- versus multi-agent settings. Furthermore, we examine the relevance and applicability of state-of-the-art attacks and defense mechanisms within the context of autonomous driving, providing insights to inform the design of robust RL systems.
SmallPlan: Leverage Small Language Models for Sequential Path Planning with Simulation-Powered, LLM-Guided Distillation
Efficient path planning in robotics, particularly within large-scale, complex environments, remains a significant hurdle. While Large Language Models (LLMs) offer strong reasoning capabilities, their high computational cost and limited adaptability hinder real-time deployment on edge devices. We present SmallPlan - a novel framework leveraging LLMs as teacher models to train lightweight Small Language Models (SLMs) for high-level path planning tasks. In SmallPlan, the SLMs provide optimal action sequences to navigate across scene graphs that compactly represent full-scaled 3D scenes. The SLMs are trained in a simulation-powered, interleaved manner with LLM-guided supervised fine-tuning (SFT) and reinforcement learning (RL). This strategy not only enables SLMs to successfully complete navigation tasks but also makes them aware of important factors like distance travel, providing more efficient path planning. Through experiments, we demonstrate that the fine-tuned SLMs perform competitively with larger models like GPT-4o on sequential path planning, without suffering from hallucination and overfitting. SmallPlan is resource-efficient, making it well-suited for edge-device deployment and advancing practical autonomous robotics. Our source code is available here: https://github.com/quangpham2006/SmallPlan
comment: Paper is under review
villa-X: Enhancing Latent Action Modeling in Vision-Language-Action Models
Vision-Language-Action (VLA) models have emerged as a popular paradigm for learning robot manipulation policies that can follow language instructions and generalize to novel scenarios. Recent works have begun to explore the incorporation of latent actions, abstract representations of motion between two frames, into VLA pre-training. In this paper, we introduce villa-X, a novel Vision-Language-Latent-Action (ViLLA) framework that advances latent action modeling for learning generalizable robot manipulation policies. Our approach improves both how latent actions are learned and how they are incorporated into VLA pre-training. We demonstrate that villa-X can generate latent action plans in a zero-shot fashion, even for unseen embodiments and open-vocabulary symbolic understanding. This capability enables villa-X to achieve superior performance across diverse simulation tasks in SIMPLER and on two real-world robotic setups involving both gripper and dexterous hand manipulation. These results establish villa-X as a principled and scalable paradigm for learning generalizable robot manipulation policies. We believe it provides a strong foundation for future research.
comment: Project page: https://aka.ms/villa-x
DyDexHandover: Human-like Bimanual Dynamic Dexterous Handover using RGB-only Perception
Dynamic in air handover is a fundamental challenge for dual-arm robots, requiring accurate perception, precise coordination, and natural motion. Prior methods often rely on dynamics models, strong priors, or depth sensing, limiting generalization and naturalness. We present DyDexHandover, a novel framework that employs multi-agent reinforcement learning to train an end to end RGB based policy for bimanual object throwing and catching. To achieve more human-like behavior, the throwing policy is guided by a human policy regularization scheme, encouraging fluid and natural motion, and enhancing the generalization capability of the policy. A dual arm simulation environment was built in Isaac Sim for experimental evaluation. DyDexHandover achieves nearly 99 percent success on training objects and 75 percent on unseen objects, while generating human-like throwing and catching behaviors. To our knowledge, it is the first method to realize dual-arm in-air handover using only raw RGB perception.
comment: 8 pages, 7 figures
Stairway to Success: An Online Floor-Aware Zero-Shot Object-Goal Navigation Framework via LLM-Driven Coarse-to-Fine Exploration
Deployable service and delivery robots struggle to navigate multi-floor buildings to reach object goals, as existing systems fail due to single-floor assumptions and requirements for offline, globally consistent maps. Multi-floor environments pose unique challenges including cross-floor transitions and vertical spatial reasoning, especially navigating unknown buildings. Object-Goal Navigation benchmarks like HM3D and MP3D also capture this multi-floor reality, yet current methods lack support for online, floor-aware navigation. To bridge this gap, we propose \textbf{\textit{ASCENT}}, an online framework for Zero-Shot Object-Goal Navigation that enables robots to operate without pre-built maps or retraining on new object categories. It introduces: (1) a \textbf{Multi-Floor Abstraction} module that dynamically constructs hierarchical representations with stair-aware obstacle mapping and cross-floor topology modeling, and (2) a \textbf{Coarse-to-Fine Reasoning} module that combines frontier ranking with LLM-driven contextual analysis for multi-floor navigation decisions. We evaluate on HM3D and MP3D benchmarks, outperforming state-of-the-art zero-shot approaches, and demonstrate real-world deployment on a quadruped robot.
comment: Preprint; 9 pages, 9 figures, 8 tables; Project Page at https://zeying-gong.github.io/projects/ascent
Real-Time Out-of-Distribution Failure Prevention via Multi-Modal Reasoning
While foundation models offer promise toward improving robot safety in out-of-distribution (OOD) scenarios, how to effectively harness their generalist knowledge for real-time, dynamically feasible response remains a crucial problem. We present FORTRESS, a joint reasoning and planning framework that generates semantically safe fallback strategies to prevent safety-critical, OOD failures. At a low frequency under nominal operation, FORTRESS uses multi-modal foundation models to anticipate possible failure modes and identify safe fallback sets. When a runtime monitor triggers a fallback response, FORTRESS rapidly synthesizes plans to fallback goals while inferring and avoiding semantically unsafe regions in real time. By bridging open-world, multi-modal reasoning with dynamics-aware planning, we eliminate the need for hard-coded fallbacks and human safety interventions. FORTRESS outperforms on-the-fly prompting of slow reasoning models in safety classification accuracy on synthetic benchmarks and real-world ANYmal robot data, and further improves system safety and planning success in simulation and on quadrotor hardware for urban navigation. Website can be found at https://milanganai.github.io/fortress.
comment: Conference on Robot Learning (CoRL) 2025 (Oral)
EC-Diffuser: Multi-Object Manipulation via Entity-Centric Behavior Generation
Object manipulation is a common component of everyday tasks, but learning to manipulate objects from high-dimensional observations presents significant challenges. These challenges are heightened in multi-object environments due to the combinatorial complexity of the state space as well as of the desired behaviors. While recent approaches have utilized large-scale offline data to train models from pixel observations, achieving performance gains through scaling, these methods struggle with compositional generalization in unseen object configurations with constrained network and dataset sizes. To address these issues, we propose a novel behavioral cloning (BC) approach that leverages object-centric representations and an entity-centric Transformer with diffusion-based optimization, enabling efficient learning from offline image data. Our method first decomposes observations into an object-centric representation, which is then processed by our entity-centric Transformer that computes attention at the object level, simultaneously predicting object dynamics and the agent's actions. Combined with the ability of diffusion models to capture multi-modal behavior distributions, this results in substantial performance improvements in multi-object tasks and, more importantly, enables compositional generalization. We present BC agents capable of zero-shot generalization to tasks with novel compositions of objects and goals, including larger numbers of objects than seen during training. We provide video rollouts on our webpage: https://sites.google.com/view/ec-diffuser.
LeVERB: Humanoid Whole-Body Control with Latent Vision-Language Instruction
Vision-language-action (VLA) models have demonstrated strong semantic understanding and zero-shot generalization, yet most existing systems assume an accurate low-level controller with hand-crafted action "vocabulary" such as end-effector pose or root velocity. This assumption confines prior work to quasi-static tasks and precludes the agile, whole-body behaviors required by humanoid whole-body control (WBC) tasks. To capture this gap in the literature, we start by introducing the first sim-to-real-ready, vision-language, closed-loop benchmark for humanoid WBC, comprising over 150 tasks from 10 categories. We then propose LeVERB: Latent Vision-Language-Encoded Robot Behavior, a hierarchical latent instruction-following framework for humanoid vision-language WBC, the first of its kind. At the top level, a vision-language policy learns a latent action vocabulary from synthetically rendered kinematic demonstrations; at the low level, a reinforcement-learned WBC policy consumes these latent verbs to generate dynamics-level commands. In our benchmark, LeVERB can zero-shot attain a 80% success rate on simple visual navigation tasks, and 58.5% success rate overall, outperforming naive hierarchical whole-body VLA implementation by 7.8 times.
comment: https://ember-lab-berkeley.github.io/LeVERB-Website/
Omni-Roach: A legged robot capable of traversing multiple types of large obstacles and self-righting
Robots excel at avoiding obstacles but struggle to traverse complex 3-D terrain with cluttered large obstacles. By contrast, insects like cockroaches excel at doing so. Recent research in our lab elucidated how locomotor transitions emerge from locomotor-environment interaction for diverse locomotor challenges abstracted from complex 3-D terrain and the strategies to overcome them. Here we built on these fundamental insights to develop a cockroach-inspired legged robot, Omni-Roach, that integrated these strategies to achieve multi-modal locomotion and provide a robophysical model to study the trade-off between multi-functionality and performance. The robot was based on the RHex design with six compliant legs and featured a rounded body with two wings that can open and a tail with pitch and yaw degrees of freedom. After two development and testing iterations, our robot was capable of overcoming all locomotor challenges with a high performance and success rate. It traversed cluttered rigid pillars only 1.1x robot body width apart, a 2.5x hip height bump, a 0.75x body length gap, densely cluttered flexible beams only 65% body width apart, and self-righted within 4 seconds. Systematic beam traversal experiments further revealed that a downward-pointing tail oscillating laterally helps roll the body into beam gaps and break frictional and interlocking contact to traverse. Our work highlights the usefulness of multi-functional appendages and exaptation for large obstacle traversal.
SenSnake: A snake robot with contact force sensing for studying locomotion in complex 3-D terrain
Despite advances in a diversity of environments, snake robots are still far behind snakes in traversing complex 3-D terrain with large obstacles. This is due to a lack of understanding of how to control 3-D body bending to push against terrain features to generate and control propulsion. Biological studies suggested that generalist snakes use contact force sensing to adjust body bending in real time to do so. However, studying this sensory-modulated force control in snakes is challenging, due to a lack of basic knowledge of how their force sensing organs work. Here, we take a robophysics approach to make progress, starting by developing a snake robot capable of 3-D body bending with contact force sensing to enable systematic locomotion experiments and force measurements. Through two development and testing iterations, we created a 12-segment robot with 36 piezo-resistive sheet sensors distributed on all segments with compliant shells with a sampling frequency of 30 Hz. The robot measured contact forces while traversing a large obstacle using vertical bending with high repeatability, achieving the goal of providing a platform for systematic experiments. Finally, we explored model-based calibration considering the viscoelastic behavior of the piezo-resistive sensor, which will for useful for future studies.
Enter the Mind Palace: Reasoning and Planning for Long-term Active Embodied Question Answering
As robots become increasingly capable of operating over extended periods -- spanning days, weeks, and even months -- they are expected to accumulate knowledge of their environments and leverage this experience to assist humans more effectively. This paper studies the problem of Long-term Active Embodied Question Answering (LA-EQA), a new task in which a robot must both recall past experiences and actively explore its environment to answer complex, temporally-grounded questions. Unlike traditional EQA settings, which typically focus either on understanding the present environment alone or on recalling a single past observation, LA-EQA challenges an agent to reason over past, present, and possible future states, deciding when to explore, when to consult its memory, and when to stop gathering observations and provide a final answer. Standard EQA approaches based on large models struggle in this setting due to limited context windows, absence of persistent memory, and an inability to combine memory recall with active exploration. To address this, we propose a structured memory system for robots, inspired by the mind palace method from cognitive science. Our method encodes episodic experiences as scene-graph-based world instances, forming a reasoning and planning algorithm that enables targeted memory retrieval and guided navigation. To balance the exploration-recall trade-off, we introduce value-of-information-based stopping criteria that determines when the agent has gathered sufficient information. We evaluate our method on real-world experiments and introduce a new benchmark that spans popular simulation environments and actual industrial sites. Our approach significantly outperforms state-of-the-art baselines, yielding substantial gains in both answer accuracy and exploration efficiency.
Adaptive Diffusion Constrained Sampling for Bimanual Robot Manipulation
Coordinated multi-arm manipulation requires satisfying multiple simultaneous geometric constraints across high-dimensional configuration spaces, which poses a significant challenge for traditional planning and control methods. In this work, we propose Adaptive Diffusion Constrained Sampling (ADCS), a generative framework that flexibly integrates both equality (e.g., relative and absolute pose constraints) and structured inequality constraints (e.g., proximity to object surfaces) into an energy-based diffusion model. Equality constraints are modeled using dedicated energy networks trained on pose differences in Lie algebra space, while inequality constraints are represented via Signed Distance Functions (SDFs) and encoded into learned constraint embeddings, allowing the model to reason about complex spatial regions. A key innovation of our method is a Transformer-based architecture that learns to weight constraint-specific energy functions at inference time, enabling flexible and context-aware constraint integration. Moreover, we adopt a two-phase sampling strategy that improves precision and sample diversity by combining Langevin dynamics with resampling and density-aware re-weighting. Experimental results on dual-arm manipulation tasks show that ADCS significantly improves sample diversity and generalization across settings demanding precise coordination and adaptive constraint handling.
Residual Off-Policy RL for Finetuning Behavior Cloning Policies
Recent advances in behavior cloning (BC) have enabled impressive visuomotor control policies. However, these approaches are limited by the quality of human demonstrations, the manual effort required for data collection, and the diminishing returns from offline data. In comparison, reinforcement learning (RL) trains an agent through autonomous interaction with the environment and has shown remarkable success in various domains. Still, training RL policies directly on real-world robots remains challenging due to sample inefficiency, safety concerns, and the difficulty of learning from sparse rewards for long-horizon tasks, especially for high-degree-of-freedom (DoF) systems. We present a recipe that combines the benefits of BC and RL through a residual learning framework. Our approach leverages BC policies as black-box bases and learns lightweight per-step residual corrections via sample-efficient off-policy RL. We demonstrate that our method requires only sparse binary reward signals and can effectively improve manipulation policies on high-degree-of-freedom (DoF) systems in both simulation and the real world. In particular, we demonstrate, to the best of our knowledge, the first successful real-world RL training on a humanoid robot with dexterous hands. Our results demonstrate state-of-the-art performance in various vision-based tasks, pointing towards a practical pathway for deploying RL in the real world.
comment: Project website: https://residual-offpolicy-rl.github.io
One Demo Is All It Takes: Planning Domain Derivation with LLMs from A Single Demonstration
Pre-trained large language models (LLMs) show promise for robotic task planning but often struggle to guarantee correctness in long-horizon problems. Task and motion planning (TAMP) addresses this by grounding symbolic plans in low-level execution, yet it relies heavily on manually engineered planning domains. To improve long-horizon planning reliability and reduce human intervention, we present Planning Domain Derivation with LLMs (PDDLLM), a framework that automatically induces symbolic predicates and actions directly from demonstration trajectories by combining LLM reasoning with physical simulation roll-outs. Unlike prior domain-inference methods that rely on partially predefined or language descriptions of planning domains, PDDLLM constructs domains without manual domain initialization and automatically integrates them with motion planners to produce executable plans, enhancing long-horizon planning automation. Across 1,200 tasks in nine environments, PDDLLM outperforms six LLM-based planning baselines, achieving at least 20\% higher success rates, reduced token costs, and successful deployment on multiple physical robot platforms.
comment: 35 pages, 10 figures
DexWrist: A Robotic Wrist for Constrained and Dynamic Manipulation ICRA 2026
Development of dexterous manipulation hardware has primarily focused on hands and grippers. However, robotic wrists are equally critical, often playing a greater role than the end effector itself. Many conventional wrist designs fall short in human environments because they are too large or rely on rigid, high-reduction actuators that cannot support dynamic, contact-rich tasks. Some designs address these issues using backdrivable quasi-direct drive (QDD) actuators and compact form factors. However, they are often difficult to model and control due to coupled kinematics or high mechanical inertia. We present DexWrist, a robotic wrist that is designed to advance robotic manipulation in highly constrained environments, enable dynamic and contact-rich tasks, and simplify policy learning. DexWrist provides low-impedance actuation, low inertia, integrated proprioception, high speed, and a large workspace. Together, these capabilities support robust learning-based manipulation. DexWrist accelerates policy learning by: (i) enabling faster teleoperation for scalable data collection, (ii) simplifying the learned function through shorter trajectories and decoupled degrees of freedom (DOFs), (iii) providing natural backdrivability for safe contact without complex compliant controllers, and (iv) expanding the manipulation workspace in cluttered scenes. In our experiments, DexWrist improved policy success rates by 50-55% and reduced task completion times by a factor of 3-5. More details about the wrist can be found at https://dexwrist.csail.mit.edu.
comment: 9 pages, 8 figures. Submitted to ICRA 2026
DORAEMON: Decentralized Ontology-aware Reliable Agent with Enhanced Memory Oriented Navigation
Adaptive navigation in unfamiliar environments is crucial for household service robots but remains challenging due to the need for both low-level path planning and high-level scene understanding. While recent vision-language model (VLM) based zero-shot approaches reduce dependence on prior maps and scene-specific training data, they face significant limitations: spatiotemporal discontinuity from discrete observations, unstructured memory representations, and insufficient task understanding leading to navigation failures. We propose DORAEMON (Decentralized Ontology-aware Reliable Agent with Enhanced Memory Oriented Navigation), a novel cognitive-inspired framework consisting of Ventral and Dorsal Streams that mimics human navigation capabilities. The Dorsal Stream implements the Hierarchical Semantic-Spatial Fusion and Topology Map to handle spatiotemporal discontinuities, while the Ventral Stream combines RAG-VLM and Policy-VLM to improve decision-making. Our approach also develops Nav-Ensurance to ensure navigation safety and efficiency. We evaluate DORAEMON on the HM3D, MP3D, and GOAT datasets, where it achieves state-of-the-art performance on both success rate (SR) and success weighted by path length (SPL) metrics, significantly outperforming existing methods. We also introduce a new evaluation metric (AORI) to assess navigation intelligence better. Comprehensive experiments demonstrate DORAEMON's effectiveness in zero-shot autonomous navigation without requiring prior map building or pre-training.
Systems and Control (CS)
An enhanced statistical feature fusion approach using an improved distance evaluation algorithm and weighted K-nearest neighbor for bearing fault diagnosis
Bearings are among the most failure-prone components in rotating machinery, and their condition directly impacts overall performance. Therefore, accurately diagnosing bearing faults is essential for ensuring system stability. However, detecting such malfunctions in noisy environments, where data is collected from multiple sensors, necessitates the extraction and selection of informative features. This paper proposes an improved distance evaluation algorithm combined with a weighted K-nearest neighbor (KNN) classifier for bearing fault diagnosis. The process begins with extracting and integrating statistical features of vibration across the time, frequency, and time-frequency domains. Next, the improved distance evaluation algorithm assigns weights to the extracted features, retaining only the most informative ones by eliminating insensitive features. Finally, the selected features are used to train the weighted KNN classifier. To validate the proposed method, we employ bearing data from the University of Ottawa. The results demonstrate the effectiveness of our approach in accurately identifying bearing faults.
Next-Generation Aerial Robots -- Omniorientational Strategies: Dynamic Modeling, Control, and Comparative Analysis
Conventional multi-rotors are under-actuated systems, hindering them from independently controlling attitude from position. In this study, we present several distinct configurations that incorporate additional control inputs for manipulating the angles of the propeller axes. This addresses the mentioned limitations, making the systems "omniorientational". We comprehensively derived detailed dynamic models for all introduced configurations and validated by a methodology using Simscape Multibody simulations. Two controllers are designed: a sliding mode controller for robust handling of disturbances and a novel PID-based controller with gravity compensation integrating linear and non-linear allocators, designed for computational efficiency. A custom control allocation strategy is implemented to manage the input-non-affine nature of these systems, seeking to maximize battery life by minimizing the "Power Consumption Factor" defined in this study. Moreover, the controllers effectively managed harsh disturbances and uncertainties. Simulations compare and analyze the proposed configurations and controllers, majorly considering their power consumption. Furthermore, we conduct a qualitative comparison to evaluate the impact of different types of uncertainties on the control system, highlighting areas for potential model or hardware improvements. The analysis in this study provides a roadmap for future researchers to design omniorientational drones based on their design objectives, offering practical insights into configuration selection and controller design. This research aligns with the project SAC-1, one of the objectives of Sharif AgRoLab.
Continuous-Time System Identification and OCV Reconstruction of Li-ion Batteries via Regularized Least Squares
Accurate identification of lithium-ion (Li-ion) battery parameters is essential for managing and predicting battery behavior. However, existing discrete-time methods hinder the estimation of physical parameters and face the fast-slow dynamics problem presented in the battery. In this paper, we developed a continuous-time approach that enables the estimation of battery parameters directly from sampled data. This method avoids discretization errors in converting continuous-time models into discrete-time ones, achieving more accurate identification. In addition, we jointly identify the open-circuit voltage (OCV) and the state of charge (SOC) relation of the battery without utilizing offline OCV tests. By modeling the OCV-SOC curve as a cubic B-spline, we achieve a high-fidelity representation of the OCV curve, facilitating its estimation. Through solving a rank and L1 regularized least squares problem, we jointly identify battery parameters and the OCV-SOC relation from the battery's dynamic data. Simulated and real-life data demonstrate the effectiveness of the developed method.
Direct Continuous-Time LPV System Identification of Li-ion Batteries via L1-Regularized Least Squares
Accurate identification of lithium-ion battery parameters is essential for estimating battery states and managing performance. However, the variation of battery parameters over the state of charge (SOC) and the nonlinear dependence of the open-circuit voltage (OCV) on the SOC complicate the identification process. In this work, we develop a continuous-time LPV system identification approach to identify the SOC-dependent battery parameters and the OCV-SOC mapping. We model parameter variations using cubic B-splines to capture the piecewise nonlinearity of the variations and estimate signal derivatives via state variable filters, facilitating CT-LPV identification. Battery parameters and the OCV-SOC mapping are jointly identified by solving L1-regularized least squares problems. Numerical experiments on a simulated battery and real-life data demonstrate the effectiveness of the developed method in battery identification, presenting improved performance compared to conventional RLS-based methods.
The Use of the Simplex Architecture to Enhance Safety in Deep-Learning-Powered Autonomous Systems
Recently, the outstanding performance reached by neural networks in many tasks has led to their deployment in autonomous systems, such as robots and vehicles. However, neural networks are not yet trustworthy, being prone to different types of misbehavior, such as anomalous samples, distribution shifts, adversarial attacks, and other threats. Furthermore, frameworks for accelerating the inference of neural networks typically run on rich operating systems that are less predictable in terms of timing behavior and present larger surfaces for cyber-attacks. To address these issues, this paper presents a software architecture for enhancing safety, security, and predictability levels of learning-based autonomous systems. It leverages two isolated execution domains, one dedicated to the execution of neural networks under a rich operating system, which is deemed not trustworthy, and one responsible for running safety-critical functions, possibly under a different operating system capable of handling real-time constraints. Both domains are hosted on the same computing platform and isolated through a type-1 real-time hypervisor enabling fast and predictable inter-domain communication to exchange real-time data. The two domains cooperate to provide a fail-safe mechanism based on a safety monitor, which oversees the state of the system and switches to a simpler but safer backup module, hosted in the safety-critical domain, whenever its behavior is considered untrustworthy. The effectiveness of the proposed architecture is illustrated by a set of experiments performed on two control systems: a Furuta pendulum and a rover. The results confirm the utility of the fall-back mechanism in preventing faults due to the learning component.
On the convergence of a numerical scheme for a boundary controlled 1D linear parabolic PIDE
We consider an 1D partial integro-differential equation (PIDE) comprising of an 1D parabolic partial differential equation (PDE) and a nonlocal integral term. The control input is applied on one of the boundaries of the PIDE. Partitioning the spatial interval into $n+1$ subintervals and approximating the spatial derivatives and the integral term with their finite-difference approximations and Riemann sum, respectively, we derive an $n^{\rm th}$-order semi-discrete approximation of the PIDE. The $n^{\rm th}$-order semi-discrete approximation of the PIDE is an $n^{\rm th}$-order ordinary differential equation (ODE) in time. We establish some of its salient properties and using them prove that the solution of the semi-discrete approximation converges to the solution of the PIDE as $n\to\infty$. We illustrate our convergence results using numerical examples. The results in this work are useful for establishing the null controllability of the PIDE considered.
comment: 6 pages, 3 figures
Dual-Band Flexible Endfire Filtering Antenna With Conformal Capability for Emergency Communication Applications
In this letter, a single-layer dual-band flexible conformal filtering endfire antenna is presented. The proposed antenna is based on two co-designed folded dipoles (FDs) working at two frequencies, where the lower-frequency FD acts as a reflector for the higher-frequency one. Then, by devising an additional reflector for lower-frequency FD, dual-band endfire radiation is realized. Parasitic strips are deliberately introduced around the FDs to generate electric coupling and magnetic coupling in the two operating bands, resulting in significant filtering performance with four radiation nulls. With flexible structure and single-layer configuration, the antenna design exhibits flexible conformability with cylindrical surfaces of diverse diameters, thereby enabling seamless integration into scalable emergency communication systems. To verify our design concept, an antenna prototype is fabricated and measured. The measured working frequency ranges from 1.37 to 1.45 GHz and 1.89 to 2.07 GHz. Out-of-band radiation suppression more than 11 dB is achieved under different bending radii. The proposed design offers several advantages including dual-band endfire filtering radiation, flexible conformability and low-profile.
Revealing Chaotic Dependence and Degree-Structure Mechanisms in Optimal Pinning Control of Complex Networks
Identifying an optimal set of driver nodes to achieve synchronization via pinning control is a fundamental challenge in complex network science, limited by computational intractability and the lack of general theory. Here, leveraging a degree-based mean-field (annealed) approximation from statistical physics, we analytically reveal how the structural degree distribution systematically governs synchronization performance, and derive an analytic characterization of the globally optimal pinning set and constructive algorithms with linear complexity (dominated by degree sorting, O(N+M). The optimal configuration exhibits a chaotic dependence--a discontinuous sensitivity--on its cardinality, whereby adding a single node can trigger abrupt changes in node composition and control effectiveness. This structural transition fundamentally challenges traditional heuristics that assume monotonic performance gains with budget. Systematic experiments on synthetic and empirical networks confirm that the proposed approach consistently outperforms degree-, betweenness-, and other centrality-based baselines. Furthermore, we quantify how key degree-distribution features--low-degree saturation, high-degree cutoff, and the power-law exponent--govern achievable synchronizability and shape the form of optimal sets. These results offer a systematic understanding of how degree heterogeneity shapes the network controllability. Our work establishes a unified link between degree heterogeneity and spectral controllability, offering both mechanistic insights and practical design rules for optimal driver-node selection in diverse complex systems.
comment: 16 pages, 6 figures; primary: eess.SY; cross-lists: cs.SY, math.OC. Submitted to IEEE TAC
Automated algorithm design for convex optimization problems with linear equality constraints
Synthesis of optimization algorithms typically follows a {\em design-then-analyze\/} approach, which can obscure fundamental performance limits and hinder the systematic development of algorithms that operate near these limits. Recently, a framework grounded in robust control theory has emerged as a powerful tool for automating algorithm synthesis. By integrating design and analysis stages, fundamental performance bounds are revealed and synthesis of algorithms that achieve them is enabled. In this paper, we apply this framework to design algorithms for solving strongly convex optimization problems with linear equality constraints. Our approach yields a single-loop, gradient-based algorithm whose convergence rate is independent of the condition number of the constraint matrix. This improves upon the best known rate within the same algorithm class, which depends on the product of the condition numbers of the objective function and the constraint matrix.
comment: Accepted to 64th IEEE Conference on Decision Control (CDC), 2025
Parasitic actuation delay limits the minimum employable time headway in connected and autonomous vehicles
Adaptive andcooperative adaptive cruise control (ACC and CACC) and next generation CACC (CACC+) systems usually employ a constant time headway policy (CTHP) for platooning of connected and autonomous vehicles (CAVs). In ACC, the ego vehicle uses onboard sensors to measure the position and velocity of the predecessor vehicle to maintain a desired spacing. The CACC and CACC+systems use additional information, such as acceleration(s) communicated through vehicle-to-vehicle (V2V) communication of the predecessor vehicle(s); these systems have been shown to result in improved spacing performance, throughput, and safety over ACC. Parasitic dynamics are generally difficult to model and the parasitic parameters (delay, lag, etc.) are difficult to obtain. Parasitic actuation delays can have deleterious effects and impose limits on the mobility and safety of CAVs. It is reasonable to assume that the bounds on parasitic actuation delays are known a priori. For CAVs, we need to address both internal stability and string stability in the presence of parasitic actuation delays. This requires robustness of string and internal stability for all values of parasitic actuation delays that are within the specified upper bound. In this paper, we provide the minimum employable time headway for ACC, CACC, and CACC+ (`r' predecessors look-ahead), respectively. The inclusion of the internal stability in the string stability condition is analyzed based on Pontryagin's interlacing theorem for time delay systems. We provide comparative numerical results to corroborate the achieved theoretical results.
Frequency Domain Stability Conditions for Hybrid AC/DC Systems
In this article, we investigate small-signal frequency and DC voltage stability of hybrid AC/DC power systems that combine AC and DC transmission, conventional machine- based generation, and converter-interfaced generation. The main contributions of this work are a compact frequency domain representation of hybrid AC/DC systems and associated stability conditions that can be divided into conditions on the individual bus dynamics and conditions on each DC network. The bus- level conditions apply to a wide range of technologies (e.g., synchronous generators, synchronous condensers, grid-forming renewables and energy storage). Moreover, the system-level conditions establish that hybrid AC/DC systems combining a wide range of devices are stable independently of the network topology provided that the frequency response of converters on each DC network is sufficiently coherent relative to the network coupling strength. Additionally, we develop and validate a novel reduced- order damper winding model for multi-machine systems.
comment: presented at IREP 2025, published in Sustainable Energy, Grids and Networks
NEO-Grid: A Neural Approximation Framework for Optimization and Control in Distribution Grids
The rise of distributed energy resources (DERs) is reshaping modern distribution grids, introducing new challenges in attaining voltage stability under dynamic and decentralized operating conditions. This paper presents NEO-Grid, a unified learning-based framework for volt-var optimization (VVO) and volt-var control (VVC) that leverages neural network surrogates for power flow and deep equilibrium models (DEQs) for closed-loop control. Our method replaces traditional linear approximations with piecewise-linear ReLU networks trained to capture the nonlinear relationship between power injections and voltage magnitudes. For control, we model the recursive interaction between voltage and inverter response using DEQs, allowing direct fixed-point computation and efficient training via implicit differentiation. We evaluated NEO-Grid on the IEEE 33-bus system, demonstrating that it significantly improves voltage regulation performance compared to standard linear and heuristic baselines in both optimization and control settings. Our results establish NEO-Grid as a scalable, accurate, and interpretable solution for learning-based voltage regulation in distribution grids.
Mitigation of Active Power Oscillation in Multi-VSG Grids: An Impedance-Based Perspective
Active power oscillations frequently arise in inverter-dominated power systems with multiple converters operating under Virtual Synchronous Generator control, posing risks to system stability and protection coordination. While various mitigation strategies have been proposed, many rely on prior knowledge of system parameters, offer limited damping performance, or involve complex models that lack physical interpretability, making them difficult to apply in practice. To address these challenges, this paper first introduces a physically intuitive RLC equivalent circuit model to explain the root causes of APOs in both stand-alone and grid-connected modes. By mapping inertia, damping, and feeder impedance to capacitive, resistive, and inductive elements, respectively, the model reveals how mismatches among converters lead to inter-unit oscillations characterized by LC resonance. Building on this insight, we propose two mode-specific mitigation strategies: in SA mode, a graph theory based impedance control ensures proportional reactive power sharing and effectively suppresses APOs; and in GC mode, adaptive inertia and damping control with feedforward filtering is designed to reshape transient power dynamics while preserving frequency stability. The proposed methods are validated through extensive simulations and real-time hardware-in-the-loop experiments, demonstrating their effectiveness in suppressing oscillations and enhancing the robustness of multi-converter power systems.
A comprehensive equivalent circuit model for high overtone bulk acoustic resonators (HBARs)
This paper presents a new and comprehensive equivalent circuit model for high overtone bulk acoustic resonators (HBARs). HBARs demonstrate several very sharp resonance modes distributed nearly periodically over a very wide frequency range. This spectrum response of HBARs offers unique advantages but poses significant modeling challenges. The proposed circuit incorporates and models the unique physical components of the HBAR: piezoelectric transducer, substrate (a perfectly periodic multimode cavity), piezoelectric coupling, and critically, the imperfectly matched transducer-substrate interface which imparts characteristic aperiodicity to the HBAR mode spectrum. By judicious use of fixed, periodic, or tightly constrained virtual lumped-element branches, and sets of branches, the model retains clear and intuitive links to the physical device, while reducing the complexity needed for fitting dense, broadband datasets. We demonstrate the validity and power of this model by simultaneously fitting measured data for 61 modes of a GaN/NbN/sapphire HBAR over a span of 1 GHz, and extracting modal parameters such as quality factors and coupling coefficients. We show that this new model is compact and yet scalable: by leveraging the inherent internal relationships in an HBAR, the model can be easily expanded to include multiple transducer overtones and envelopes, multiple distinct transducers, and spurious modes. In addition to fitting measured datasets, the new model can also be used to easily analyze various perturbations to the nominal state of the HBAR. We expect the new model to be useful for the design of classical HBAR-based oscillators, filters, and sensors, and for the integration of HBARs into quantum circuits.
comment: 20 pages, 9 figures
Quaternionic Pole Placement via Companion Forms and the Ackermann Formula
We present an extension of state-feedback pole placement for quaternionic systems, based on companion forms and the Ackermann formula. For controllable single-input quaternionic LTI models, we define a companion polynomial that right-annihilates its companion matrix, characterize spectra via right-eigenvalue similarity classes, and prove coefficient-matching design in controllable coordinates. We then derive a coordinate-free Ackermann gain expression valid for real target polynomials, and state its scope and limitations. Short examples demonstrate correctness, practical use, and numerical simplicity.
comment: 8 pages. Preprint (submitted version). Submitted to IEEE Transactions on Automatic Control (Technical Note) on 22 Sep 2025. Funding: co-funded by the European Union under the project ROBOPROX (reg. no. CZ.02.01.01/00/22_008/0004590). Keywords: quaternions; pole placement; companion form; Ackermann formula; right eigenvalues; quaternionic systems
Automated Algorithm Design via Nevanlinna-Pick Interpolation NeurIPS 2025
The synthesis of optimization algorithms typically follows a design-first-analyze-later approach, which often obscures fundamental performance limitations and hinders the systematic design of algorithms operating at the achievable theoretical boundaries. Recently, a framework based on frequency-domain techniques from robust control theory has emerged as a powerful tool for automating algorithm synthesis. By integrating the design and analysis stages, this framework enables the identification of fundamental performance limits. In this paper, we build on this framework and extend it to address algorithms for solving strongly convex problems with equality constraints. As a result, we obtain a new class of algorithms that offers sharp trade-off between number of matrix multiplication per iteration and convergence rate.
comment: Accepted to DynaFront Workshop at NeurIPS 2025
Rebuild AC Power Flow Models with Graph Attention Networks
A full power flow (PF) model is a complete representation of the physical power network. Traditional model-based methods rely on the full PF model to implement power flow analysis. In practice, however, some PF model parameters can be inaccurate or even unavailable due to the uncertainties or dynamics in the power systems. Moreover, because the power network keeps evolving with possibly changing topology, the generalizability of a PF model to different network sizes and typologies should be considered. In this paper, we propose a PF rebuild model based on graph attention networks (GAT) by constructing a new graph based on the real and imaginary parts of voltage at each bus. By comparing with two state-of-the-art PF rebuild models for different standard IEEE power system cases and their modified topology variants, we demonstrate the feasibility of our method. Experimental results show that our proposed model achieves better accuracy for a changing network and can generalize to different networks with less accuracy discount.
Occlusion-Aware Consistent Model Predictive Control for Robot Navigation in Occluded Obstacle-Dense Environments
Ensuring safety and motion consistency for robot navigation in occluded, obstacle-dense environments is a critical challenge. In this context, this study presents an occlusion-aware Consistent Model Predictive Control (CMPC) strategy. To account for the occluded obstacles, it incorporates adjustable risk regions that represent their potential future locations. Subsequently, dynamic risk boundary constraints are developed online to ensure safety.The CMPC then constructs multiple locally optimal trajectory branches (each tailored to different risk regions) to strike a balance between safety and performance. A shared consensus segment is generated to ensure smooth transitions between branches without significant velocity fluctuations, further preserving motion consistency. To facilitate high computational efficiency and ensure coordination across local trajectories, we use the alternating direction method of multipliers (ADMM) to decompose the CMPC into manageable sub-problems for parallel solving. The proposed strategy is validated through simulations and real-world experiments on an Ackermann-steering robot platform. The results demonstrate the effectiveness of the proposed CMPC strategy through comparisons with baseline approaches in occluded, obstacle-dense environments.
On the System Theoretic Offline Learning of Continuous-Time LQR with Exogenous Disturbances
We analyze offline designs of linear quadratic regulator (LQR) strategies with uncertain disturbances. First, we consider the scenario where the exogenous variable can be estimated in a controlled environment, and subsequently, consider a more practical and challenging scenario where it is unknown in a stochastic setting. Our approach builds on the fundamental learning-based framework of adaptive dynamic programming (ADP), combined with a Lyapunov-based analytical methodology to design the algorithms and derive sample-based approximations motivated from the Markov decision process (MDP)-based approaches. For the scenario involving non-measurable disturbances, we further establish stability and convergence guarantees for the learned control gains under sample-based approximations. The overall methodology emphasizes simplicity while providing rigorous guarantees. Finally, numerical experiments focus on the intricacies and validations for the design of offline continuous-time LQR with exogenous disturbances.
comment: 17 pages, 3 figures
Real-time Hybrid System Identification with Online Deterministic Annealing
We introduce a real-time identification method for discrete-time state-dependent switching systems in both the input--output and state-space domains. In particular, we design a system of adaptive algorithms running in two timescales; a stochastic approximation algorithm implements an online deterministic annealing scheme at a slow timescale and estimates the mode-switching signal, and an recursive identification algorithm runs at a faster timescale and updates the parameters of the local models based on the estimate of the switching signal. We first focus on piece-wise affine systems and discuss identifiability conditions and convergence properties based on the theory of two-timescale stochastic approximation. In contrast to standard identification algorithms for switched systems, the proposed approach gradually estimates the number of modes and is appropriate for real-time system identification using sequential data acquisition. The progressive nature of the algorithm improves computational efficiency and provides real-time control over the performance-complexity trade-off. Finally, we address specific challenges that arise in the application of the proposed methodology in identification of more general switching systems. Simulation results validate the efficacy of the proposed methodology.
Identification and Optimal Nonlinear Control of Turbojet Engine Using Koopman Eigenfunction Model
Gas turbine engines are complex and highly nonlinear dynamical systems. Deriving their physics-based models can be challenging because it requires performance characteristics that are not always available, often leading to many simplifying assumptions. This paper discusses the limitations of conventional experimental methods used to derive component-level and locally linear parameter-varying models, and addresses these issues by employing identification techniques based on data collected from standard engine operation under closed-loop control. The rotor dynamics are estimated using the sparse identification of nonlinear dynamics. Subsequently, the autonomous part of the dynamics is mapped into an optimally constructed Koopman eigenfunction space. This process involves eigenvalue optimization using metaheuristic algorithms and temporal projection, followed by gradient-based eigenfunction identification. The resulting Koopman model is validated against an in-house reference component-level model. A globally optimal nonlinear feedback controller and a Kalman estimator are then designed within the eigenfunction space and compared to traditional and gain-scheduled proportional-integral controllers, as well as a proposed internal model control approach. The eigenmode structure enables targeting individual modes during optimization, leading to improved performance tuning. Results demonstrate that the Koopman-based controller surpasses other benchmark controllers in both reference tracking and disturbance rejection under sea-level and varying flight conditions, due to its global nature.
comment: 34 pages, 28 figures Under review at Springer Nonlinear Dynamics
Contact-Aware Safety in Soft Robots Using High-Order Control Barrier and Lyapunov Functions
Robots operating alongside people, particularly in sensitive scenarios such as aiding the elderly with daily tasks or collaborating with workers in manufacturing, must guarantee safety and cultivate user trust. Continuum soft manipulators promise safety through material compliance, but as designs evolve for greater precision, payload capacity, and speed, and increasingly incorporate rigid elements, their injury risk resurfaces. In this letter, we introduce a comprehensive High-Order Control Barrier Function (HOCBF) + High-Order Control Lyapunov Function (HOCLF) framework that enforces strict contact force limits across the entire soft-robot body during environmental interactions. Our approach combines a differentiable Piecewise Cosserat-Segment (PCS) dynamics model with a convex-polygon distance approximation metric, named Differentiable Conservative Separating Axis Theorem (DCSAT), based on the soft robot geometry to enable real-time, whole-body collision detection, resolution, and enforcement of the safety constraints. By embedding HOCBFs into our optimization routine, we guarantee safety, allowing, for instance, safe navigation in operational space under HOCLF-driven motion objectives. Extensive planar simulations demonstrate that our method maintains safety-bounded contacts while achieving precise shape and task-space regulation. This work thus lays a foundation for the deployment of soft robots in human-centric environments with provable safety and performance.
comment: 8 pages
Robust Set Partitioning Strategy for Malicious Information Detection in Large-Scale Internet of Things
With the rapid development of the Internet of Things (IoT), the risks of data tampering and malicious information injection have intensified, making efficient threat detection in large-scale distributed sensor networks a pressing challenge. To address the decline in malicious information detection efficiency as network scale expands, this paper investigates a robust set partitioning strategy and, on this basis, develops a distributed attack detection framework with theoretical guarantees. Specifically, we introduce a gain mutual influence metric to characterize the inter-subset interference arising during gain updates, thereby revealing the fundamental reason for the performance gap between distributed and centralized algorithms. Building on this insight, the set partitioning strategy based on Grassmann distance is proposed, which significantly reduces the computational cost of gain updates while maintaining detection performance, and ensures that the distributed setting under subset partitioning preserves the same theoretical performance bound as the baseline algorithm. Unlike conventional clustering methods, the proposed set partitioning strategy leverages the intrinsic observational features of sensors for robust partitioning, thereby enhancing resilience to noise and interference. Simulation results demonstrate that the proposed method limits the performance gap between distributed and centralized detection to no more than 1.648$\%$, while the computational cost decreases at an order of $O(1/m)$ with the number of subsets $m$. Therefore, the proposed algorithm effectively reduces computational overhead while preserving detection accuracy, offering a practical low-cost and highly reliable security detection solution for edge nodes in large-scale IoT systems.
comment: 24 pages, 5 figures
Discovery Learning accelerates battery design evaluation
Fast and reliable validation of novel designs in complex physical systems such as batteries is critical to accelerating technological innovation. However, battery research and development remain bottlenecked by the prohibitively high time and energy costs required to evaluate numerous new design candidates, particularly in battery prototyping and life testing. Despite recent progress in data-driven battery lifetime prediction, existing methods require labeled data of target designs to improve accuracy and cannot make reliable predictions until after prototyping, thus falling far short of the efficiency needed to enable rapid feedback for battery design. Here, we introduce Discovery Learning (DL), a scientific machine-learning paradigm that integrates active learning, physics-guided learning, and zero-shot learning into a human-like reasoning loop, drawing inspiration from learning theories in educational psychology. DL can learn from historical battery designs and actively reduce the need for prototyping, thus enabling rapid lifetime evaluation for unobserved material-design combinations without requiring additional data labeling. To test DL, we present 123 industrial-grade large-format lithium-ion pouch cells, spanning eight material-design combinations and diverse cycling protocols. Trained solely on public datasets of small-capacity cylindrical cells, DL achieves 7.2% test error in predicting the average cycle life under unknown device variability. This results in savings of 98% in time and 95% in energy compared to industrial practices. This work highlights the potential of uncovering insights from historical designs to inform and accelerate the development of next-generation battery technologies. DL represents a key advance toward efficient data-driven modeling and helps realize the promise of machine learning for accelerating scientific discovery and engineering innovation.
comment: Main text, 20 pages, 5 figures
Application of Battery Storage to Switching Predictive Control of Power Distribution Systems Including Road Heating
In regions with heavy snowfall, the living environment is becoming a serious problem due to heavy snow accumulation. A road heating is an electrical device which promotes snow melting by burying a heating cable as a thermal source underground in such regions. When integrating the road heating into power distribution systems, we need to optimize the flow of electric power by appropriately integrating distributed power sources and conventional power distribution equipment. In this paper, we introduce a battery storage to the power distribution system including road heating, and extend the predictive switching control of the systems due to the authors' previous study to the case where battery storage is installed. As a main result, we propose a predictive switching control that utilizes photovoltaic (PV) power generation and surplus power stored in the battery storage effectively, and achieves the reduction of distribution loss, attenuation of voltage fluctuation, and efficient snow melting, simultaneously. We verify the effectiveness of the application of battery storage through numerical simulation using actual time series data of weather conditions and active power of the PV power generation and load.
comment: 14 pages, 14 figures
Contact feedback helps snake robots propel against uneven terrain using vertical bending
Snakes can bend their elongate bodies in various forms to traverse various environments. We understand how snakes use lateral bending to push against asperities on flat ground for propulsion, and snake robots can do so effectively. However, snakes can also use vertical bending to push against terrain of large height variation for propulsion, and they can adjust it to adapt to novel terrain presumably using mechano-sensing feedback control. Although some snake robots can traverse uneven terrain, few have used vertical bending for propulsion, and how to control it in novel environments is poorly understood. Here we systematically studied a snake robot with force sensors pushing against large bumps using vertical bending to understand the role of sensory feedback control. We compared a feedforward controller and four feedback controllers that use different senses and generate distinct bending patterns and body-terrain interaction. We challenged the robot with increasing backward load and novel terrain geometry that break its contact with the terrain. We further varied how much the feedback control modulated bending to conform to or push against the terrain to test their effects. Feedforward propagation of vertical bending generated large propulsion when the shape matched terrain geometry. However, when perturbations caused loss of contact, the robot easily lost propulsion or had motor overload. Contact feedback control resolved these by improving contact. Yet excessive conformation interrupted propagation and excessive pushing stalled motors. Unlike that using lateral bending, for propulsion generation using vertical bending, body weight can help maintain contact with the environment but may also overload motors. Our results will help snake robots better traverse terrain with large height variation and can inform how snakes use sensory feedback to control vertical bending for propulsion.
comment: 62 pages, 20 figures
The Asymptotic Behavior of Attention in Transformers
The transformer architecture has become the foundation of modern Large Language Models (LLMs), yet its theoretical properties are still not well understood. As with classic neural networks, a common approach to improve these models is to increase their size and depth. However, such strategies may be suboptimal, as several works have shown that adding more layers yields increasingly diminishing returns. More importantly, prior studies have shown that increasing depth may lead to model collapse, i.e., all the tokens converge to a single cluster, undermining the ability of LLMs to generate diverse outputs. Building on differential equation models for the transformer dynamics, we prove that all the tokens in a transformer asymptotically converge to a cluster as depth increases. At the technical level we leverage tools from control theory, including consensus dynamics on manifolds and input-to-state stability (ISS). We then extend our analysis to autoregressive models, exploiting their structure to further generalize the theoretical guarantees.
Linear Phase Balancing Scheme using Voltage Unbalance Sensitivities in Multi-phase Power Distribution Grids
Power distribution networks, especially in North America, are often unbalanced due to the mix of single-, two- and three-phase networks as well as due to the high penetration of single-phase devices at the distribution level such as electric vehicle (EV) chargers and single-phase solar plants. However, the network operator must adhere to the voltage unbalance levels within the limits specified by IEEE, IEC, and NEMA standards for the safety of the equipment as well as the efficiency of the network operation. Existing works have proposed active and reactive power control in the network to minimize imbalances. However, these optimization problems are highly nonlinear and nonconvex due to the inherent non-linearity of unbalanced metrics and power-flow equations. In this work, we propose a linearization approach of unbalance metrics such as voltage unbalance factors (VUF), phase voltage unbalance rate (PVUR), and line voltage unbalance rate (LVUR) using the first order Taylor's approximation. This linearization is then applied to the phase balancing control scheme; it is formulated as a feedback approach where the linearization is updated successively after the active/reactive control setpoint has been actuated and shows improvement in voltage imbalances. We demonstrate the application of the proposed scheme on a standard IEEE benchmark test case, demonstrating its effectiveness.
comment: To appear at 64th IEEE Conference on Decision and Control
On the Sharp Input-Output Analysis of Nonlinear Systems under Adversarial Attacks
This paper is concerned with learning the input-output mapping of general nonlinear dynamical systems. While the existing literature focuses on Gaussian inputs and benign disturbances, we significantly broaden the scope of admissible control inputs and allow correlated, nonzero-mean, adversarial disturbances. With our reformulation as a linear combination of basis functions, we prove that the $\ell_2$-norm estimator overcomes the challenges as long as the probability that the system is under adversarial attack at a given time is smaller than a certain threshold. We provide an estimation error bound that decays with the input memory length and prove its optimality by constructing a problem instance that suffers from the same bound under adversarial attacks. Our work provides a sharp input-output analysis for a generic nonlinear and partially observed system under significantly generalized assumptions compared to existing works.
comment: 26 pages, 3 figures
GNN-DT: Graph Neural Network Enhanced Decision Transformer for Efficient Optimization in Dynamic Environments
Reinforcement Learning (RL) methods used for solving real-world optimization problems often involve dynamic state-action spaces, larger scale, and sparse rewards, leading to significant challenges in convergence, scalability, and efficient exploration of the solution space. This study introduces GNN-DT, a novel Decision Transformer (DT) architecture that integrates Graph Neural Network (GNN) embedders with a novel residual connection between input and output tokens crucial for handling dynamic environments. By learning from previously collected trajectories, GNN-DT tackles the sparse rewards limitations of online RL algorithms and delivers high-quality solutions in real-time. We evaluate GNN-DT on the complex electric vehicle (EV) charging optimization problem and prove that its performance is superior and requires significantly fewer training trajectories, thus improving sample efficiency compared to existing DT and offline RL baselines. Furthermore, GNN-DT exhibits robust generalization to unseen environments and larger action spaces, addressing a critical gap in prior offline and online RL approaches.
Systems and Control (EESS)
An enhanced statistical feature fusion approach using an improved distance evaluation algorithm and weighted K-nearest neighbor for bearing fault diagnosis
Bearings are among the most failure-prone components in rotating machinery, and their condition directly impacts overall performance. Therefore, accurately diagnosing bearing faults is essential for ensuring system stability. However, detecting such malfunctions in noisy environments, where data is collected from multiple sensors, necessitates the extraction and selection of informative features. This paper proposes an improved distance evaluation algorithm combined with a weighted K-nearest neighbor (KNN) classifier for bearing fault diagnosis. The process begins with extracting and integrating statistical features of vibration across the time, frequency, and time-frequency domains. Next, the improved distance evaluation algorithm assigns weights to the extracted features, retaining only the most informative ones by eliminating insensitive features. Finally, the selected features are used to train the weighted KNN classifier. To validate the proposed method, we employ bearing data from the University of Ottawa. The results demonstrate the effectiveness of our approach in accurately identifying bearing faults.
Next-Generation Aerial Robots -- Omniorientational Strategies: Dynamic Modeling, Control, and Comparative Analysis
Conventional multi-rotors are under-actuated systems, hindering them from independently controlling attitude from position. In this study, we present several distinct configurations that incorporate additional control inputs for manipulating the angles of the propeller axes. This addresses the mentioned limitations, making the systems "omniorientational". We comprehensively derived detailed dynamic models for all introduced configurations and validated by a methodology using Simscape Multibody simulations. Two controllers are designed: a sliding mode controller for robust handling of disturbances and a novel PID-based controller with gravity compensation integrating linear and non-linear allocators, designed for computational efficiency. A custom control allocation strategy is implemented to manage the input-non-affine nature of these systems, seeking to maximize battery life by minimizing the "Power Consumption Factor" defined in this study. Moreover, the controllers effectively managed harsh disturbances and uncertainties. Simulations compare and analyze the proposed configurations and controllers, majorly considering their power consumption. Furthermore, we conduct a qualitative comparison to evaluate the impact of different types of uncertainties on the control system, highlighting areas for potential model or hardware improvements. The analysis in this study provides a roadmap for future researchers to design omniorientational drones based on their design objectives, offering practical insights into configuration selection and controller design. This research aligns with the project SAC-1, one of the objectives of Sharif AgRoLab.
Continuous-Time System Identification and OCV Reconstruction of Li-ion Batteries via Regularized Least Squares
Accurate identification of lithium-ion (Li-ion) battery parameters is essential for managing and predicting battery behavior. However, existing discrete-time methods hinder the estimation of physical parameters and face the fast-slow dynamics problem presented in the battery. In this paper, we developed a continuous-time approach that enables the estimation of battery parameters directly from sampled data. This method avoids discretization errors in converting continuous-time models into discrete-time ones, achieving more accurate identification. In addition, we jointly identify the open-circuit voltage (OCV) and the state of charge (SOC) relation of the battery without utilizing offline OCV tests. By modeling the OCV-SOC curve as a cubic B-spline, we achieve a high-fidelity representation of the OCV curve, facilitating its estimation. Through solving a rank and L1 regularized least squares problem, we jointly identify battery parameters and the OCV-SOC relation from the battery's dynamic data. Simulated and real-life data demonstrate the effectiveness of the developed method.
Direct Continuous-Time LPV System Identification of Li-ion Batteries via L1-Regularized Least Squares
Accurate identification of lithium-ion battery parameters is essential for estimating battery states and managing performance. However, the variation of battery parameters over the state of charge (SOC) and the nonlinear dependence of the open-circuit voltage (OCV) on the SOC complicate the identification process. In this work, we develop a continuous-time LPV system identification approach to identify the SOC-dependent battery parameters and the OCV-SOC mapping. We model parameter variations using cubic B-splines to capture the piecewise nonlinearity of the variations and estimate signal derivatives via state variable filters, facilitating CT-LPV identification. Battery parameters and the OCV-SOC mapping are jointly identified by solving L1-regularized least squares problems. Numerical experiments on a simulated battery and real-life data demonstrate the effectiveness of the developed method in battery identification, presenting improved performance compared to conventional RLS-based methods.
The Use of the Simplex Architecture to Enhance Safety in Deep-Learning-Powered Autonomous Systems
Recently, the outstanding performance reached by neural networks in many tasks has led to their deployment in autonomous systems, such as robots and vehicles. However, neural networks are not yet trustworthy, being prone to different types of misbehavior, such as anomalous samples, distribution shifts, adversarial attacks, and other threats. Furthermore, frameworks for accelerating the inference of neural networks typically run on rich operating systems that are less predictable in terms of timing behavior and present larger surfaces for cyber-attacks. To address these issues, this paper presents a software architecture for enhancing safety, security, and predictability levels of learning-based autonomous systems. It leverages two isolated execution domains, one dedicated to the execution of neural networks under a rich operating system, which is deemed not trustworthy, and one responsible for running safety-critical functions, possibly under a different operating system capable of handling real-time constraints. Both domains are hosted on the same computing platform and isolated through a type-1 real-time hypervisor enabling fast and predictable inter-domain communication to exchange real-time data. The two domains cooperate to provide a fail-safe mechanism based on a safety monitor, which oversees the state of the system and switches to a simpler but safer backup module, hosted in the safety-critical domain, whenever its behavior is considered untrustworthy. The effectiveness of the proposed architecture is illustrated by a set of experiments performed on two control systems: a Furuta pendulum and a rover. The results confirm the utility of the fall-back mechanism in preventing faults due to the learning component.
On the convergence of a numerical scheme for a boundary controlled 1D linear parabolic PIDE
We consider an 1D partial integro-differential equation (PIDE) comprising of an 1D parabolic partial differential equation (PDE) and a nonlocal integral term. The control input is applied on one of the boundaries of the PIDE. Partitioning the spatial interval into $n+1$ subintervals and approximating the spatial derivatives and the integral term with their finite-difference approximations and Riemann sum, respectively, we derive an $n^{\rm th}$-order semi-discrete approximation of the PIDE. The $n^{\rm th}$-order semi-discrete approximation of the PIDE is an $n^{\rm th}$-order ordinary differential equation (ODE) in time. We establish some of its salient properties and using them prove that the solution of the semi-discrete approximation converges to the solution of the PIDE as $n\to\infty$. We illustrate our convergence results using numerical examples. The results in this work are useful for establishing the null controllability of the PIDE considered.
comment: 6 pages, 3 figures
Dual-Band Flexible Endfire Filtering Antenna With Conformal Capability for Emergency Communication Applications
In this letter, a single-layer dual-band flexible conformal filtering endfire antenna is presented. The proposed antenna is based on two co-designed folded dipoles (FDs) working at two frequencies, where the lower-frequency FD acts as a reflector for the higher-frequency one. Then, by devising an additional reflector for lower-frequency FD, dual-band endfire radiation is realized. Parasitic strips are deliberately introduced around the FDs to generate electric coupling and magnetic coupling in the two operating bands, resulting in significant filtering performance with four radiation nulls. With flexible structure and single-layer configuration, the antenna design exhibits flexible conformability with cylindrical surfaces of diverse diameters, thereby enabling seamless integration into scalable emergency communication systems. To verify our design concept, an antenna prototype is fabricated and measured. The measured working frequency ranges from 1.37 to 1.45 GHz and 1.89 to 2.07 GHz. Out-of-band radiation suppression more than 11 dB is achieved under different bending radii. The proposed design offers several advantages including dual-band endfire filtering radiation, flexible conformability and low-profile.
Revealing Chaotic Dependence and Degree-Structure Mechanisms in Optimal Pinning Control of Complex Networks
Identifying an optimal set of driver nodes to achieve synchronization via pinning control is a fundamental challenge in complex network science, limited by computational intractability and the lack of general theory. Here, leveraging a degree-based mean-field (annealed) approximation from statistical physics, we analytically reveal how the structural degree distribution systematically governs synchronization performance, and derive an analytic characterization of the globally optimal pinning set and constructive algorithms with linear complexity (dominated by degree sorting, O(N+M). The optimal configuration exhibits a chaotic dependence--a discontinuous sensitivity--on its cardinality, whereby adding a single node can trigger abrupt changes in node composition and control effectiveness. This structural transition fundamentally challenges traditional heuristics that assume monotonic performance gains with budget. Systematic experiments on synthetic and empirical networks confirm that the proposed approach consistently outperforms degree-, betweenness-, and other centrality-based baselines. Furthermore, we quantify how key degree-distribution features--low-degree saturation, high-degree cutoff, and the power-law exponent--govern achievable synchronizability and shape the form of optimal sets. These results offer a systematic understanding of how degree heterogeneity shapes the network controllability. Our work establishes a unified link between degree heterogeneity and spectral controllability, offering both mechanistic insights and practical design rules for optimal driver-node selection in diverse complex systems.
comment: 16 pages, 6 figures; primary: eess.SY; cross-lists: cs.SY, math.OC. Submitted to IEEE TAC
Automated algorithm design for convex optimization problems with linear equality constraints
Synthesis of optimization algorithms typically follows a {\em design-then-analyze\/} approach, which can obscure fundamental performance limits and hinder the systematic development of algorithms that operate near these limits. Recently, a framework grounded in robust control theory has emerged as a powerful tool for automating algorithm synthesis. By integrating design and analysis stages, fundamental performance bounds are revealed and synthesis of algorithms that achieve them is enabled. In this paper, we apply this framework to design algorithms for solving strongly convex optimization problems with linear equality constraints. Our approach yields a single-loop, gradient-based algorithm whose convergence rate is independent of the condition number of the constraint matrix. This improves upon the best known rate within the same algorithm class, which depends on the product of the condition numbers of the objective function and the constraint matrix.
comment: Accepted to 64th IEEE Conference on Decision Control (CDC), 2025
Parasitic actuation delay limits the minimum employable time headway in connected and autonomous vehicles
Adaptive andcooperative adaptive cruise control (ACC and CACC) and next generation CACC (CACC+) systems usually employ a constant time headway policy (CTHP) for platooning of connected and autonomous vehicles (CAVs). In ACC, the ego vehicle uses onboard sensors to measure the position and velocity of the predecessor vehicle to maintain a desired spacing. The CACC and CACC+systems use additional information, such as acceleration(s) communicated through vehicle-to-vehicle (V2V) communication of the predecessor vehicle(s); these systems have been shown to result in improved spacing performance, throughput, and safety over ACC. Parasitic dynamics are generally difficult to model and the parasitic parameters (delay, lag, etc.) are difficult to obtain. Parasitic actuation delays can have deleterious effects and impose limits on the mobility and safety of CAVs. It is reasonable to assume that the bounds on parasitic actuation delays are known a priori. For CAVs, we need to address both internal stability and string stability in the presence of parasitic actuation delays. This requires robustness of string and internal stability for all values of parasitic actuation delays that are within the specified upper bound. In this paper, we provide the minimum employable time headway for ACC, CACC, and CACC+ (`r' predecessors look-ahead), respectively. The inclusion of the internal stability in the string stability condition is analyzed based on Pontryagin's interlacing theorem for time delay systems. We provide comparative numerical results to corroborate the achieved theoretical results.
Frequency Domain Stability Conditions for Hybrid AC/DC Systems
In this article, we investigate small-signal frequency and DC voltage stability of hybrid AC/DC power systems that combine AC and DC transmission, conventional machine- based generation, and converter-interfaced generation. The main contributions of this work are a compact frequency domain representation of hybrid AC/DC systems and associated stability conditions that can be divided into conditions on the individual bus dynamics and conditions on each DC network. The bus- level conditions apply to a wide range of technologies (e.g., synchronous generators, synchronous condensers, grid-forming renewables and energy storage). Moreover, the system-level conditions establish that hybrid AC/DC systems combining a wide range of devices are stable independently of the network topology provided that the frequency response of converters on each DC network is sufficiently coherent relative to the network coupling strength. Additionally, we develop and validate a novel reduced- order damper winding model for multi-machine systems.
comment: presented at IREP 2025, published in Sustainable Energy, Grids and Networks
NEO-Grid: A Neural Approximation Framework for Optimization and Control in Distribution Grids
The rise of distributed energy resources (DERs) is reshaping modern distribution grids, introducing new challenges in attaining voltage stability under dynamic and decentralized operating conditions. This paper presents NEO-Grid, a unified learning-based framework for volt-var optimization (VVO) and volt-var control (VVC) that leverages neural network surrogates for power flow and deep equilibrium models (DEQs) for closed-loop control. Our method replaces traditional linear approximations with piecewise-linear ReLU networks trained to capture the nonlinear relationship between power injections and voltage magnitudes. For control, we model the recursive interaction between voltage and inverter response using DEQs, allowing direct fixed-point computation and efficient training via implicit differentiation. We evaluated NEO-Grid on the IEEE 33-bus system, demonstrating that it significantly improves voltage regulation performance compared to standard linear and heuristic baselines in both optimization and control settings. Our results establish NEO-Grid as a scalable, accurate, and interpretable solution for learning-based voltage regulation in distribution grids.
Mitigation of Active Power Oscillation in Multi-VSG Grids: An Impedance-Based Perspective
Active power oscillations frequently arise in inverter-dominated power systems with multiple converters operating under Virtual Synchronous Generator control, posing risks to system stability and protection coordination. While various mitigation strategies have been proposed, many rely on prior knowledge of system parameters, offer limited damping performance, or involve complex models that lack physical interpretability, making them difficult to apply in practice. To address these challenges, this paper first introduces a physically intuitive RLC equivalent circuit model to explain the root causes of APOs in both stand-alone and grid-connected modes. By mapping inertia, damping, and feeder impedance to capacitive, resistive, and inductive elements, respectively, the model reveals how mismatches among converters lead to inter-unit oscillations characterized by LC resonance. Building on this insight, we propose two mode-specific mitigation strategies: in SA mode, a graph theory based impedance control ensures proportional reactive power sharing and effectively suppresses APOs; and in GC mode, adaptive inertia and damping control with feedforward filtering is designed to reshape transient power dynamics while preserving frequency stability. The proposed methods are validated through extensive simulations and real-time hardware-in-the-loop experiments, demonstrating their effectiveness in suppressing oscillations and enhancing the robustness of multi-converter power systems.
A comprehensive equivalent circuit model for high overtone bulk acoustic resonators (HBARs)
This paper presents a new and comprehensive equivalent circuit model for high overtone bulk acoustic resonators (HBARs). HBARs demonstrate several very sharp resonance modes distributed nearly periodically over a very wide frequency range. This spectrum response of HBARs offers unique advantages but poses significant modeling challenges. The proposed circuit incorporates and models the unique physical components of the HBAR: piezoelectric transducer, substrate (a perfectly periodic multimode cavity), piezoelectric coupling, and critically, the imperfectly matched transducer-substrate interface which imparts characteristic aperiodicity to the HBAR mode spectrum. By judicious use of fixed, periodic, or tightly constrained virtual lumped-element branches, and sets of branches, the model retains clear and intuitive links to the physical device, while reducing the complexity needed for fitting dense, broadband datasets. We demonstrate the validity and power of this model by simultaneously fitting measured data for 61 modes of a GaN/NbN/sapphire HBAR over a span of 1 GHz, and extracting modal parameters such as quality factors and coupling coefficients. We show that this new model is compact and yet scalable: by leveraging the inherent internal relationships in an HBAR, the model can be easily expanded to include multiple transducer overtones and envelopes, multiple distinct transducers, and spurious modes. In addition to fitting measured datasets, the new model can also be used to easily analyze various perturbations to the nominal state of the HBAR. We expect the new model to be useful for the design of classical HBAR-based oscillators, filters, and sensors, and for the integration of HBARs into quantum circuits.
comment: 20 pages, 9 figures
Quaternionic Pole Placement via Companion Forms and the Ackermann Formula
We present an extension of state-feedback pole placement for quaternionic systems, based on companion forms and the Ackermann formula. For controllable single-input quaternionic LTI models, we define a companion polynomial that right-annihilates its companion matrix, characterize spectra via right-eigenvalue similarity classes, and prove coefficient-matching design in controllable coordinates. We then derive a coordinate-free Ackermann gain expression valid for real target polynomials, and state its scope and limitations. Short examples demonstrate correctness, practical use, and numerical simplicity.
comment: 8 pages. Preprint (submitted version). Submitted to IEEE Transactions on Automatic Control (Technical Note) on 22 Sep 2025. Funding: co-funded by the European Union under the project ROBOPROX (reg. no. CZ.02.01.01/00/22_008/0004590). Keywords: quaternions; pole placement; companion form; Ackermann formula; right eigenvalues; quaternionic systems
Automated Algorithm Design via Nevanlinna-Pick Interpolation NeurIPS 2025
The synthesis of optimization algorithms typically follows a design-first-analyze-later approach, which often obscures fundamental performance limitations and hinders the systematic design of algorithms operating at the achievable theoretical boundaries. Recently, a framework based on frequency-domain techniques from robust control theory has emerged as a powerful tool for automating algorithm synthesis. By integrating the design and analysis stages, this framework enables the identification of fundamental performance limits. In this paper, we build on this framework and extend it to address algorithms for solving strongly convex problems with equality constraints. As a result, we obtain a new class of algorithms that offers sharp trade-off between number of matrix multiplication per iteration and convergence rate.
comment: Accepted to DynaFront Workshop at NeurIPS 2025
Rebuild AC Power Flow Models with Graph Attention Networks
A full power flow (PF) model is a complete representation of the physical power network. Traditional model-based methods rely on the full PF model to implement power flow analysis. In practice, however, some PF model parameters can be inaccurate or even unavailable due to the uncertainties or dynamics in the power systems. Moreover, because the power network keeps evolving with possibly changing topology, the generalizability of a PF model to different network sizes and typologies should be considered. In this paper, we propose a PF rebuild model based on graph attention networks (GAT) by constructing a new graph based on the real and imaginary parts of voltage at each bus. By comparing with two state-of-the-art PF rebuild models for different standard IEEE power system cases and their modified topology variants, we demonstrate the feasibility of our method. Experimental results show that our proposed model achieves better accuracy for a changing network and can generalize to different networks with less accuracy discount.
Occlusion-Aware Consistent Model Predictive Control for Robot Navigation in Occluded Obstacle-Dense Environments
Ensuring safety and motion consistency for robot navigation in occluded, obstacle-dense environments is a critical challenge. In this context, this study presents an occlusion-aware Consistent Model Predictive Control (CMPC) strategy. To account for the occluded obstacles, it incorporates adjustable risk regions that represent their potential future locations. Subsequently, dynamic risk boundary constraints are developed online to ensure safety.The CMPC then constructs multiple locally optimal trajectory branches (each tailored to different risk regions) to strike a balance between safety and performance. A shared consensus segment is generated to ensure smooth transitions between branches without significant velocity fluctuations, further preserving motion consistency. To facilitate high computational efficiency and ensure coordination across local trajectories, we use the alternating direction method of multipliers (ADMM) to decompose the CMPC into manageable sub-problems for parallel solving. The proposed strategy is validated through simulations and real-world experiments on an Ackermann-steering robot platform. The results demonstrate the effectiveness of the proposed CMPC strategy through comparisons with baseline approaches in occluded, obstacle-dense environments.
On the System Theoretic Offline Learning of Continuous-Time LQR with Exogenous Disturbances
We analyze offline designs of linear quadratic regulator (LQR) strategies with uncertain disturbances. First, we consider the scenario where the exogenous variable can be estimated in a controlled environment, and subsequently, consider a more practical and challenging scenario where it is unknown in a stochastic setting. Our approach builds on the fundamental learning-based framework of adaptive dynamic programming (ADP), combined with a Lyapunov-based analytical methodology to design the algorithms and derive sample-based approximations motivated from the Markov decision process (MDP)-based approaches. For the scenario involving non-measurable disturbances, we further establish stability and convergence guarantees for the learned control gains under sample-based approximations. The overall methodology emphasizes simplicity while providing rigorous guarantees. Finally, numerical experiments focus on the intricacies and validations for the design of offline continuous-time LQR with exogenous disturbances.
comment: 17 pages, 3 figures
Real-time Hybrid System Identification with Online Deterministic Annealing
We introduce a real-time identification method for discrete-time state-dependent switching systems in both the input--output and state-space domains. In particular, we design a system of adaptive algorithms running in two timescales; a stochastic approximation algorithm implements an online deterministic annealing scheme at a slow timescale and estimates the mode-switching signal, and an recursive identification algorithm runs at a faster timescale and updates the parameters of the local models based on the estimate of the switching signal. We first focus on piece-wise affine systems and discuss identifiability conditions and convergence properties based on the theory of two-timescale stochastic approximation. In contrast to standard identification algorithms for switched systems, the proposed approach gradually estimates the number of modes and is appropriate for real-time system identification using sequential data acquisition. The progressive nature of the algorithm improves computational efficiency and provides real-time control over the performance-complexity trade-off. Finally, we address specific challenges that arise in the application of the proposed methodology in identification of more general switching systems. Simulation results validate the efficacy of the proposed methodology.
Identification and Optimal Nonlinear Control of Turbojet Engine Using Koopman Eigenfunction Model
Gas turbine engines are complex and highly nonlinear dynamical systems. Deriving their physics-based models can be challenging because it requires performance characteristics that are not always available, often leading to many simplifying assumptions. This paper discusses the limitations of conventional experimental methods used to derive component-level and locally linear parameter-varying models, and addresses these issues by employing identification techniques based on data collected from standard engine operation under closed-loop control. The rotor dynamics are estimated using the sparse identification of nonlinear dynamics. Subsequently, the autonomous part of the dynamics is mapped into an optimally constructed Koopman eigenfunction space. This process involves eigenvalue optimization using metaheuristic algorithms and temporal projection, followed by gradient-based eigenfunction identification. The resulting Koopman model is validated against an in-house reference component-level model. A globally optimal nonlinear feedback controller and a Kalman estimator are then designed within the eigenfunction space and compared to traditional and gain-scheduled proportional-integral controllers, as well as a proposed internal model control approach. The eigenmode structure enables targeting individual modes during optimization, leading to improved performance tuning. Results demonstrate that the Koopman-based controller surpasses other benchmark controllers in both reference tracking and disturbance rejection under sea-level and varying flight conditions, due to its global nature.
comment: 34 pages, 28 figures Under review at Springer Nonlinear Dynamics
Contact-Aware Safety in Soft Robots Using High-Order Control Barrier and Lyapunov Functions
Robots operating alongside people, particularly in sensitive scenarios such as aiding the elderly with daily tasks or collaborating with workers in manufacturing, must guarantee safety and cultivate user trust. Continuum soft manipulators promise safety through material compliance, but as designs evolve for greater precision, payload capacity, and speed, and increasingly incorporate rigid elements, their injury risk resurfaces. In this letter, we introduce a comprehensive High-Order Control Barrier Function (HOCBF) + High-Order Control Lyapunov Function (HOCLF) framework that enforces strict contact force limits across the entire soft-robot body during environmental interactions. Our approach combines a differentiable Piecewise Cosserat-Segment (PCS) dynamics model with a convex-polygon distance approximation metric, named Differentiable Conservative Separating Axis Theorem (DCSAT), based on the soft robot geometry to enable real-time, whole-body collision detection, resolution, and enforcement of the safety constraints. By embedding HOCBFs into our optimization routine, we guarantee safety, allowing, for instance, safe navigation in operational space under HOCLF-driven motion objectives. Extensive planar simulations demonstrate that our method maintains safety-bounded contacts while achieving precise shape and task-space regulation. This work thus lays a foundation for the deployment of soft robots in human-centric environments with provable safety and performance.
comment: 8 pages
Robust Set Partitioning Strategy for Malicious Information Detection in Large-Scale Internet of Things
With the rapid development of the Internet of Things (IoT), the risks of data tampering and malicious information injection have intensified, making efficient threat detection in large-scale distributed sensor networks a pressing challenge. To address the decline in malicious information detection efficiency as network scale expands, this paper investigates a robust set partitioning strategy and, on this basis, develops a distributed attack detection framework with theoretical guarantees. Specifically, we introduce a gain mutual influence metric to characterize the inter-subset interference arising during gain updates, thereby revealing the fundamental reason for the performance gap between distributed and centralized algorithms. Building on this insight, the set partitioning strategy based on Grassmann distance is proposed, which significantly reduces the computational cost of gain updates while maintaining detection performance, and ensures that the distributed setting under subset partitioning preserves the same theoretical performance bound as the baseline algorithm. Unlike conventional clustering methods, the proposed set partitioning strategy leverages the intrinsic observational features of sensors for robust partitioning, thereby enhancing resilience to noise and interference. Simulation results demonstrate that the proposed method limits the performance gap between distributed and centralized detection to no more than 1.648$\%$, while the computational cost decreases at an order of $O(1/m)$ with the number of subsets $m$. Therefore, the proposed algorithm effectively reduces computational overhead while preserving detection accuracy, offering a practical low-cost and highly reliable security detection solution for edge nodes in large-scale IoT systems.
comment: 24 pages, 5 figures
Discovery Learning accelerates battery design evaluation
Fast and reliable validation of novel designs in complex physical systems such as batteries is critical to accelerating technological innovation. However, battery research and development remain bottlenecked by the prohibitively high time and energy costs required to evaluate numerous new design candidates, particularly in battery prototyping and life testing. Despite recent progress in data-driven battery lifetime prediction, existing methods require labeled data of target designs to improve accuracy and cannot make reliable predictions until after prototyping, thus falling far short of the efficiency needed to enable rapid feedback for battery design. Here, we introduce Discovery Learning (DL), a scientific machine-learning paradigm that integrates active learning, physics-guided learning, and zero-shot learning into a human-like reasoning loop, drawing inspiration from learning theories in educational psychology. DL can learn from historical battery designs and actively reduce the need for prototyping, thus enabling rapid lifetime evaluation for unobserved material-design combinations without requiring additional data labeling. To test DL, we present 123 industrial-grade large-format lithium-ion pouch cells, spanning eight material-design combinations and diverse cycling protocols. Trained solely on public datasets of small-capacity cylindrical cells, DL achieves 7.2% test error in predicting the average cycle life under unknown device variability. This results in savings of 98% in time and 95% in energy compared to industrial practices. This work highlights the potential of uncovering insights from historical designs to inform and accelerate the development of next-generation battery technologies. DL represents a key advance toward efficient data-driven modeling and helps realize the promise of machine learning for accelerating scientific discovery and engineering innovation.
comment: Main text, 20 pages, 5 figures
Application of Battery Storage to Switching Predictive Control of Power Distribution Systems Including Road Heating
In regions with heavy snowfall, the living environment is becoming a serious problem due to heavy snow accumulation. A road heating is an electrical device which promotes snow melting by burying a heating cable as a thermal source underground in such regions. When integrating the road heating into power distribution systems, we need to optimize the flow of electric power by appropriately integrating distributed power sources and conventional power distribution equipment. In this paper, we introduce a battery storage to the power distribution system including road heating, and extend the predictive switching control of the systems due to the authors' previous study to the case where battery storage is installed. As a main result, we propose a predictive switching control that utilizes photovoltaic (PV) power generation and surplus power stored in the battery storage effectively, and achieves the reduction of distribution loss, attenuation of voltage fluctuation, and efficient snow melting, simultaneously. We verify the effectiveness of the application of battery storage through numerical simulation using actual time series data of weather conditions and active power of the PV power generation and load.
comment: 14 pages, 14 figures
Contact feedback helps snake robots propel against uneven terrain using vertical bending
Snakes can bend their elongate bodies in various forms to traverse various environments. We understand how snakes use lateral bending to push against asperities on flat ground for propulsion, and snake robots can do so effectively. However, snakes can also use vertical bending to push against terrain of large height variation for propulsion, and they can adjust it to adapt to novel terrain presumably using mechano-sensing feedback control. Although some snake robots can traverse uneven terrain, few have used vertical bending for propulsion, and how to control it in novel environments is poorly understood. Here we systematically studied a snake robot with force sensors pushing against large bumps using vertical bending to understand the role of sensory feedback control. We compared a feedforward controller and four feedback controllers that use different senses and generate distinct bending patterns and body-terrain interaction. We challenged the robot with increasing backward load and novel terrain geometry that break its contact with the terrain. We further varied how much the feedback control modulated bending to conform to or push against the terrain to test their effects. Feedforward propagation of vertical bending generated large propulsion when the shape matched terrain geometry. However, when perturbations caused loss of contact, the robot easily lost propulsion or had motor overload. Contact feedback control resolved these by improving contact. Yet excessive conformation interrupted propagation and excessive pushing stalled motors. Unlike that using lateral bending, for propulsion generation using vertical bending, body weight can help maintain contact with the environment but may also overload motors. Our results will help snake robots better traverse terrain with large height variation and can inform how snakes use sensory feedback to control vertical bending for propulsion.
comment: 62 pages, 20 figures
The Asymptotic Behavior of Attention in Transformers
The transformer architecture has become the foundation of modern Large Language Models (LLMs), yet its theoretical properties are still not well understood. As with classic neural networks, a common approach to improve these models is to increase their size and depth. However, such strategies may be suboptimal, as several works have shown that adding more layers yields increasingly diminishing returns. More importantly, prior studies have shown that increasing depth may lead to model collapse, i.e., all the tokens converge to a single cluster, undermining the ability of LLMs to generate diverse outputs. Building on differential equation models for the transformer dynamics, we prove that all the tokens in a transformer asymptotically converge to a cluster as depth increases. At the technical level we leverage tools from control theory, including consensus dynamics on manifolds and input-to-state stability (ISS). We then extend our analysis to autoregressive models, exploiting their structure to further generalize the theoretical guarantees.
Linear Phase Balancing Scheme using Voltage Unbalance Sensitivities in Multi-phase Power Distribution Grids
Power distribution networks, especially in North America, are often unbalanced due to the mix of single-, two- and three-phase networks as well as due to the high penetration of single-phase devices at the distribution level such as electric vehicle (EV) chargers and single-phase solar plants. However, the network operator must adhere to the voltage unbalance levels within the limits specified by IEEE, IEC, and NEMA standards for the safety of the equipment as well as the efficiency of the network operation. Existing works have proposed active and reactive power control in the network to minimize imbalances. However, these optimization problems are highly nonlinear and nonconvex due to the inherent non-linearity of unbalanced metrics and power-flow equations. In this work, we propose a linearization approach of unbalance metrics such as voltage unbalance factors (VUF), phase voltage unbalance rate (PVUR), and line voltage unbalance rate (LVUR) using the first order Taylor's approximation. This linearization is then applied to the phase balancing control scheme; it is formulated as a feedback approach where the linearization is updated successively after the active/reactive control setpoint has been actuated and shows improvement in voltage imbalances. We demonstrate the application of the proposed scheme on a standard IEEE benchmark test case, demonstrating its effectiveness.
comment: To appear at 64th IEEE Conference on Decision and Control
On the Sharp Input-Output Analysis of Nonlinear Systems under Adversarial Attacks
This paper is concerned with learning the input-output mapping of general nonlinear dynamical systems. While the existing literature focuses on Gaussian inputs and benign disturbances, we significantly broaden the scope of admissible control inputs and allow correlated, nonzero-mean, adversarial disturbances. With our reformulation as a linear combination of basis functions, we prove that the $\ell_2$-norm estimator overcomes the challenges as long as the probability that the system is under adversarial attack at a given time is smaller than a certain threshold. We provide an estimation error bound that decays with the input memory length and prove its optimality by constructing a problem instance that suffers from the same bound under adversarial attacks. Our work provides a sharp input-output analysis for a generic nonlinear and partially observed system under significantly generalized assumptions compared to existing works.
comment: 26 pages, 3 figures
GNN-DT: Graph Neural Network Enhanced Decision Transformer for Efficient Optimization in Dynamic Environments
Reinforcement Learning (RL) methods used for solving real-world optimization problems often involve dynamic state-action spaces, larger scale, and sparse rewards, leading to significant challenges in convergence, scalability, and efficient exploration of the solution space. This study introduces GNN-DT, a novel Decision Transformer (DT) architecture that integrates Graph Neural Network (GNN) embedders with a novel residual connection between input and output tokens crucial for handling dynamic environments. By learning from previously collected trajectories, GNN-DT tackles the sparse rewards limitations of online RL algorithms and delivers high-quality solutions in real-time. We evaluate GNN-DT on the complex electric vehicle (EV) charging optimization problem and prove that its performance is superior and requires significantly fewer training trajectories, thus improving sample efficiency compared to existing DT and offline RL baselines. Furthermore, GNN-DT exhibits robust generalization to unseen environments and larger action spaces, addressing a critical gap in prior offline and online RL approaches.
Multiagent Systems
AbideGym: Turning Static RL Worlds into Adaptive Challenges
Agents trained with reinforcement learning often develop brittle policies that fail when dynamics shift, a problem amplified by static benchmarks. AbideGym, a dynamic MiniGrid wrapper, introduces agent-aware perturbations and scalable complexity to enforce intra-episode adaptation. By exposing weaknesses in static policies and promoting resilience, AbideGym provides a modular, reproducible evaluation framework for advancing research in curriculum learning, continual learning, and robust generalization.
ToMPO: Training LLM Strategic Decision Making from a Multi-Agent Perspective
Large Language Models (LLMs) have been used to make decisions in complex scenarios, where they need models to think deeply, reason logically, and decide wisely. Many existing studies focus solely on multi-round conversations in social tasks or simulated environments, neglecting the various types of decisions and their interdependence. Current reinforcement learning methods struggle to consider the strategies of others during training. To address these issues, we first define a strategic decision-making problem that includes two types of decisions and their temporal dependencies. Furthermore, we propose **T**heory **o**f **M**ind **P**olicy **O**ptimization **(ToMPO)** algorithm to optimize the perception of other individual strategies and the game situation trends. Compared to the Group Relative Policy Optimization (GRPO) algorithm, ToMPO enhances the LLM's strategic decision-making mainly by: 1) generating rollouts based on reasoning the strategies of other individuals, 2) estimating advantages at both the graph-level and sample-level, and 3) balancing global and partial rewards. The ToMPO algorithm outperforms the GRPO method by 35% in terms of model output compliance and cooperative outcomes. Additionally, when compared to models with parameter sizes 100 times larger, it shows an 18% improvement. This demonstrates the effectiveness of the ToMPO algorithm in enhancing the model's strategic decision-making capabilities.
comment: 22 pages, 14 figures
Study on Locomotive Epidemic Dynamics in a Stochastic Spatio-Temporal Simulation Model on a Multiplex Network
This study presents an integrated approach to understanding epidemic dynamics through a stochastic spatio-temporal simulation model on a multiplex network, blending physical and informational layers. The physical layer maps the geographic movement of individuals, while the information layer tracks the spread of knowledge and health behavior via social interactions. We explore the interplay between physical mobility, information flow, and epidemic outcomes by simulating disease spread within this dual-structured network. Our model employs stochastic elements to mirror human behavior, mobility, and information dissemination uncertainties. Through simulations, we assess the impact of network structure, mobility patterns, and information spread speed on epidemic dynamics. The findings highlight the crucial role of effective communication in curbing disease transmission, even in highly mobile societies. Additionally, our agent-based simulation allows for real-time scenario analysis through a user interface, offering insights into leveraging physical and informational networks for epidemic control. This research sheds light on designing strategic interventions in complex social systems to manage disease outbreaks.
A Category Theoretic Approach to Approximate Game Theory
This paper uses category theory to develop an entirely new approach to approximate game theory. Game theory is the study of how different agents within a multi-agent system take decisions. At its core, game theory asks what an optimal decision is in a given scenario. Thus approximate game theory asks what is an approximately optimal decision in a given scenario. This is important in practice as -- just like in much of computing -- exact answers maybe too difficult to compute or even impossible to compute given inherent uncertainty in input. We consider first "Selection Functions" which are functions and develop a simple yet robust model of approximate equilibria. We develop the algebraic properties of approximation wrt selection functions and also relate approximation to the compositional structure of selection functions. We then repeat this process successfully for Open Games -- a more advanced model of game theory.
comment: In Proceedings ACT 2024, arXiv:2509.18357
Fairy: Interactive Mobile Assistant to Real-world Tasks via LMM-based Multi-agent
Large multi-modal models (LMMs) have advanced mobile GUI agents. However, existing methods struggle with real-world scenarios involving diverse app interfaces and evolving user needs. End-to-end methods relying on model's commonsense often fail on long-tail apps, and agents without user interaction act unilaterally, harming user experience. To address these limitations, we propose Fairy, an interactive multi-agent mobile assistant capable of continuously accumulating app knowledge and self-evolving during usage. Fairy enables cross-app collaboration, interactive execution, and continual learning through three core modules:(i) a Global Task Planner that decomposes user tasks into sub-tasks from a cross-app view; (ii) an App-Level Executor that refines sub-tasks into steps and actions based on long- and short-term memory, achieving precise execution and user interaction via four core agents operating in dual loops; and (iii) a Self-Learner that consolidates execution experience into App Map and Tricks. To evaluate Fairy, we introduce RealMobile-Eval, a real-world benchmark with a comprehensive metric suite, and LMM-based agents for automated scoring. Experiments show that Fairy with GPT-4o backbone outperforms the previous SoTA by improving user requirement completion by 33.7% and reducing redundant steps by 58.5%, showing the effectiveness of its interaction and self-learning.
comment: 20 pages, 12 figures
Multi-Objective Reinforcement Learning for Large Language Model Optimization: Visionary Perspective ECAI
Multi-Objective Reinforcement Learning (MORL) presents significant challenges and opportunities for optimizing multiple objectives in Large Language Models (LLMs). We introduce a MORL taxonomy and examine the advantages and limitations of various MORL methods when applied to LLM optimization, identifying the need for efficient and flexible approaches that accommodate personalization functionality and inherent complexities in LLMs and RL. We propose a vision for a MORL benchmarking framework that addresses the effects of different methods on diverse objective relationships. As future research directions, we focus on meta-policy MORL development that can improve efficiency and flexibility through its bi-level learning paradigm, highlighting key research questions and potential solutions for improving LLM performance.
comment: 3 pages, 1 figure, accepted by ECAI MODeM 2025
AutoClimDS: Climate Data Science Agentic AI -- A Knowledge Graph is All You Need
Climate data science faces persistent barriers stemming from the fragmented nature of data sources, heterogeneous formats, and the steep technical expertise required to identify, acquire, and process datasets. These challenges limit participation, slow discovery, and reduce the reproducibility of scientific workflows. In this paper, we present a proof of concept for addressing these barriers through the integration of a curated knowledge graph (KG) with AI agents designed for cloud-native scientific workflows. The KG provides a unifying layer that organizes datasets, tools, and workflows, while AI agents -- powered by generative AI services -- enable natural language interaction, automated data access, and streamlined analysis. Together, these components drastically lower the technical threshold for engaging in climate data science, enabling non-specialist users to identify and analyze relevant datasets. By leveraging existing cloud-ready API data portals, we demonstrate that "a knowledge graph is all you need" to unlock scalable and agentic workflows for scientific inquiry. The open-source design of our system further supports community contributions, ensuring that the KG and associated tools can evolve as a shared commons. Our results illustrate a pathway toward democratizing access to climate data and establishing a reproducible, extensible framework for human--AI collaboration in scientific research.
Aegis: Automated Error Generation and Identification for Multi-Agent Systems
As Multi-Agent Systems (MAS) become increasingly autonomous and complex, understanding their error modes is critical for ensuring their reliability and safety. However, research in this area has been severely hampered by the lack of large-scale, diverse datasets with precise, ground-truth error labels. To address this bottleneck, we introduce \textbf{AEGIS}, a novel framework for \textbf{A}utomated \textbf{E}rror \textbf{G}eneration and \textbf{I}dentification for Multi-Agent \textbf{S}ystems. By systematically injecting controllable and traceable errors into initially successful trajectories, we create a rich dataset of realistic failures. This is achieved using a context-aware, LLM-based adaptive manipulator that performs sophisticated attacks like prompt injection and response corruption to induce specific, predefined error modes. We demonstrate the value of our dataset by exploring three distinct learning paradigms for the error identification task: Supervised Fine-Tuning, Reinforcement Learning, and Contrastive Learning. Our comprehensive experiments show that models trained on AEGIS data achieve substantial improvements across all three learning paradigms. Notably, several of our fine-tuned models demonstrate performance competitive with or superior to proprietary systems an order of magnitude larger, validating our automated data generation framework as a crucial resource for developing more robust and interpretable multi-agent systems. Our project website is available at https://kfq20.github.io/AEGIS-Website.
Collab-Overcooked: Benchmarking and Evaluating Large Language Models as Collaborative Agents EMNLP 2025
Large Language Models (LLMs) based agent systems have made great strides in real-world applications beyond traditional NLP tasks. This paper proposes a new LLM-based Multi-Agent System (LLM-MAS) benchmark, Collab-Overcooked, built on the popular Overcooked-AI game with more applicable and challenging tasks in interactive environments. Collab-Overcooked extends existing benchmarks in two novel ways. First, it provides a multi-agent framework supporting diverse tasks and objectives and encourages collaboration through natural language communication. Second, it introduces a spectrum of process-oriented evaluation metrics to assess the fine-grained collaboration capabilities of different LLM agents, a dimension often overlooked in prior work. We conduct extensive experiments with 13 popular LLMs and show that, while the LLMs exhibit a strong ability in goal interpretation, there are significant shortcomings in active collaboration and continuous adaptation, which are critical for efficiently fulfilling complex tasks. Notably, we highlight the strengths and weaknesses of LLM-MAS and provide insights for improving and evaluating LLM-MAS on a unified and open-source benchmark. The environments, 30 open-ended tasks, and the evaluation package are publicly available at https://github.com/YusaeMeow/Collab-Overcooked.
comment: Accepted to EMNLP 2025 Main Conference. Camera-Ready Version. 30 pages, 17 figures
Language-Guided Multi-Agent Learning in Simulations: A Unified Framework and Evaluation
This paper introduces LLM-MARL, a unified framework that incorporates large language models (LLMs) into multi-agent reinforcement learning (MARL) to enhance coordination, communication, and generalization in simulated game environments. The framework features three modular components of Coordinator, Communicator, and Memory, which dynamically generate subgoals, facilitate symbolic inter-agent messaging, and support episodic recall. Training combines PPO with a language-conditioned loss and LLM query gating. LLM-MARL is evaluated in Google Research Football, MAgent Battle, and StarCraft II. Results show consistent improvements over MAPPO and QMIX in win rate, coordination score, and zero-shot generalization. Ablation studies demonstrate that subgoal generation and language-based messaging each contribute significantly to performance gains. Qualitative analysis reveals emergent behaviors such as role specialization and communication-driven tactics. By bridging language modeling and policy learning, this work contributes to the design of intelligent, cooperative agents in interactive simulations. It offers a path forward for leveraging LLMs in multi-agent systems used for training, games, and human-AI collaboration.
VDFD: Multi-Agent Value Decomposition Framework with Disentangled World Model
In this paper, we propose a novel model-based multi-agent reinforcement learning approach named Value Decomposition Framework with Disentangled World Model to address the challenge of achieving a common goal of multiple agents interacting in the same environment with reduced sample complexity. Due to scalability and non-stationarity problems posed by multi-agent systems, model-free methods rely on a considerable number of samples for training. In contrast, we use a modularized world model, composed of action-conditioned, action-free, and static branches, to unravel the complicated environment dynamics. Our model produces imagined outcomes based on past experience, without sampling directly from the real environment. We employ variational auto-encoders and variational graph auto-encoders to learn the latent representations for the world model, which is merged with a value-based framework to predict the joint action-value function and optimize the overall training objective. Experimental results on StarCraft II micro-management, Multi-Agent MuJoCo, and Level-Based Foraging challenges demonstrate that our method achieves high sample efficiency and exhibits superior performance compared to other baselines across a wide range of multi-agent learning tasks.
comment: 43 pages
Network Formation and Dynamics Among Multi-LLMs
Social networks profoundly influence how humans form opinions, exchange information, and organize collectively. As large language models (LLMs) are increasingly embedded into social and professional environments, it is critical to understand whether their interactions approximate human-like network dynamics. We develop a framework to study the network formation behaviors of multiple LLM agents and benchmark them against human decisions. Across synthetic and real-world settings, including friendship, telecommunication, and employment networks, we find that LLMs consistently reproduce fundamental micro-level principles such as preferential attachment, triadic closure, and homophily, as well as macro-level properties including community structure and small-world effects. Importantly, the relative emphasis of these principles adapts to context: for example, LLMs favor homophily in friendship networks but heterophily in organizational settings, mirroring patterns of social mobility. A controlled human-subject survey confirms strong alignment between LLMs and human participants in link-formation decisions. These results establish that LLMs can serve as powerful tools for social simulation and synthetic data generation, while also raising critical questions about bias, fairness, and the design of AI systems that participate in human networks.
comment: Accepted at PNAS Nexus
HiddenBench: Assessing Collective Reasoning in Multi-Agent LLMs via Hidden Profile Tasks
Multi-agent systems built on large language models (LLMs) promise enhanced problem-solving through distributed information integration, but may also replicate collective reasoning failures observed in human groups. Yet the absence of a theory-grounded benchmark makes it difficult to systematically evaluate and improve such reasoning. We introduce HiddenBench, the first benchmark for evaluating collective reasoning in multi-agent LLMs. It builds on the Hidden Profile paradigm from social psychology, where individuals each hold asymmetric pieces of information and must communicate to reach the correct decision. To ground the benchmark, we formalize the paradigm with custom tasks and show that GPT-4.1 groups fail to integrate distributed knowledge, exhibiting human-like collective reasoning failures that persist even with varied prompting strategies. We then construct the full benchmark, spanning 65 tasks drawn from custom designs, prior human studies, and automatic generation. Evaluating 15 LLMs across four model families, HiddenBench exposes persistent limitations while also providing comparative insights: some models (e.g., Gemini-2.5-Flash/Pro) achieve higher performance, yet scale and reasoning are not reliable indicators of stronger collective reasoning. Our work delivers the first reproducible benchmark for collective reasoning in multi-agent LLMs, offering diagnostic insight and a foundation for future research on artificial collective intelligence.
Robotics
BBoE: Leveraging Bundle of Edges for Kinodynamic Bidirectional Motion Planning
In this work, we introduce BBoE, a bidirectional, kinodynamic, sampling-based motion planner that consistently and quickly finds low-cost solutions in environments with varying obstacle clutter. The algorithm combines exploration and exploitation while relying on precomputed robot state traversals, resulting in efficient convergence towards the goal. Our key contributions include: i) a strategy to navigate through obstacle-rich spaces by sorting and sequencing preprocessed forward propagations; and ii) BBoE, a robust bidirectional kinodynamic planner that utilizes this strategy to produce fast and feasible solutions. The proposed framework reduces planning time, diminishes solution cost and increases success rate in comparison to previous approaches.
comment: 8 Pages, 7 Figures
Video models are zero-shot learners and reasoners
The remarkable zero-shot capabilities of Large Language Models (LLMs) have propelled natural language processing from task-specific models to unified, generalist foundation models. This transformation emerged from simple primitives: large, generative models trained on web-scale data. Curiously, the same primitives apply to today's generative video models. Could video models be on a trajectory towards general-purpose vision understanding, much like LLMs developed general-purpose language understanding? We demonstrate that Veo 3 can solve a broad variety of tasks it wasn't explicitly trained for: segmenting objects, detecting edges, editing images, understanding physical properties, recognizing object affordances, simulating tool use, and more. These abilities to perceive, model, and manipulate the visual world enable early forms of visual reasoning like maze and symmetry solving. Veo's emergent zero-shot capabilities indicate that video models are on a path to becoming unified, generalist vision foundation models.
comment: Project page: https://video-zero-shot.github.io/
VisualMimic: Visual Humanoid Loco-Manipulation via Motion Tracking and Generation
Humanoid loco-manipulation in unstructured environments demands tight integration of egocentric perception and whole-body control. However, existing approaches either depend on external motion capture systems or fail to generalize across diverse tasks. We introduce VisualMimic, a visual sim-to-real framework that unifies egocentric vision with hierarchical whole-body control for humanoid robots. VisualMimic combines a task-agnostic low-level keypoint tracker -- trained from human motion data via a teacher-student scheme -- with a task-specific high-level policy that generates keypoint commands from visual and proprioceptive input. To ensure stable training, we inject noise into the low-level policy and clip high-level actions using human motion statistics. VisualMimic enables zero-shot transfer of visuomotor policies trained in simulation to real humanoid robots, accomplishing a wide range of loco-manipulation tasks such as box lifting, pushing, football dribbling, and kicking. Beyond controlled laboratory settings, our policies also generalize robustly to outdoor environments. Videos are available at: https://visualmimic.github.io .
comment: Website: https://visualmimic.github.io
On Robustness of Consensus over Pseudo-Undirected Path Graphs
Consensus over networked agents is typically studied using undirected or directed communication graphs. Undirected graphs enforce symmetry in information exchange, leading to convergence to the average of initial states, while directed graphs permit asymmetry but make consensus dependent on root nodes and their influence. Both paradigms impose inherent restrictions on achievable consensus values and network robustness. This paper introduces a theoretical framework for achieving consensus over a class of network topologies, termed pseudo-undirected graphs, which retains bidirectional connectivity between node pairs but allows the corresponding edge weights to differ, including the possibility of negative values under bounded conditions. The resulting Laplacian is generally non-symmetric, yet it guarantees consensus under connectivity assumptions, to expand the solution space, which enables the system to achieve a stable consensus value that can lie outside the convex hull of the initial state set. We derive admissibility bounds for negative weights for a pseudo-undirected path graph, and show an application in the simultaneous interception of a moving target.
mindmap: Spatial Memory in Deep Feature Maps for 3D Action Policies
End-to-end learning of robot control policies, structured as neural networks, has emerged as a promising approach to robotic manipulation. To complete many common tasks, relevant objects are required to pass in and out of a robot's field of view. In these settings, spatial memory - the ability to remember the spatial composition of the scene - is an important competency. However, building such mechanisms into robot learning systems remains an open research problem. We introduce mindmap (Spatial Memory in Deep Feature Maps for 3D Action Policies), a 3D diffusion policy that generates robot trajectories based on a semantic 3D reconstruction of the environment. We show in simulation experiments that our approach is effective at solving tasks where state-of-the-art approaches without memory mechanisms struggle. We release our reconstruction system, training code, and evaluation tasks to spur research in this direction.
comment: Accepted to CoRL 2025 Workshop RemembeRL
Parse-Augment-Distill: Learning Generalizable Bimanual Visuomotor Policies from Single Human Video
Learning visuomotor policies from expert demonstrations is an important frontier in modern robotics research, however, most popular methods require copious efforts for collecting teleoperation data and struggle to generalize out-ofdistribution. Scaling data collection has been explored through leveraging human videos, as well as demonstration augmentation techniques. The latter approach typically requires expensive simulation rollouts and trains policies with synthetic image data, therefore introducing a sim-to-real gap. In parallel, alternative state representations such as keypoints have shown great promise for category-level generalization. In this work, we bring these avenues together in a unified framework: PAD (Parse-AugmentDistill), for learning generalizable bimanual policies from a single human video. Our method relies on three steps: (a) parsing a human video demo into a robot-executable keypoint-action trajectory, (b) employing bimanual task-and-motion-planning to augment the demonstration at scale without simulators, and (c) distilling the augmented trajectories into a keypoint-conditioned policy. Empirically, we showcase that PAD outperforms state-ofthe-art bimanual demonstration augmentation works relying on image policies with simulation rollouts, both in terms of success rate and sample/cost efficiency. We deploy our framework in six diverse real-world bimanual tasks such as pouring drinks, cleaning trash and opening containers, producing one-shot policies that generalize in unseen spatial arrangements, object instances and background distractors. Supplementary material can be found in the project webpage https://gtziafas.github.io/PAD_project/.
HL-IK: A Lightweight Implementation of Human-Like Inverse Kinematics in Humanoid Arms
Traditional IK methods for redundant humanoid manipulators emphasize end-effector (EE) tracking, frequently producing configurations that are valid mechanically but not human-like. We present Human-Like Inverse Kinematics (HL-IK), a lightweight IK framework that preserves EE tracking while shaping whole-arm configurations to appear human-like, without full-body sensing at runtime. The key idea is a learned elbow prior: using large-scale human motion data retargeted to the robot, we train a FiLM-modulated spatio-temporal attention network (FiSTA) to predict the next-step elbow pose from the EE target and a short history of EE-elbow states.This prediction is incorporated as a small residual alongside EE and smoothness terms in a standard Levenberg-Marquardt optimizer, making HL-IK a drop-in addition to numerical IK stacks. Over 183k simulation steps, HL-IK reduces arm-similarity position and direction error by 30.6% and 35.4% on average, and by 42.2% and 47.4% on the most challenging trajectories. Hardware teleoperation on a robot distinct from simulation further confirms the gains in anthropomorphism. HL-IK is simple to integrate, adaptable across platforms via our pipeline, and adds minimal computation, enabling human-like motions for humanoid robots. Project page: https://hl-ik.github.io/
AnchDrive: Bootstrapping Diffusion Policies with Hybrid Trajectory Anchors for End-to-End Driving
End-to-end multi-modal planning has become a transformative paradigm in autonomous driving, effectively addressing behavioral multi-modality and the generalization challenge in long-tail scenarios. We propose AnchDrive, a framework for end-to-end driving that effectively bootstraps a diffusion policy to mitigate the high computational cost of traditional generative models. Rather than denoising from pure noise, AnchDrive initializes its planner with a rich set of hybrid trajectory anchors. These anchors are derived from two complementary sources: a static vocabulary of general driving priors and a set of dynamic, context-aware trajectories. The dynamic trajectories are decoded in real-time by a Transformer that processes dense and sparse perceptual features. The diffusion model then learns to refine these anchors by predicting a distribution of trajectory offsets, enabling fine-grained refinement. This anchor-based bootstrapping design allows for efficient generation of diverse, high-quality trajectories. Experiments on the NAVSIM benchmark confirm that AnchDrive sets a new state-of-the-art and shows strong gen?eralizability
comment: IWACIII 2025
Techno-Economic analysis for Smart Hangar inspection operations through Sensing and Localisation at scale
The accuracy, resilience, and affordability of localisation are fundamental to autonomous robotic inspection within aircraft maintenance and overhaul (MRO) hangars. Hangars typically feature tall ceilings and are often made of materials such as metal. Due to its nature, it is considered a GPS-denied environment, with extensive multipath effects and stringent operational constraints that collectively create a uniquely challenging environment. This persistent gap highlights the need for domain-specific comparative studies, including rigorous cost, accuracy, and integration assessments, to inform a reliable and scalable deployment of a localisation system in the Smart Hangar. This paper presents the first techno-economic roadmap that benchmarks motion capture (MoCap), ultra-wideband (UWB), and a ceiling-mounted camera network across three operational scenarios: robot localisation, asset tracking, and surface defect detection within a 40x50 m hangar bay. A dual-layer optimisation for camera selection and positioning framework is introduced, which couples market-based camera-lens selection with an optimisation solver, producing camera layouts that minimise hardware while meeting accuracy targets. The roadmap equips MRO planners with an actionable method to balance accuracy, coverage, and budget, demonstrating that an optimised vision architecture has the potential to unlock robust and cost-effective sensing for next-generation Smart Hangars.
A Biomimetic Vertebraic Soft Robotic Tail for High-Speed, High-Force Dynamic Maneuvering
Robotic tails can enhance the stability and maneuverability of mobile robots, but current designs face a trade-off between the power of rigid systems and the safety of soft ones. Rigid tails generate large inertial effects but pose risks in unstructured environments, while soft tails lack sufficient speed and force. We present a Biomimetic Vertebraic Soft Robotic (BVSR) tail that resolves this challenge through a compliant pneumatic body reinforced by a passively jointed vertebral column inspired by musculoskeletal structures. This hybrid design decouples load-bearing and actuation, enabling high-pressure actuation (up to 6 bar) for superior dynamics while preserving compliance. A dedicated kinematic and dynamic model incorporating vertebral constraints is developed and validated experimentally. The BVSR tail achieves angular velocities above 670{\deg}/s and generates inertial forces and torques up to 5.58 N and 1.21 Nm, indicating over 200% improvement compared to non-vertebraic designs. Demonstrations on rapid cart stabilization, obstacle negotiation, high-speed steering, and quadruped integration confirm its versatility and practical utility for agile robotic platforms.
comment: 20 pages, 11 figures, 4 tables. Submitted Under Review
Discrete Diffusion for Reflective Vision-Language-Action Models in Autonomous Driving
End-to-End (E2E) solutions have emerged as a mainstream approach for autonomous driving systems, with Vision-Language-Action (VLA) models representing a new paradigm that leverages pre-trained multimodal knowledge from Vision-Language Models (VLMs) to interpret and interact with complex real-world environments. However, these methods remain constrained by the limitations of imitation learning, which struggles to inherently encode physical rules during training. Existing approaches often rely on complex rule-based post-refinement, employ reinforcement learning that remains largely limited to simulation, or utilize diffusion guidance that requires computationally expensive gradient calculations. To address these challenges, we introduce ReflectDrive, a novel learning-based framework that integrates a reflection mechanism for safe trajectory generation via discrete diffusion. We first discretize the two-dimensional driving space to construct an action codebook, enabling the use of pre-trained Diffusion Language Models for planning tasks through fine-tuning. Central to our approach is a safety-aware reflection mechanism that performs iterative self-correction without gradient computation. Our method begins with goal-conditioned trajectory generation to model multi-modal driving behaviors. Based on this, we apply local search methods to identify unsafe tokens and determine feasible solutions, which then serve as safe anchors for inpainting-based regeneration. Evaluated on the NAVSIM benchmark, ReflectDrive demonstrates significant advantages in safety-critical trajectory generation, offering a scalable and reliable solution for autonomous driving systems.
Hyperspectral Adapter for Semantic Segmentation with Vision Foundation Models
Hyperspectral imaging (HSI) captures spatial information along with dense spectral measurements across numerous narrow wavelength bands. This rich spectral content has the potential to facilitate robust robotic perception, particularly in environments with complex material compositions, varying illumination, or other visually challenging conditions. However, current HSI semantic segmentation methods underperform due to their reliance on architectures and learning frameworks optimized for RGB inputs. In this work, we propose a novel hyperspectral adapter that leverages pretrained vision foundation models to effectively learn from hyperspectral data. Our architecture incorporates a spectral transformer and a spectrum-aware spatial prior module to extract rich spatial-spectral features. Additionally, we introduce a modality-aware interaction block that facilitates effective integration of hyperspectral representations and frozen vision Transformer features through dedicated extraction and injection mechanisms. Extensive evaluations on three benchmark autonomous driving datasets demonstrate that our architecture achieves state-of-the-art semantic segmentation performance while directly using HSI inputs, outperforming both vision-based and hyperspectral segmentation methods. We make the code available at https://hyperspectraladapter.cs.uni-freiburg.de.
Hybrid Safety Verification of Multi-Agent Systems using $ψ$-Weighted CBFs and PAC Guarantees
This study proposes a hybrid safety verification framework for closed-loop multi-agent systems under bounded stochastic disturbances. The proposed approach augments control barrier functions with a novel $\psi$-weighted formulation that encodes directional control alignment between agents into the safety constraints. Deterministic admissibility is combined with empirical validation via Monte Carlo rollouts, and a PAC-style guarantee is derived based on margin-aware safety violations to provide a probabilistic safety certificate. The results from the experiments conducted under different bounded stochastic disturbances validate the feasibility of the proposed approach.
C-3TO: Continuous 3D Trajectory Optimization on Neural Euclidean Signed Distance Fields ICRA 2026
This paper introduces a novel framework for continuous 3D trajectory optimization in cluttered environments, leveraging online neural Euclidean Signed Distance Fields (ESDFs). Unlike prior approaches that rely on discretized ESDF grids with interpolation, our method directly optimizes smooth trajectories represented by fifth-order polynomials over a continuous neural ESDF, ensuring precise gradient information throughout the entire trajectory. The framework integrates a two-stage nonlinear optimization pipeline that balances efficiency, safety and smoothness. Experimental results demonstrate that C-3TO produces collision-aware and dynamically feasible trajectories. Moreover, its flexibility in defining local window sizes and optimization parameters enables straightforward adaptation to diverse user's needs without compromising performance. By combining continuous trajectory parameterization with a continuously updated neural ESDF, C-3TO establishes a robust and generalizable foundation for safe and efficient local replanning in aerial robotics.
comment: 9 pages, 5 figures, submitted to ICRA 2026
Orbital Stabilization and Time Synchronization of Unstable Periodic Motions in Underactuated Robots
This paper presents a control methodology for achieving orbital stabilization with simultaneous time synchronization of periodic trajectories in underactuated robotic systems. The proposed approach extends the classical transverse linearization framework to explicitly incorporate time-desynchronization dynamics. To stabilize the resulting extended transverse dynamics, we employ a combination of time-varying LQR and sliding-mode control. The theoretical results are validated experimentally through the implementation of both centralized and decentralized control strategies on a group of six Butterfly robots.
DB-TSDF: Directional Bitmask-based Truncated Signed Distance Fields for Efficient Volumetric Mapping
This paper presents a high-efficiency, CPU-only volumetric mapping framework based on a Truncated Signed Distance Field (TSDF). The system incrementally fuses raw LiDAR point-cloud data into a voxel grid using a directional bitmask-based integration scheme, producing dense and consistent TSDF representations suitable for real-time 3D reconstruction. A key feature of the approach is that the processing time per point-cloud remains constant, regardless of the voxel grid resolution, enabling high resolution mapping without sacrificing runtime performance. In contrast to most recent TSDF/ESDF methods that rely on GPU acceleration, our method operates entirely on CPU, achieving competitive results in speed. Experiments on real-world open datasets demonstrate that the generated maps attain accuracy on par with contemporary mapping techniques.
Queryable 3D Scene Representation: A Multi-Modal Framework for Semantic Reasoning and Robotic Task Planning
To enable robots to comprehend high-level human instructions and perform complex tasks, a key challenge lies in achieving comprehensive scene understanding: interpreting and interacting with the 3D environment in a meaningful way. This requires a smart map that fuses accurate geometric structure with rich, human-understandable semantics. To address this, we introduce the 3D Queryable Scene Representation (3D QSR), a novel framework built on multimedia data that unifies three complementary 3D representations: (1) 3D-consistent novel view rendering and segmentation from panoptic reconstruction, (2) precise geometry from 3D point clouds, and (3) structured, scalable organization via 3D scene graphs. Built on an object-centric design, the framework integrates with large vision-language models to enable semantic queryability by linking multimodal object embeddings, and supporting object-level retrieval of geometric, visual, and semantic information. The retrieved data are then loaded into a robotic task planner for downstream execution. We evaluate our approach through simulated robotic task planning scenarios in Unity, guided by abstract language instructions and using the indoor public dataset Replica. Furthermore, we apply it in a digital duplicate of a real wet lab environment to test QSR-supported robotic task planning for emergency response. The results demonstrate the framework's ability to facilitate scene understanding and integrate spatial and semantic reasoning, effectively translating high-level human instructions into precise robotic task planning in complex 3D environments.
LLM Trainer: Automated Robotic Data Generating via Demonstration Augmentation using LLMs ICRA 2026
We present LLM Trainer, a fully automated pipeline that leverages the world knowledge of Large Language Models (LLMs) to transform a small number of human demonstrations (as few as one) into a large robot dataset for imitation learning. Our approach decomposes demonstration generation into two steps: (1) offline demonstration annotation that extracts keyframes, salient objects, and pose-object relations; and (2) online keypose retargeting that adapts those keyframes to a new scene, given an initial observation. Using these modified keypoints, our system warps the original demonstration to generate a new trajectory, which is then executed, and the resulting demo, if successful, is saved. Because the annotation is reusable across scenes, we use Thompson sampling to optimize the annotation, significantly improving generation success rate. We evaluate our method on a range of tasks, and find that our data annotation method consistently outperforms expert-engineered baselines. We further show an ensemble policy that combines the optimized LLM feed-forward plan with a learned feedback imitation learning controller. Finally, we demonstrate hardware feasibility on a Franka Emika Panda robot. For additional materials and demonstration videos, please see the project website: https://sites.google.com/andrew.cmu.edu/llm-trainer
comment: 9 pages, 5 figures, 4 tables. Submitted to ICRA 2026
MARG: MAstering Risky Gap Terrains for Legged Robots with Elevation Mapping
Deep Reinforcement Learning (DRL) controllers for quadrupedal locomotion have demonstrated impressive performance on challenging terrains, allowing robots to execute complex skills such as climbing, running, and jumping. However, existing blind locomotion controllers often struggle to ensure safety and efficient traversal through risky gap terrains, which are typically highly complex, requiring robots to perceive terrain information and select appropriate footholds during locomotion accurately. Meanwhile, existing perception-based controllers still present several practical limitations, including a complex multi-sensor deployment system and expensive computing resource requirements. This paper proposes a DRL controller named MAstering Risky Gap Terrains (MARG), which integrates terrain maps and proprioception to dynamically adjust the action and enhance the robot's stability in these tasks. During the training phase, our controller accelerates policy optimization by selectively incorporating privileged information (e.g., center of mass, friction coefficients) that are available in simulation but unmeasurable directly in real-world deployments due to sensor limitations. We also designed three foot-related rewards to encourage the robot to explore safe footholds. More importantly, a terrain map generation (TMG) model is proposed to reduce the drift existing in mapping and provide accurate terrain maps using only one LiDAR, providing a foundation for zero-shot transfer of the learned policy. The experimental results indicate that MARG maintains stability in various risky terrain tasks.
Embodied AI: From LLMs to World Models
Embodied Artificial Intelligence (AI) is an intelligent system paradigm for achieving Artificial General Intelligence (AGI), serving as the cornerstone for various applications and driving the evolution from cyberspace to physical systems. Recent breakthroughs in Large Language Models (LLMs) and World Models (WMs) have drawn significant attention for embodied AI. On the one hand, LLMs empower embodied AI via semantic reasoning and task decomposition, bringing high-level natural language instructions and low-level natural language actions into embodied cognition. On the other hand, WMs empower embodied AI by building internal representations and future predictions of the external world, facilitating physical law-compliant embodied interactions. As such, this paper comprehensively explores the literature in embodied AI from basics to advances, covering both LLM driven and WM driven works. In particular, we first present the history, key technologies, key components, and hardware systems of embodied AI, as well as discuss its development via looking from unimodal to multimodal angle. We then scrutinize the two burgeoning fields of embodied AI, i.e., embodied AI with LLMs/multimodal LLMs (MLLMs) and embodied AI with WMs, meticulously delineating their indispensable roles in end-to-end embodied cognition and physical laws-driven embodied interactions. Building upon the above advances, we further share our insights on the necessity of the joint MLLM-WM driven embodied AI architecture, shedding light on its profound significance in enabling complex tasks within physical worlds. In addition, we examine representative applications of embodied AI, demonstrating its wide applicability in real-world scenarios. Last but not least, we point out future research directions of embodied AI that deserve further investigation.
comment: Accepted by IEEE CASM
Lidar-based Tracking of Traffic Participants with Sensor Nodes in Existing Urban Infrastructure
This paper presents a lidar-only state estimation and tracking framework, along with a roadside sensing unit for integration with existing urban infrastructure. Urban deployments demand scalable, real-time tracking solutions, yet traditional remote sensing remains costly and computationally intensive, especially under perceptually degraded conditions. Our sensor node couples a single lidar with an edge computing unit and runs a computationally efficient, GPU-free observer that simultaneously estimates object state, class, dimensions, and existence probability. The pipeline performs: (i) state updates via an extended Kalman filter, (ii) dimension estimation using a 1D grid-map/Bayesian update, (iii) class updates via a lookup table driven by the most probable footprint, and (iv) existence estimation from track age and bounding-box consistency. Experiments in dynamic urban-like scenes with diverse traffic participants demonstrate real-time performance and high precision: The complete end-to-end pipeline finishes within \SI{100}{\milli\second} for \SI{99.88}{\%} of messages, with an excellent detection rate. Robustness is further confirmed under simulated wind and sensor vibration. These results indicate that reliable, real-time roadside tracking is feasible on CPU-only edge hardware, enabling scalable, privacy-friendly deployments within existing city infrastructure. The framework integrates with existing poles, traffic lights, and buildings, reducing deployment costs and simplifying large-scale urban rollouts and maintenance efforts.
comment: 21 pages, 9 figures, this work was submitted to Wileys'Advanced Intelligent Systems for review
An effective control of large systems of active particles: An application to evacuation problem
Manipulation of large systems of active particles is a serious challenge across diverse domains, including crowd management, control of robotic swarms, and coordinated material transport. The development of advanced control strategies for complex scenarios is hindered, however, by the lack of scalability and robustness of the existing methods, in particular, due to the need of an individual control for each agent. One possible solution involves controlling a system through a leader or a group of leaders, which other agents tend to follow. Using such an approach we develop an effective control strategy for a leader, combining reinforcement learning (RL) with artificial forces acting on the system. To describe the guidance of active particles by a leader we introduce the generalized Vicsek model. This novel method is then applied to the problem of the effective evacuation by a robot-rescuer (leader) of large groups of people from hazardous places. We demonstrate, that while a straightforward application of RL yields suboptimal results, even for advanced architectures, our approach provides a robust and efficient evacuation strategy. The source code supporting this study is publicly available at: https://github.com/cinemere/evacuation.
Generalist Robot Manipulation beyond Action Labeled Data
Recent advances in generalist robot manipulation leverage pre-trained Vision-Language Models (VLMs) and large-scale robot demonstrations to tackle diverse tasks in a zero-shot manner. A key challenge remains: scaling high-quality, action-labeled robot demonstration data, which existing methods rely on for robustness and generalization. To address this, we propose a method that benefits from videos without action labels - featuring humans and/or robots in action - enhancing open-vocabulary performance and enabling data-efficient learning of new tasks. Our method extracts dense, dynamic 3D point clouds at the hand or gripper location and uses a proposed 3D dynamics predictor for self-supervision. This predictor is then tuned to an action predictor using a smaller labeled dataset for action alignment. We show that our method not only learns from unlabeled human and robot demonstrations - improving downstream generalist robot policies - but also enables robots to learn new tasks without action labels (i.e., out-of-action generalization) in both real-world and simulated settings.
comment: Accepted at Conference on Robot Learning 2025
Robot Trajectron V2: A Probabilistic Shared Control Framework for Navigation
We propose a probabilistic shared-control solution for navigation, called Robot Trajectron V2 (RT-V2), that enables accurate intent prediction and safe, effective assistance in human-robot interaction. RT-V2 jointly models a user's long-term behavioral patterns and their noisy, low-dimensional control signals by combining a prior intent model with a posterior update that accounts for real-time user input and environmental context. The prior captures the multimodal and history-dependent nature of user intent using recurrent neural networks and conditional variational autoencoders, while the posterior integrates this with uncertain user commands to infer desired actions. We conduct extensive experiments to validate RT-V2 across synthetic benchmarks, human-computer interaction studies with keyboard input, and brain-machine interface experiments with non-human primates. Results show that RT-V2 outperforms the state of the art in intent estimation, provides safe and efficient navigation support, and adequately balances user autonomy with assistive intervention. By unifying probabilistic modeling, reinforcement learning, and safe optimization, RT-V2 offers a principled and generalizable approach to shared control for diverse assistive technologies.
comment: 26 pages, 20 figures
GUIDE: A Diffusion-Based Autonomous Robot Exploration Framework Using Global Graph Inference
Autonomous exploration in structured and complex indoor environments remains a challenging task, as existing methods often struggle to appropriately model unobserved space and plan globally efficient paths. To address these limitations, we propose GUIDE, a novel exploration framework that synergistically combines global graph inference with diffusion-based decision-making. We introduce a region-evaluation global graph representation that integrates both observed environmental data and predictions of unexplored areas, enhanced by a region-level evaluation mechanism to prioritize reliable structural inferences while discounting uncertain predictions. Building upon this enriched representation, a diffusion policy network generates stable, foresighted action sequences with significantly reduced denoising steps. Extensive simulations and real-world deployments demonstrate that GUIDE consistently outperforms state-of-the-art methods, achieving up to 18.3% faster coverage completion and a 34.9% reduction in redundant movements.
D3Grasp: Diverse and Deformable Dexterous Grasping for General Objects
Achieving diverse and stable dexterous grasping for general and deformable objects remains a fundamental challenge in robotics, due to high-dimensional action spaces and uncertainty in perception. In this paper, we present D3Grasp, a multimodal perception-guided reinforcement learning framework designed to enable Diverse and Deformable Dexterous Grasping. We firstly introduce a unified multimodal representation that integrates visual and tactile perception to robustly grasp common objects with diverse properties. Second, we propose an asymmetric reinforcement learning architecture that exploits privileged information during training while preserving deployment realism, enhancing both generalization and sample efficiency. Third, we meticulously design a training strategy to synthesize contact-rich, penetration-free, and kinematically feasible grasps with enhanced adaptability to deformable and contact-sensitive objects. Extensive evaluations confirm that D3Grasp delivers highly robust performance across large-scale and diverse object categories, and substantially advances the state of the art in dexterous grasping for deformable and compliant objects, even under perceptual uncertainty and real-world disturbances. D3Grasp achieves an average success rate of 95.1% in real-world trials,outperforming prior methods on both rigid and deformable objects benchmarks.
SAGE:State-Aware Guided End-to-End Policy for Multi-Stage Sequential Tasks via Hidden Markov Decision Process
Multi-stage sequential (MSS) robotic manipulation tasks are prevalent and crucial in robotics. They often involve state ambiguity, where visually similar observations correspond to different actions. We present SAGE, a state-aware guided imitation learning framework that models tasks as a Hidden Markov Decision Process (HMDP) to explicitly capture latent task stages and resolve ambiguity. We instantiate the HMDP with a state transition network that infers hidden states, and a state-aware action policy that conditions on both observations and hidden states to produce actions, thereby enabling disambiguation across task stages. To reduce manual annotation effort, we propose a semi-automatic labeling pipeline combining active learning and soft label interpolation. In real-world experiments across multiple complex MSS tasks with state ambiguity, SAGE achieved 100% task success under the standard evaluation protocol, markedly surpassing the baselines. Ablation studies further show that such performance can be maintained with manual labeling for only about 13% of the states, indicating its strong effectiveness.
Where Did I Leave My Glasses? Open-Vocabulary Semantic Exploration in Real-World Semi-Static Environments
Robots deployed in real-world environments, such as homes, must not only navigate safely but also understand their surroundings and adapt to environment changes. To perform tasks efficiently, they must build and maintain a semantic map that accurately reflects the current state of the environment. Existing research on semantic exploration largely focuses on static scenes without persistent object-level instance tracking. A consistent map is, however, crucial for real-world robotic applications where objects in the environment can be removed, reintroduced, or shifted over time. In this work, to close this gap, we propose an open-vocabulary, semantic exploration system for semi-static environments. Our system maintains a consistent map by building a probabilistic model of object instance stationarity, systematically tracking semi-static changes, and actively exploring areas that have not been visited for a prolonged period of time. In addition to active map maintenance, our approach leverages the map's semantic richness with LLM-based reasoning for open-vocabulary object-goal navigation. This enables the robot to search more efficiently by prioritizing contextually relevant areas. We evaluate our approach across multiple real-world semi-static environments. Our system detects 95% of map changes on average, improving efficiency by more than 29% as compared to random and patrol baselines. Overall, our approach achieves a mapping precision within 2% of a fully rebuilt map while requiring substantially less exploration and further completes object goal navigation tasks about 14% faster than the next-best tested strategy (coverage patrolling). A video of our work can be found at http://tiny.cc/sem-explor-semi-static .
PersONAL: Towards a Comprehensive Benchmark for Personalized Embodied Agents
Recent advances in Embodied AI have enabled agents to perform increasingly complex tasks and adapt to diverse environments. However, deploying such agents in realistic human-centered scenarios, such as domestic households, remains challenging, particularly due to the difficulty of modeling individual human preferences and behaviors. In this work, we introduce PersONAL (PERSonalized Object Navigation And Localization, a comprehensive benchmark designed to study personalization in Embodied AI. Agents must identify, retrieve, and navigate to objects associated with specific users, responding to natural-language queries such as "find Lily's backpack". PersONAL comprises over 2,000 high-quality episodes across 30+ photorealistic homes from the HM3D dataset. Each episode includes a natural-language scene description with explicit associations between objects and their owners, requiring agents to reason over user-specific semantics. The benchmark supports two evaluation modes: (1) active navigation in unseen environments, and (2) object grounding in previously mapped scenes. Experiments with state-of-the-art baselines reveal a substantial gap to human performance, highlighting the need for embodied agents capable of perceiving, reasoning, and memorizing over personalized information; paving the way towards real-world assistive robot.
DynaFlow: Dynamics-embedded Flow Matching for Physically Consistent Motion Generation from State-only Demonstrations
This paper introduces DynaFlow, a novel framework that embeds a differentiable simulator directly into a flow matching model. By generating trajectories in the action space and mapping them to dynamically feasible state trajectories via the simulator, DynaFlow ensures all outputs are physically consistent by construction. This end-to-end differentiable architecture enables training on state-only demonstrations, allowing the model to simultaneously generate physically consistent state trajectories while inferring the underlying action sequences required to produce them. We demonstrate the effectiveness of our approach through quantitative evaluations and showcase its real-world applicability by deploying the generated actions onto a physical Go1 quadruped robot. The robot successfully reproduces diverse gait present in the dataset, executes long-horizon motions in open-loop control and translates infeasible kinematic demonstrations into dynamically executable, stylistic behaviors. These hardware experiments validate that DynaFlow produces deployable, highly effective motions on real-world hardware from state-only demonstrations, effectively bridging the gap between kinematic data and real-world execution.
comment: 8 pages
RDAR: Reward-Driven Agent Relevance Estimation for Autonomous Driving
Human drivers focus only on a handful of agents at any one time. On the other hand, autonomous driving systems process complex scenes with numerous agents, regardless of whether they are pedestrians on a crosswalk or vehicles parked on the side of the road. While attention mechanisms offer an implicit way to reduce the input to the elements that affect decisions, existing attention mechanisms for capturing agent interactions are quadratic, and generally computationally expensive. We propose RDAR, a strategy to learn per-agent relevance -- how much each agent influences the behavior of the controlled vehicle -- by identifying which agents can be excluded from the input to a pre-trained behavior model. We formulate the masking procedure as a Markov Decision Process where the action consists of a binary mask indicating agent selection. We evaluate RDAR on a large-scale driving dataset, and demonstrate its ability to learn an accurate numerical measure of relevance by achieving comparable driving performance, in terms of overall progress, safety and performance, while processing significantly fewer agents compared to a state of the art behavior model.
comment: 10 pages, 6 figures
Beyond Human Demonstrations: Diffusion-Based Reinforcement Learning to Generate Data for VLA Training
Vision-language-action (VLA) models have shown strong generalization across tasks and embodiments; however, their reliance on large-scale human demonstrations limits their scalability owing to the cost and effort of manual data collection. Reinforcement learning (RL) offers a potential alternative to generate demonstrations autonomously, yet conventional RL algorithms often struggle on long-horizon manipulation tasks with sparse rewards. In this paper, we propose a modified diffusion policy optimization algorithm to generate high-quality and low-variance trajectories, which contributes to a diffusion RL-powered VLA training pipeline. Our algorithm benefits from not only the high expressiveness of diffusion models to explore complex and diverse behaviors but also the implicit regularization of the iterative denoising process to yield smooth and consistent demonstrations. We evaluate our approach on the LIBERO benchmark, which includes 130 long-horizon manipulation tasks, and show that the generated trajectories are smoother and more consistent than both human demonstrations and those from standard Gaussian RL policies. Further, training a VLA model exclusively on the diffusion RL-generated data achieves an average success rate of 81.9%, which outperforms the model trained on human data by +5.3% and that on Gaussian RL-generated data by +12.6%. The results highlight our diffusion RL as an effective alternative for generating abundant, high-quality, and low-variance demonstrations for VLA models.
Trajectory Planning Using Safe Ellipsoidal Corridors as Projections of Orthogonal Trust Regions
Planning collision free trajectories in complex environments remains a core challenge in robotics. Existing corridor based planners which rely on decomposition of the free space into collision free subsets scale poorly with environmental complexity and require explicit allocations of time windows to trajectory segments. We introduce a new trajectory parameterization that represents trajectories in a nonconvex collision free corridor as being in a convex cartesian product of balls. This parameterization allows us to decouple problem size from geometric complexity of the solution and naturally avoids explicit time allocation by allowing trajectories to evolve continuously inside ellipsoidal corridors. Building on this representation, we formulate the Orthogonal Trust Region Problem (Orth-TRP), a specialized convex program with separable block constraints, and develop a solver that exploits this parallel structure and the unique structure of each parallel subproblem for efficient optimization. Experiments on a quadrotor trajectory planning benchmark show that our approach produces smoother trajectories and lower runtimes than state-of-the-art corridor based planners, especially in highly complicated environments.
Simultaneous estimation of contact position and tool shape with high-dimensional parameters using force measurements and particle filtering
Estimating the contact state between a grasped tool and the environment is essential for performing contact tasks such as assembly and object manipulation. Force signals are valuable for estimating the contact state, as they can be utilized even when the contact location is obscured by the tool. Previous studies proposed methods for estimating contact positions using force/torque signals; however, most methods require the geometry of the tool surface to be known. Although several studies have proposed methods that do not require the tool shape, these methods require considerable time for estimation or are limited to tools with low-dimensional shape parameters. Here, we propose a method for simultaneously estimating the contact position and tool shape, where the tool shape is represented by a grid, which is high-dimensional (more than 1000 dimensional). The proposed method uses a particle filter in which each particle has individual tool shape parameters, thereby to avoid directly handling a high-dimensional parameter space. The proposed method is evaluated through simulations and experiments using tools with curved shapes on a plane. Consequently, the proposed method can estimate the shape of the tool simultaneously with the contact positions, making the contact-position estimation more accurate.
comment: Accepted to The International Journal of Robotics Research (IJRR)
Towards Autonomous Robotic Electrosurgery via Thermal Imaging IROS 2025
Electrosurgery is a surgical technique that can improve tissue cutting by reducing cutting force and bleeding. However, electrosurgery adds a risk of thermal injury to surrounding tissue. Expert surgeons estimate desirable cutting velocities based on experience but have no quantifiable reference to indicate if a particular velocity is optimal. Furthermore, prior demonstrations of autonomous electrosurgery have primarily used constant tool velocity, which is not robust to changes in electrosurgical tissue characteristics, power settings, or tool type. Thermal imaging feedback provides information that can be used to reduce thermal injury while balancing cutting force by controlling tool velocity. We introduce Thermography for Electrosurgical Rate Modulation via Optimization (ThERMO) to autonomously reduce thermal injury while balancing cutting force by intelligently controlling tool velocity. We demonstrate ThERMO in tissue phantoms and compare its performance to the constant velocity approach. Overall, ThERMO improves cut success rate by a factor of three and can reduce peak cutting force by a factor of two. ThERMO responds to varying environmental disturbances, reduces damage to tissue, and completes cutting tasks that would otherwise result in catastrophic failure for the constant velocity approach.
comment: Accepted for publication in the proceedings of the 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2025)
VIMD: Monocular Visual-Inertial Motion and Depth Estimation
Accurate and efficient dense metric depth estimation is crucial for 3D visual perception in robotics and XR. In this paper, we develop a monocular visual-inertial motion and depth (VIMD) learning framework to estimate dense metric depth by leveraging accurate and efficient MSCKF-based monocular visual-inertial motion tracking. At the core the proposed VIMD is to exploit multi-view information to iteratively refine per-pixel scale, instead of globally fitting an invariant affine model as in the prior work. The VIMD framework is highly modular, making it compatible with a variety of existing depth estimation backbones. We conduct extensive evaluations on the TartanAir and VOID datasets and demonstrate its zero-shot generalization capabilities on the AR Table dataset. Our results show that VIMD achieves exceptional accuracy and robustness, even with extremely sparse points as few as 10-20 metric depth points per image. This makes the proposed VIMD a practical solution for deployment in resource constrained settings, while its robust performance and strong generalization capabilities offer significant potential across a wide range of scenarios.
TopoCut: Learning Multi-Step Cutting with Spectral Rewards and Discrete Diffusion Policies
Robotic manipulation tasks involving cutting deformable objects remain challenging due to complex topological behaviors, difficulties in perceiving dense object states, and the lack of efficient evaluation methods for cutting outcomes. In this paper, we introduce TopoCut, a comprehensive benchmark for multi-step robotic cutting tasks that integrates a cutting environment and generalized policy learning. TopoCut is built upon three core components: (1) We introduce a high-fidelity simulation environment based on a particle-based elastoplastic solver with compliant von Mises constitutive models, augmented by a novel damage-driven topology discovery mechanism that enables accurate tracking of multiple cutting pieces. (2) We develop a comprehensive reward design that integrates the topology discovery with a pose-invariant spectral reward model based on Laplace-Beltrami eigenanalysis, facilitating consistent and robust assessment of cutting quality. (3) We propose an integrated policy learning pipeline, where a dynamics-informed perception module predicts topological evolution and produces particle-wise, topology-aware embeddings to support PDDP (Particle-based Score-Entropy Discrete Diffusion Policy) for goal-conditioned policy learning. Extensive experiments demonstrate that TopoCut supports trajectory generation, scalable learning, precise evaluation, and strong generalization across diverse object geometries, scales, poses, and cutting goals.
Diffusion-Based Impedance Learning for Contact-Rich Manipulation Tasks
Learning methods excel at motion generation in the information domain but are not primarily designed for physical interaction in the energy domain. Impedance Control shapes physical interaction but requires task-aware tuning by selecting feasible impedance parameters. We present Diffusion-Based Impedance Learning, a framework that combines both domains. A Transformer-based Diffusion Model with cross-attention to external wrenches reconstructs a simulated Zero-Force Trajectory (sZFT). This captures both translational and rotational task-space behavior. For rotations, we introduce a novel SLERP-based quaternion noise scheduler that ensures geometric consistency. The reconstructed sZFT is then passed to an energy-based estimator that updates stiffness and damping parameters. A directional rule is applied that reduces impedance along non task axes while preserving rigidity along task directions. Training data were collected for a parkour scenario and robotic-assisted therapy tasks using teleoperation with Apple Vision Pro. With only tens of thousands of samples, the model achieved sub-millimeter positional accuracy and sub-degree rotational accuracy. Its compact model size enabled real-time torque control and autonomous stiffness adaptation on a KUKA LBR iiwa robot. The controller achieved smooth parkour traversal within force and velocity limits and 30/30 success rates for cylindrical, square, and star peg insertions without any peg-specific demonstrations in the training data set. All code for the Transformer-based Diffusion Model, the robot controller, and the Apple Vision Pro telemanipulation framework is publicly available. These results mark an important step towards Physical AI, fusing model-based control for physical interaction with learning-based methods for trajectory generation.
comment: 15 pages, 12 figures
Formal Safety Verification and Refinement for Generative Motion Planners via Certified Local Stabilization
We present a method for formal safety verification of learning-based generative motion planners. Generative motion planners (GMPs) offer advantages over traditional planners, but verifying the safety and dynamic feasibility of their outputs is difficult since neural network verification (NNV) tools scale only to a few hundred neurons, while GMPs often contain millions. To preserve GMP expressiveness while enabling verification, our key insight is to imitate the GMP by stabilizing references sampled from the GMP with a small neural tracking controller and then applying NNV to the closed-loop dynamics. This yields reachable sets that rigorously certify closed-loop safety, while the controller enforces dynamic feasibility. Building on this, we construct a library of verified GMP references and deploy them online in a way that imitates the original GMP distribution whenever it is safe to do so, improving safety without retraining. We evaluate across diverse planners, including diffusion, flow matching, and vision-language models, improving safety in simulation (on ground robots and quadcopters) and on hardware (differential-drive robot).
comment: 10 pages, 12 figures
Memory-Augmented Potential Field Theory: A Framework for Adaptive Control in Non-Convex Domains NeurIPS 2025
Stochastic optimal control methods often struggle in complex non-convex landscapes, frequently becoming trapped in local optima due to their inability to learn from historical trajectory data. This paper introduces Memory-Augmented Potential Field Theory, a unified mathematical framework that integrates historical experience into stochastic optimal control. Our approach dynamically constructs memory-based potential fields that identify and encode key topological features of the state space, enabling controllers to automatically learn from past experiences and adapt their optimization strategy. We provide a theoretical analysis showing that memory-augmented potential fields possess non-convex escape properties, asymptotic convergence characteristics, and computational efficiency. We implement this theoretical framework in a Memory-Augmented Model Predictive Path Integral (MPPI) controller that demonstrates significantly improved performance in challenging non-convex environments. The framework represents a generalizable approach to experience-based learning within control systems (especially robotic dynamics), enhancing their ability to navigate complex state spaces without requiring specialized domain knowledge or extensive offline training.
comment: Accepted by NeurIPS 2025
RoboSSM: Scalable In-context Imitation Learning via State-Space Models
In-context imitation learning (ICIL) enables robots to learn tasks from prompts consisting of just a handful of demonstrations. By eliminating the need for parameter updates at deployment time, this paradigm supports few-shot adaptation to novel tasks. However, recent ICIL methods rely on Transformers, which have computational limitations and tend to underperform when handling longer prompts than those seen during training. In this work, we introduce RoboSSM, a scalable recipe for in-context imitation learning based on state-space models (SSM). Specifically, RoboSSM replaces Transformers with Longhorn -- a state-of-the-art SSM that provides linear-time inference and strong extrapolation capabilities, making it well-suited for long-context prompts. We evaluate our approach on the LIBERO benchmark and compare it against strong Transformer-based ICIL baselines. Experiments show that RoboSSM extrapolates effectively to varying numbers of in-context demonstrations, yields high performance on unseen tasks, and remains robust in long-horizon scenarios. These results highlight the potential of SSMs as an efficient and scalable backbone for ICIL. Our code is available at https://github.com/youngjuY/RoboSSM.
comment: 8 pages, 11 figures
Latent Activation Editing: Inference-Time Refinement of Learned Policies for Safer Multirobot Navigation
Reinforcement learning has enabled significant progress in complex domains such as coordinating and navigating multiple quadrotors. However, even well-trained policies remain vulnerable to collisions in obstacle-rich environments. Addressing these infrequent but critical safety failures through retraining or fine-tuning is costly and risks degrading previously learned skills. Inspired by activation steering in large language models and latent editing in computer vision, we introduce a framework for inference-time Latent Activation Editing (LAE) that refines the behavior of pre-trained policies without modifying their weights or architecture. The framework operates in two stages: (i) an online classifier monitors intermediate activations to detect states associated with undesired behaviors, and (ii) an activation editing module that selectively modifies flagged activations to shift the policy towards safer regimes. In this work, we focus on improving safety in multi-quadrotor navigation. We hypothesize that amplifying a policy's internal perception of risk can induce safer behaviors. We instantiate this idea through a latent collision world model trained to predict future pre-collision activations, thereby prompting earlier and more cautious avoidance responses. Extensive simulations and real-world Crazyflie experiments demonstrate that LAE achieves statistically significant reduction in collisions (nearly 90% fewer cumulative collisions compared to the unedited baseline) and substantially increases the fraction of collision-free trajectories, while preserving task completion. More broadly, our results establish LAE as a lightweight paradigm, feasible on resource-constrained hardware, for post-deployment refinement of learned robot policies.
Uncertainty-Aware Active Source Tracking of Marine Pollution using Unmanned Surface Vehicles IROS 2025
This paper proposes an uncertainty-aware marine pollution source tracking framework for unmanned surface vehicles (USVs). By integrating high-fidelity marine pollution dispersion simulation with informative path planning techniques, we demonstrate effective identification of pollution sources in marine environments. The proposed approach is implemented based on Robot Operating System (ROS), processing real-time sensor data to update probabilistic source location estimates. The system progressively refines the estimation of source location while quantifying uncertainty levels in its predictions. Experiments conducted in simulated environments with varying source locations, flow conditions, and starting positions demonstrate the framework's ability to localise pollution sources with high accuracy. Results show that the proposed approach achieves reliable source localisation efficiently. This work contributes to the development of full autonomous environmental monitoring capabilities essential for rapid response to marine pollution incidents.
comment: Accepted for presentation at Oceantech: Marine Robotics & Science Workshop, IROS 2025
Large Pre-Trained Models for Bimanual Manipulation in 3D
We investigate the integration of attention maps from a pre-trained Vision Transformer into voxel representations to enhance bimanual robotic manipulation. Specifically, we extract attention maps from DINOv2, a self-supervised ViT model, and interpret them as pixel-level saliency scores over RGB images. These maps are lifted into a 3D voxel grid, resulting in voxel-level semantic cues that are incorporated into a behavior cloning policy. When integrated into a state-of-the-art voxel-based policy, our attention-guided featurization yields an average absolute improvement of 8.2% and a relative gain of 21.9% across all tasks in the RLBench bimanual benchmark.
comment: Accepted to 2025 IEEE-RAS 24th International Conference on Humanoid Robots
GraspFactory: A Large Object-Centric Grasping Dataset
Robotic grasping is a crucial task in industrial automation, where robots are increasingly expected to handle a wide range of objects. However, a significant challenge arises when robot grasping models trained on limited datasets encounter novel objects. In real-world environments such as warehouses or manufacturing plants, the diversity of objects can be vast, and grasping models need to generalize to this diversity. Training large, generalizable robot-grasping models requires geometrically diverse datasets. In this paper, we introduce GraspFactory, a dataset containing over 109 million 6-DoF grasps collectively for the Franka Panda (with 14,690 objects) and Robotiq 2F-85 grippers (with 33,710 objects). GraspFactory is designed for training data-intensive models, and we demonstrate the generalization capabilities of one such model trained on a subset of GraspFactory in both simulated and real-world settings. The dataset and tools are made available for download at https://graspfactory.github.io/.
Selective Progress-Aware Querying for Human-in-the-Loop Reinforcement Learning
Human feedback can greatly accelerate robot learning, but in real-world settings, such feedback is costly and limited. Existing human-in-the-loop reinforcement learning (HiL-RL) methods often assume abundant feedback, limiting their practicality for physical robot deployment. In this work, we introduce SPARQ, a progress-aware query policy that requests feedback only when learning stagnates or worsens, thereby reducing unnecessary oracle calls. We evaluate SPARQ on a simulated UR5 cube-picking task in PyBullet, comparing against three baselines: no feedback, random querying, and always querying. Our experiments show that SPARQ achieves near-perfect task success, matching the performance of always querying while consuming about half the feedback budget. It also provides more stable and efficient learning than random querying, and significantly improves over training without feedback. These findings suggest that selective, progress-based query strategies can make HiL-RL more efficient and scalable for robots operating under realistic human effort constraints.
comment: Preprint. 8 pages, 3 figures, 1 table, 1 algorithm. CoRL 2025 style (preprint). Code/data to be released
Action-Informed Estimation and Planning: Clearing Clutter on Staircases via Quadrupedal Pedipulation
For robots to operate autonomously in densely cluttered environments, they must reason about and potentially physically interact with obstacles to clear a path. Safely clearing a path on challenging terrain, such as a cluttered staircase, requires controlled interaction. For example, a quadrupedal robot that pushes objects out of the way with one leg while maintaining a stable stance with its three other legs. However, tightly coupled physical actions, such as one-legged pushing, create new constraints on the system that can be difficult to predict at design time. In this work, we present a new method that addresses one such constraint, wherein the object being pushed by a quadrupedal robot with one of its legs becomes occluded from the robot's sensors during manipulation. To address this challenge, we present a tightly coupled perception-action framework that enables the robot to perceive clutter, reason about feasible push paths, and execute the clearing maneuver. Our core contribution is an interaction-aware state estimation loop that uses proprioceptive feedback regarding foot contact and leg position to predict an object's displacement during the occlusion. This prediction guides the perception system to robustly re-detect the object after the interaction, closing the loop between action and sensing to enable accurate tracking even after partial pushes. Using this feedback allows the robot to learn from physical outcomes, reclassifying an object as immovable if a push fails due to it being too heavy. We present results of implementing our approach on a Boston Dynamics Spot robot that show our interaction-aware approach achieves higher task success rates and tracking accuracy in pushing objects on stairs compared to open-loop baselines.
MELEGROS: Monolithic Elephant-inspired Gripper with Optical Sensors
The elephant trunk exemplifies a natural gripper where structure, actuation, and sensing are seamlessly integrated. Inspired by the distal morphology of the African elephant trunk, we present MELEGROS, a Monolithic ELEphant-inspired GRipper with Optical Sensors, emphasizing sensing as an intrinsic, co-fabricated capability. Unlike multi-material or tendon-based approaches, MELEGROS directly integrates six optical waveguide sensors and five pneumatic chambers into a pneumatically actuated lattice structure (12.5 mm cell size) using a single soft resin and one continuous 3D print. This eliminates mechanical mismatches between sensors, actuators, and body, reducing model uncertainty and enabling simulation-guided sensor design and placement. Only four iterations were required to achieve the final prototype, which features a continuous structure capable of elongation, compression, and bending while decoupling tactile and proprioceptive signals. MELEGROS (132 g) lifts more than twice its weight, performs bioinspired actions such as pinching, scooping, and reaching, and delicately grasps fragile items like grapes. The integrated optical sensors provide distinct responses to touch, bending, and chamber deformation, enabling multifunctional perception. MELEGROS demonstrates a new paradigm for soft robotics where fully embedded sensing and continuous structures inherently support versatile, bioinspired manipulation.
comment: 15 pages, 6 figures. SI 18 pages, 19 figures. Submitted to Wiley Advanced Science
Boosting Zero-Shot VLN via Abstract Obstacle Map-Based Waypoint Prediction with TopoGraph-and-VisitInfo-Aware Prompting
With the rapid progress of foundation models and robotics, vision-language navigation (VLN) has emerged as a key task for embodied agents with broad practical applications. We address VLN in continuous environments, a particularly challenging setting where an agent must jointly interpret natural language instructions, perceive its surroundings, and plan low-level actions. We propose a zero-shot framework that integrates a simplified yet effective waypoint predictor with a multimodal large language model (MLLM). The predictor operates on an abstract obstacle map, producing linearly reachable waypoints, which are incorporated into a dynamically updated topological graph with explicit visitation records. The graph and visitation information are encoded into the prompt, enabling reasoning over both spatial structure and exploration history to encourage exploration and equip MLLM with local path planning for error correction. Extensive experiments on R2R-CE and RxR-CE show that our method achieves state-of-the-art zero-shot performance, with success rates of 41% and 36%, respectively, outperforming prior state-of-the-art methods.
Revisiting Formal Methods for Autonomous Robots: A Structured Survey
This paper presents the initial results from our structured literature review on applications of Formal Methods (FM) to Robotic Autonomous Systems (RAS). We describe our structured survey methodology; including database selection and associated search strings, search filters and collaborative review of identified papers. We categorise and enumerate the FM approaches and formalisms that have been used for specification and verification of RAS. We investigate FM in the context of sub-symbolic AI-enabled RAS and examine the evolution of how FM is used over time in this field. This work complements a pre-existing survey in this area and we examine how this research area has matured over time. Specifically, our survey demonstrates that some trends have persisted as observed in a previous survey. Additionally, it recognized new trends that were not considered previously including a noticeable increase in adopting Formal Synthesis approaches as well as Probabilistic Verification Techniques.
comment: Appeal accepted: MOD-66548 This is an appeal request regarding our submission MOD-65174 - 6681725
Boosting LiDAR-Based Localization with Semantic Insight: Camera Projection versus Direct LiDAR Segmentation
Semantic segmentation of LiDAR data presents considerable challenges, particularly when dealing with diverse sensor types and configurations. However, incorporating semantic information can significantly enhance the accuracy and robustness of LiDAR-based localization techniques for autonomous mobile systems. We propose an approach that integrates semantic camera data with LiDAR segmentation to address this challenge. By projecting LiDAR points into the semantic segmentation space of the camera, our method enhances the precision and reliability of the LiDAR-based localization pipeline. For validation, we utilize the CoCar NextGen platform from the FZI Research Center for Information Technology, which offers diverse sensor modalities and configurations. The sensor setup of CoCar NextGen enables a thorough analysis of different sensor types. Our evaluation leverages the state-of-the-art Depth-Anything network for camera image segmentation and an adaptive segmentation network for LiDAR segmentation. To establish a reliable ground truth for LiDAR-based localization, we make us of a Global Navigation Satellite System (GNSS) solution with Real-Time Kinematic corrections (RTK). Additionally, we conduct an extensive 55 km drive through the city of Karlsruhe, Germany, covering a variety of environments, including urban areas, multi-lane roads, and rural highways. This multimodal approach paves the way for more reliable and precise autonomous navigation systems, particularly in complex real-world environments.
SceneWeaver: All-in-One 3D Scene Synthesis with an Extensible and Self-Reflective Agent NeurIPS 2025
Indoor scene synthesis has become increasingly important with the rise of Embodied AI, which requires 3D environments that are not only visually realistic but also physically plausible and functionally diverse. While recent approaches have advanced visual fidelity, they often remain constrained to fixed scene categories, lack sufficient object-level detail and physical consistency, and struggle to align with complex user instructions. In this work, we present SceneWeaver, a reflective agentic framework that unifies diverse scene synthesis paradigms through tool-based iterative refinement. At its core, SceneWeaver employs a language model-based planner to select from a suite of extensible scene generation tools, ranging from data-driven generative models to visual- and LLM-based methods, guided by self-evaluation of physical plausibility, visual realism, and semantic alignment with user input. This closed-loop reason-act-reflect design enables the agent to identify semantic inconsistencies, invoke targeted tools, and update the environment over successive iterations. Extensive experiments on both common and open-vocabulary room types demonstrate that SceneWeaver not only outperforms prior methods on physical, visual, and semantic metrics, but also generalizes effectively to complex scenes with diverse instructions, marking a step toward general-purpose 3D environment generation. Project website: https://scene-weaver.github.io/.
comment: Accepted by NeurIPS 2025, 26 pages
Human-Interpretable Uncertainty Explanations for Point Cloud Registration
In this paper, we address the point cloud registration problem, where well-known methods like ICP fail under uncertainty arising from sensor noise, pose-estimation errors, and partial overlap due to occlusion. We develop a novel approach, Gaussian Process Concept Attribution (GP-CA), which not only quantifies registration uncertainty but also explains it by attributing uncertainty to well-known sources of errors in registration problems. Our approach leverages active learning to discover new uncertainty sources in the wild by querying informative instances. We validate GP-CA on three publicly available datasets and in our real-world robot experiment. Extensive ablations substantiate our design choices. Our approach outperforms other state-of-the-art methods in terms of runtime, high sample-efficiency with active learning, and high accuracy. Our real-world experiment clearly demonstrates its applicability. Our video also demonstrates that GP-CA enables effective failure-recovery behaviors, yielding more robust robotic perception.
Do You Need Proprioceptive States in Visuomotor Policies?
Imitation-learning-based visuomotor policies have been widely used in robot manipulation, where both visual observations and proprioceptive states are typically adopted together for precise control. However, in this study, we find that this common practice makes the policy overly reliant on the proprioceptive state input, which causes overfitting to the training trajectories and results in poor spatial generalization. On the contrary, we propose the State-free Policy, removing the proprioceptive state input and predicting actions only conditioned on visual observations. The State-free Policy is built in the relative end-effector action space, and should ensure the full task-relevant visual observations, here provided by dual wide-angle wrist cameras. Empirical results demonstrate that the State-free policy achieves significantly stronger spatial generalization than the state-based policy: in real-world tasks such as pick-and-place, challenging shirt-folding, and complex whole-body manipulation, spanning multiple robot embodiments, the average success rate improves from 0% to 85% in height generalization and from 6% to 64% in horizontal generalization. Furthermore, they also show advantages in data efficiency and cross-embodiment adaptation, enhancing their practicality for real-world deployment. Discover more by visiting: https://statefreepolicy.github.io.
comment: Project page: https://statefreepolicy.github.io
ByteWrist: A Parallel Robotic Wrist Enabling Flexible and Anthropomorphic Motion for Confined Spaces
This paper introduces ByteWrist, a novel highly-flexible and anthropomorphic parallel wrist for robotic manipulation. ByteWrist addresses the critical limitations of existing serial and parallel wrists in narrow-space operations through a compact three-stage parallel drive mechanism integrated with arc-shaped end linkages. The design achieves precise RPY (Roll-Pitch-Yaw) motion while maintaining exceptional compactness, making it particularly suitable for complex unstructured environments such as home services, medical assistance, and precision assembly. The key innovations include: (1) a nested three-stage motor-driven linkages that minimize volume while enabling independent multi-DOF control, (2) arc-shaped end linkages that optimize force transmission and expand motion range, and (3) a central supporting ball functioning as a spherical joint that enhances structural stiffness without compromising flexibility. Meanwhile, we present comprehensive kinematic modeling including forward / inverse kinematics and a numerical Jacobian solution for precise control. Empirically, we observe ByteWrist demonstrates strong performance in narrow-space maneuverability and dual-arm cooperative manipulation tasks, outperforming Kinova-based systems. Results indicate significant improvements in compactness, efficiency, and stiffness compared to traditional designs, establishing ByteWrist as a promising solution for next-generation robotic manipulation in constrained environments.
comment: Tech Report.13 pages, 9 figures. Project page: https://bytewrist.github.io/
AORRTC: Almost-Surely Asymptotically Optimal Planning with RRT-Connect
Finding high-quality solutions quickly is an important objective in motion planning. This is especially true for high-degree-of-freedom robots. Satisficing planners have traditionally found feasible solutions quickly but provide no guarantees on their optimality, while almost-surely asymptotically optimal (a.s.a.o.) planners have probabilistic guarantees on their convergence towards an optimal solution but are more computationally expensive. This paper uses the AO-x meta-algorithm to extend the satisficing RRT-Connect planner to optimal planning. The resulting Asymptotically Optimal RRT-Connect (AORRTC) finds initial solutions in similar times as RRT-Connect and uses any additional planning time to converge towards the optimal solution in an anytime manner. It is proven to be probabilistically complete and a.s.a.o. AORRTC was tested with the Panda (7 DoF) and Fetch (8 DoF) robotic arms on the MotionBenchMaker dataset. These experiments show that AORRTC finds initial solutions as fast as RRT-Connect and faster than the tested state-of-the-art a.s.a.o. algorithms while converging to better solutions faster. AORRTC finds solutions to difficult high-DoF planning problems in milliseconds where the other a.s.a.o. planners could not consistently find solutions in seconds. This performance was demonstrated both with and without single instruction/multiple data (SIMD) acceleration.
comment: IEEE Robotics and Automation Letters (RA-L). 8 pages, 4 figures, 1 table. A video of AORRTC can be found at https://www.youtube.com/watch?v=j1itxP3KuiM . Information on the implementation of AORRTC is available at https://robotic-esp.com/code/aorrtc/
Data-fused Model Predictive Control with Guarantees: Application to Flying Humanoid Robots
This paper introduces a Data-Fused Model Predictive Control (DFMPC) framework that combines physics-based models with data-driven representations of unknown dynamics. Leveraging Willems' Fundamental Lemma and an artificial equilibrium formulation, the method enables tracking of changing, potentially unreachable setpoints while explicitly handling measurement noise through slack variables and regularization. We provide guarantees of recursive feasibility and practical stability under input-output constraints for a specific class of reference signals. The approach is validated on the iRonCub flying humanoid robot, integrating analytical momentum models with data-driven turbine dynamics. Simulations show improved tracking and robustness compared to a purely model-based MPC, while maintaining real-time feasibility.
comment: 8 pages, 3 figures
Towards Data-Driven Adaptive Exoskeleton Assistance for Post-stroke Gait
Recent work has shown that exoskeletons controlled through data-driven methods can dynamically adapt assistance to various tasks for healthy young adults. However, applying these methods to populations with neuromotor gait deficits, such as post-stroke hemiparesis, is challenging. This is due not only to high population heterogeneity and gait variability but also to a lack of post-stroke gait datasets to train accurate models. Despite these challenges, data-driven methods offer a promising avenue for control, potentially allowing exoskeletons to function safely and effectively in unstructured community settings. This work presents a first step towards enabling adaptive plantarflexion and dorsiflexion assistance from data-driven torque estimation during post-stroke walking. We trained a multi-task Temporal Convolutional Network (TCN) using collected data from four post-stroke participants walking on a treadmill ($R^2$ of $0.74 \pm 0.13$). The model uses data from three inertial measurement units (IMU) and was pretrained on healthy walking data from 6 participants. We implemented a wearable prototype for our ankle torque estimation approach for exoskeleton control and demonstrated the viability of real-time sensing, estimation, and actuation with one post-stroke participant.
comment: 8 pages, 6 figures, 2 tables
Optimal Multi-agent Path Finding in Continuous Time
Continuous-time Conflict Based-Search (CCBS) has long been viewed as the standard optimal baseline for multi-agent path finding in continuous time (MAPFR), yet recent critiques show that the theoretically described CCBS can fail to terminate on solvable MAPFR problems while the publicly available reference implementation can return sub-optimal solutions. This work presents an analytical framework that yields simple and sufficient conditions under which any CCBS-style algorithm is both sound and solution complete. Investigating the reference CCBS implementation reveals that it violates our sufficient conditions for soundness, with counterexamples demonstrating sub-optimality. Leveraging the framework, we introduce a branching rule ($\delta$-BR) and prove it restores soundness and termination guarantees. Consequently, the resulting CCBS variant is both sound and solution complete. To our knowledge, this is the first MAPFR solver matching the guarantees of the discrete-time CBS. On a constructed example, CCBS with $\delta$-BR improves sum-of-costs from 10.707 to 9.000 ($\approx$ 16% lower) compared to the reference CCBS implementation. Across benchmarks, the reference CCBS implementation is generally able to find solutions faster than CCBS with $\delta$-BR due to its more aggressive pruning. However, this comes at the cost of occasional sub-optimality and potential non-termination when all solutions are pruned, whereas $\delta$-BR preserves optimality and guarantees termination by design. Because $\delta$-BR largely only affects the branching step, it can be adopted as a drop-in replacement in existing codebases. Beyond CCBS, the analytical framework and termination criterion provide a systematic way to evaluate other CCBS-like MAPFR solvers and future extensions, thereby offering tools for rigorous analysis of next-generation MAPFR algorithms.
comment: 35 pages
U-ARM : Ultra low-cost general teleoperation interface for robot manipulation
We propose U-Arm, a low-cost and rapidly adaptable leader-follower teleoperation framework designed to interface with most of commercially available robotic arms. Our system supports teleoperation through three structurally distinct 3D-printed leader arms that share consistent control logic, enabling seamless compatibility with diverse commercial robot configurations. Compared with previous open-source leader-follower interfaces, we further optimized both the mechanical design and servo selection, achieving a bill of materials (BOM) cost of only \$50.5 for the 6-DoF leader arm and \$56.8 for the 7-DoF version. To enhance usability, we mitigate the common challenge in controlling redundant degrees of freedom by %engineering methods mechanical and control optimizations. Experimental results demonstrate that U-Arm achieves 39\% higher data collection efficiency and comparable task success rates across multiple manipulation scenarios compared with Joycon, another low-cost teleoperation interface. We have open-sourced all CAD models of three configs and also provided simulation support for validating teleoperation workflows. We also open-sourced real-world manipulation data collected with U-Arm. The project website is https://github.com/MINT-SJTU/LeRobot-Anything-U-Arm.
GraphEQA: Using 3D Semantic Scene Graphs for Real-time Embodied Question Answering
In Embodied Question Answering (EQA), agents must explore and develop a semantic understanding of an unseen environment to answer a situated question with confidence. This problem remains challenging in robotics, due to the difficulties in obtaining useful semantic representations, updating these representations online, and leveraging prior world knowledge for efficient planning and exploration. To address these limitations, we propose GraphEQA, a novel approach that utilizes real-time 3D metric-semantic scene graphs (3DSGs) and task relevant images as multi-modal memory for grounding Vision-Language Models (VLMs) to perform EQA tasks in unseen environments. We employ a hierarchical planning approach that exploits the hierarchical nature of 3DSGs for structured planning and semantics-guided exploration. We evaluate GraphEQA in simulation on two benchmark datasets, HM-EQA and OpenEQA, and demonstrate that it outperforms key baselines by completing EQA tasks with higher success rates and fewer planning steps. We further demonstrate GraphEQA in multiple real-world home and office environments.
comment: Project website: https://saumyasaxena.github.io/grapheqa
ExoStart: Efficient learning for dexterous manipulation with sensorized exoskeleton demonstrations
Recent advancements in teleoperation systems have enabled high-quality data collection for robotic manipulators, showing impressive results in learning manipulation at scale. This progress suggests that extending these capabilities to robotic hands could unlock an even broader range of manipulation skills, especially if we could achieve the same level of dexterity that human hands exhibit. However, teleoperating robotic hands is far from a solved problem, as it presents a significant challenge due to the high degrees of freedom of robotic hands and the complex dynamics occurring during contact-rich settings. In this work, we present ExoStart, a general and scalable learning framework that leverages human dexterity to improve robotic hand control. In particular, we obtain high-quality data by collecting direct demonstrations without a robot in the loop using a sensorized low-cost wearable exoskeleton, capturing the rich behaviors that humans can demonstrate with their own hands. We also propose a simulation-based dynamics filter that generates dynamically feasible trajectories from the collected demonstrations and use the generated trajectories to bootstrap an auto-curriculum reinforcement learning method that relies only on simple sparse rewards. The ExoStart pipeline is generalizable and yields robust policies that transfer zero-shot to the real robot. Our results demonstrate that ExoStart can generate dexterous real-world hand skills, achieving a success rate above 50% on a wide range of complex tasks such as opening an AirPods case or inserting and turning a key in a lock. More details and videos can be found in https://sites.google.com/view/exostart.
Design optimization and robustness analysis of rigid-link flapping mechanisms
Rigid link flapping mechanisms remain the most practical choice for flapping wing micro-aerial vehicles (MAVs) to carry useful payloads and onboard batteries for free flight due to their long-term durability and reliability. However, MAVs with these mechanisms require significant weight reduction to achieve high agility and maneuverability. One approach involves using single-DOF planar rigid linkages, which are rarely optimized dimensionally for high lift and low power, considering their sweeping kinematics and the unsteady aerodynamic effects. We integrated a mechanism simulator based on a quasistatic nonlinear finite element method with an unsteady vortex lattice method-based aerodynamic analysis tool within an optimization routine. We optimized three different mechanism topologies from the literature. Significant power savings were observed up to 34% in some cases, due to increased amplitude and higher lift coefficients resulting from optimized asymmetric sweeping velocity profiles. We also conducted a robustness analysis to quantify performance sensitivity to manufacturing tolerances. It provided a trade-off between performance and reliability and revealed the need for tight manufacturing tolerances and careful material selection. Finally, the analysis helped select the best mechanism topology, as we observed significant variation in sensitivity to manufacturing tolerances and peak input torque values across different topologies for a given design lift value. The presented unified computational tool can find application in flapping mechanism topology optimization, as it can simulate any generic single-DOF planar rigid linkage without supplying kinematics manually.
ASC-SW: Atrous strip convolution network with sliding windows
With the rapid development of lightweight visual neural network architectures, traditional high-performance vision models have undergone significant compression, enhancing their computational and energy efficiency and enabling deployment on resource-constrained edge devices. In order to enable the mobile robot to avoid the ground wires, we propose a visual-assisted navigation framework called Atrous Strip Convolution Sliding Window (ASC-SW). This framework compensates for the limitations of traditional light detection and range (LiDAR) sensors to detect ground-level obstacles such as wires. A lightweight and efficient segmentation model, Atrous Strip Convolution Network (ASCnet) was proposed, for detecting deformable linear objects (DLOs). Atrous Strip Convolution Spatial Pyramid Pooling (ASCSPP) is designed to extract DLOs features effectively. Atrous Strip Convolution is integrated into ASCSPP to accurately identify the linear structure of DLOs with low computational cost. Additionally, a Sliding Window (SW) post processing module is proposed to denoise the output in complex environments, improving recognition accuracy. ASC-SW achieves 75.3% MIoU at 217 FPS on a self-built real world dataset and real-robot experiment was demonstrated that our proposed framework. It can be successfully verified on the real-robot on the edge device(Jetson platform) at that were originally inoperable.
TrojanRobot: Physical-world Backdoor Attacks Against VLM-based Robotic Manipulation
Robotic manipulation in the physical world is increasingly empowered by \textit{large language models} (LLMs) and \textit{vision-language models} (VLMs), leveraging their understanding and perception capabilities. Recently, various attacks against such robotic policies have been proposed, with backdoor attacks drawing considerable attention for their high stealth and strong persistence capabilities. However, existing backdoor efforts are limited to simulators and suffer from physical-world realization. To address this, we propose \textit{TrojanRobot}, a highly stealthy and broadly effective robotic backdoor attack in the physical world. Specifically, we introduce a module-poisoning approach by embedding a backdoor module into the modular robotic policy, enabling backdoor control over the policy's visual perception module thereby backdooring the entire robotic policy. Our vanilla implementation leverages a backdoor-finetuned VLM to serve as the backdoor module. To enhance its generalization in physical environments, we propose a prime implementation, leveraging the LVLM-as-a-backdoor paradigm and developing three types of prime attacks, \ie, \textit{permutation}, \textit{stagnation}, and \textit{intentional} attacks, thus achieving finer-grained backdoors. Extensive experiments on the UR3e manipulator with 18 task instructions using robotic policies based on four VLMs demonstrate the broad effectiveness and physical-world stealth of TrojanRobot. Our attack's video demonstrations are available via a github link https://trojanrobot.github.io.
SEM: Enhancing Spatial Understanding for Robust Robot Manipulation
A key challenge in robot manipulation lies in developing policy models with strong spatial understanding, the ability to reason about 3D geometry, object relations, and robot embodiment. Existing methods often fall short: 3D point cloud models lack semantic abstraction, while 2D image encoders struggle with spatial reasoning. To address this, we propose SEM (Spatial Enhanced Manipulation model), a novel diffusion-based policy framework that explicitly enhances spatial understanding from two complementary perspectives. A spatial enhancer augments visual representations with 3D geometric context, while a robot state encoder captures embodiment-aware structure through graphbased modeling of joint dependencies. By integrating these modules, SEM significantly improves spatial understanding, leading to robust and generalizable manipulation across diverse tasks that outperform existing baselines.
RG-Attn: Radian Glue Attention for Multi-modality Multi-agent Cooperative Perception ICCV 2025
Cooperative perception enhances autonomous driving by leveraging Vehicle-to-Everything (V2X) communication for multi-agent sensor fusion. However, most existing methods rely on single-modal data sharing, limiting fusion performance, particularly in heterogeneous sensor settings involving both LiDAR and cameras across vehicles and roadside units (RSUs). To address this, we propose Radian Glue Attention (RG-Attn), a lightweight and generalizable cross-modal fusion module that unifies intra-agent and inter-agent fusion via transformation-based coordinate alignment and a unified sampling/inversion strategy. RG-Attn efficiently aligns features through a radian-based attention constraint, operating column-wise on geometrically consistent regions to reduce overhead and preserve spatial coherence, thereby enabling accurate and robust fusion. Building upon RG-Attn, we propose three cooperative architectures. The first, Paint-To-Puzzle (PTP), prioritizes communication efficiency but assumes all agents have LiDAR, optionally paired with cameras. The second, Co-Sketching-Co-Coloring (CoS-CoCo), offers maximal flexibility, supporting any sensor setup (e.g., LiDAR-only, camera-only, or both) and enabling strong cross-modal generalization for real-world deployment. The third, Pyramid-RG-Attn Fusion (PRGAF), aims for peak detection accuracy with the highest computational overhead. Extensive evaluations on simulated and real-world datasets show our framework delivers state-of-the-art detection accuracy with high flexibility and efficiency. GitHub Link: https://github.com/LantaoLi/RG-Attn
comment: Accepted by ICCV 2025 DriveX workshop (Final Version)
VLM See, Robot Do: Human Demo Video to Robot Action Plan via Vision Language Model
Vision Language Models (VLMs) have recently been adopted in robotics for their capability in common sense reasoning and generalizability. Existing work has applied VLMs to generate task and motion planning from natural language instructions and simulate training data for robot learning. In this work, we explore using VLM to interpret human demonstration videos and generate robot task planning. Our method integrates keyframe selection, visual perception, and VLM reasoning into a pipeline. We named it SeeDo because it enables the VLM to ''see'' human demonstrations and explain the corresponding plans to the robot for it to ''do''. To validate our approach, we collected a set of long-horizon human videos demonstrating pick-and-place tasks in three diverse categories and designed a set of metrics to comprehensively benchmark SeeDo against several baselines, including state-of-the-art video-input VLMs. The experiments demonstrate SeeDo's superior performance. We further deployed the generated task plans in both a simulation environment and on a real robot arm.
Online Language Splatting
To enable AI agents to interact seamlessly with both humans and 3D environments, they must not only perceive the 3D world accurately but also align human language with 3D spatial representations. While prior work has made significant progress by integrating language features into geometrically detailed 3D scene representations using 3D Gaussian Splatting (GS), these approaches rely on computationally intensive offline preprocessing of language features for each input image, limiting adaptability to new environments. In this work, we introduce Online Language Splatting, the first framework to achieve online, near real-time, open-vocabulary language mapping within a 3DGS-SLAM system without requiring pre-generated language features. The key challenge lies in efficiently fusing high-dimensional language features into 3D representations while balancing the computation speed, memory usage, rendering quality and open-vocabulary capability. To this end, we innovatively design: (1) a high-resolution CLIP embedding module capable of generating detailed language feature maps in 18ms per frame, (2) a two-stage online auto-encoder that compresses 768-dimensional CLIP features to 15 dimensions while preserving open-vocabulary capabilities, and (3) a color-language disentangled optimization approach to improve rendering quality. Experimental results show that our online method not only surpasses the state-of-the-art offline methods in accuracy but also achieves more than 40x efficiency boost, demonstrating the potential for dynamic and interactive AI applications.
AURA: Autonomous Upskilling with Retrieval-Augmented Agents
Designing reinforcement learning curricula for agile robots traditionally requires extensive manual tuning of reward functions, environment randomizations, and training configurations. We introduce AURA (Autonomous Upskilling with Retrieval-Augmented Agents), a schema-validated curriculum reinforcement learning (RL) framework that leverages Large Language Models (LLMs) as autonomous designers of multi-stage curricula. AURA transforms user prompts into YAML workflows that encode full reward functions, domain randomization strategies, and training configurations. All files are statically validated before any GPU time is used, ensuring efficient and reliable execution. A retrieval-augmented feedback loop allows specialized LLM agents to design, execute, and refine curriculum stages based on prior training results stored in a vector database, enabling continual improvement over time. Quantitative experiments show that AURA consistently outperforms LLM-guided baselines in generation success rate, humanoid locomotion, and manipulation tasks. Ablation studies highlight the importance of schema validation and retrieval for curriculum quality. AURA successfully trains end-to-end policies directly from user prompts and deploys them zero-shot on a custom humanoid robot in multiple environments - capabilities that did not exist previously with manually designed controllers. By abstracting the complexity of curriculum design, AURA enables scalable and adaptive policy learning pipelines that would be complex to construct by hand.
CHILD (Controller for Humanoid Imitation and Live Demonstration): a Whole-Body Humanoid Teleoperation System
Recent advances in teleoperation have demonstrated robots performing complex manipulation tasks. However, existing works rarely support whole-body joint-level teleoperation for humanoid robots, limiting the diversity of tasks that can be accomplished. This work presents Controller for Humanoid Imitation and Live Demonstration (CHILD), a compact reconfigurable teleoperation system that enables joint level control over humanoid robots. CHILD fits within a standard baby carrier, allowing the operator control over all four limbs, and supports both direct joint mapping for full-body control and loco-manipulation. Adaptive force feedback is incorporated to enhance operator experience and prevent unsafe joint movements. We validate the capabilities of this system by conducting loco-manipulation and full-body control demonstrations on a humanoid robot and multiple dual-arm systems. Lastly, we open-source the design of the hardware promoting accessibility and reproducibility. Additional details and open-source information are available at our project website: https://uiuckimlab.github.io/CHILD-pages.
comment: 2025 IEEE-RAS 24th International Conference on Humanoid Robots (Humanoids)
SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation NeurIPS 2025
While spatial reasoning has made progress in object localization relationships, it often overlooks object orientation-a key factor in 6-DoF fine-grained manipulation. Traditional pose representations rely on pre-defined frames or templates, limiting generalization and semantic grounding. In this paper, we introduce the concept of semantic orientation, which defines object orientations using natural language in a reference-frame-free manner (e.g., the "plug-in" direction of a USB or the "handle" direction of a cup). To support this, we construct OrienText300K, a large-scale dataset of 3D objects annotated with semantic orientations, and develop PointSO, a general model for zero-shot semantic orientation prediction. By integrating semantic orientation into VLM agents, our SoFar framework enables 6-DoF spatial reasoning and generates robotic actions. Extensive experiments demonstrated the effectiveness and generalization of our SoFar, e.g., zero-shot 48.7% successful rate on Open6DOR and zero-shot 74.9% successful rate on SIMPLER-Env.
comment: Accepted at NeurIPS 2025 Spotlight
Generalizable Domain Adaptation for Sim-and-Real Policy Co-Training
Behavior cloning has shown promise for robot manipulation, but real-world demonstrations are costly to acquire at scale. While simulated data offers a scalable alternative, particularly with advances in automated demonstration generation, transferring policies to the real world is hampered by various simulation and real domain gaps. In this work, we propose a unified sim-and-real co-training framework for learning generalizable manipulation policies that primarily leverages simulation and only requires a few real-world demonstrations. Central to our approach is learning a domain-invariant, task-relevant feature space. Our key insight is that aligning the joint distributions of observations and their corresponding actions across domains provides a richer signal than aligning observations (marginals) alone. We achieve this by embedding an Optimal Transport (OT)-inspired loss within the co-training framework, and extend this to an Unbalanced OT framework to handle the imbalance between abundant simulation data and limited real-world examples. We validate our method on challenging manipulation tasks, showing it can leverage abundant simulation data to achieve up to a 30% improvement in the real-world success rate and even generalize to scenarios seen only in simulation.
Streaming Flow Policy: Simplifying diffusion/flow-matching policies by treating action trajectories as flow trajectories
Recent advances in diffusion$/$flow-matching policies have enabled imitation learning of complex, multi-modal action trajectories. However, they are computationally expensive because they sample a trajectory of trajectories: a diffusion$/$flow trajectory of action trajectories. They discard intermediate action trajectories, and must wait for the sampling process to complete before any actions can be executed on the robot. We simplify diffusion$/$flow policies by treating action trajectories as flow trajectories. Instead of starting from pure noise, our algorithm samples from a narrow Gaussian around the last action. Then, it incrementally integrates a velocity field learned via flow matching to produce a sequence of actions that constitute a single trajectory. This enables actions to be streamed to the robot on-the-fly during the flow sampling process, and is well-suited for receding horizon policy execution. Despite streaming, our method retains the ability to model multi-modal behavior. We train flows that stabilize around demonstration trajectories to reduce distribution shift and improve imitation learning performance. Streaming flow policy outperforms prior methods while enabling faster policy execution and tighter sensorimotor loops for learning-based robot control. Project website: https://streaming-flow-policy.github.io/
comment: Conference on Robot Learning (CoRL) 2025
Model Agnostic Defense against Adversarial Patch Attacks on Object Detection in Unmanned Aerial Vehicles IROS 2024
Object detection forms a key component in Unmanned Aerial Vehicles (UAVs) for completing high-level tasks that depend on the awareness of objects on the ground from an aerial perspective. In that scenario, adversarial patch attacks on an onboard object detector can severely impair the performance of upstream tasks. This paper proposes a novel model-agnostic defense mechanism against the threat of adversarial patch attacks in the context of UAV-based object detection. We formulate adversarial patch defense as an occlusion removal task. The proposed defense method can neutralize adversarial patches located on objects of interest, without exposure to adversarial patches during training. Our lightweight single-stage defense approach allows us to maintain a model-agnostic nature, that once deployed does not require to be updated in response to changes in the object detection pipeline. The evaluations in digital and physical domains show the feasibility of our method for deployment in UAV object detection pipelines, by significantly decreasing the Attack Success Ratio without incurring significant processing costs. As a result, the proposed defense solution can improve the reliability of object detection for UAVs.
comment: published in IROS 2024
OpenGVL - Benchmarking Visual Temporal Progress for Data Curation
Data scarcity remains one of the most limiting factors in driving progress in robotics. However, the amount of available robotics data in the wild is growing exponentially, creating new opportunities for large-scale data utilization. Reliable temporal task completion prediction could help automatically annotate and curate this data at scale. The Generative Value Learning (GVL) approach was recently proposed, leveraging the knowledge embedded in vision-language models (VLMs) to predict task progress from visual observations. Building upon GVL, we propose OpenGVL, a comprehensive benchmark for estimating task progress across diverse challenging manipulation tasks involving both robotic and human embodiments. We evaluate the capabilities of publicly available open-source foundation models, showing that open-source model families significantly underperform closed-source counterparts, achieving only approximately $70\%$ of their performance on temporal progress prediction tasks. Furthermore, we demonstrate how OpenGVL can serve as a practical tool for automated data curation and filtering, enabling efficient quality assessment of large-scale robotics datasets. We release the benchmark along with the complete codebase at \href{github.com/budzianowski/opengvl}{OpenGVL}.
Systematic Constraint Formulation and Collision-Free Trajectory Planning Using Space-Time Graphs of Convex Sets
In this paper, we create optimal, collision-free, time-dependent trajectories through cluttered dynamic environments. The many spatial and temporal constraints make finding an initial guess for a numerical solver difficult. Graphs of Convex Sets (GCS) and the recently developed Space-Time Graphs of Convex Sets (ST-GCS) enable us to generate minimum distance collision-free trajectories without providing an initial guess to the solver. We also explore the derivation of general GCS-compatible constraints and document an intuitive strategy for adapting general constraints to the framework. We show that ST-GCS produces equivalent trajectories to the standard GCS formulation when the environment is static, as well as globally optimal trajectories in cluttered dynamic environments.
comment: 16 pages with references, 20 figures
AnyPlace: Learning Generalized Object Placement for Robot Manipulation
Object placement in robotic tasks is inherently challenging due to the diversity of object geometries and placement configurations. To address this, we propose AnyPlace, a two-stage method trained entirely on synthetic data, capable of predicting a wide range of feasible placement poses for real-world tasks. Our key insight is that by leveraging a Vision-Language Model (VLM) to identify rough placement locations, we focus only on the relevant regions for local placement, which enables us to train the low-level placement-pose-prediction model to capture diverse placements efficiently. For training, we generate a fully synthetic dataset of randomly generated objects in different placement configurations (insertion, stacking, hanging) and train local placement-prediction models. We conduct extensive evaluations in simulation, demonstrating that our method outperforms baselines in terms of success rate, coverage of possible placement modes, and precision. In real-world experiments, we show how our approach directly transfers models trained purely on synthetic data to the real world, where it successfully performs placements in scenarios where other models struggle -- such as with varying object geometries, diverse placement modes, and achieving high precision for fine placement. More at: https://any-place.github.io.
comment: Accepted at CoRL 2025
GAF: Gaussian Action Field as a 4D Representation for Dynamic World Modeling in Robotic Manipulation
Accurate scene perception is critical for vision-based robotic manipulation. Existing approaches typically follow either a Vision-to-Action (V-A) paradigm, predicting actions directly from visual inputs, or a Vision-to-3D-to-Action (V-3D-A) paradigm, leveraging intermediate 3D representations. However, these methods often struggle with action inaccuracies due to the complexity and dynamic nature of manipulation scenes. In this paper, we adopt a V-4D-A framework that enables direct action reasoning from motion-aware 4D representations via a Gaussian Action Field (GAF). GAF extends 3D Gaussian Splatting (3DGS) by incorporating learnable motion attributes, allowing 4D modeling of dynamic scenes and manipulation actions. To learn time-varying scene geometry and action-aware robot motion, GAF provides three interrelated outputs: reconstruction of the current scene, prediction of future frames, and estimation of init action via Gaussian motion. Furthermore, we employ an action-vision-aligned denoising framework, conditioned on a unified representation that combines the init action and the Gaussian perception, both generated by the GAF, to further obtain more precise actions. Extensive experiments demonstrate significant improvements, with GAF achieving +11.5385 dB PSNR, +0.3864 SSIM and -0.5574 LPIPS improvements in reconstruction quality, while boosting the average +7.3% success rate in robotic manipulation tasks over state-of-the-art methods.
comment: http://chaiying1.github.io/GAF.github.io/project_page/
Multiagent Systems
Adaptive Event-Triggered Policy Gradient for Multi-Agent Reinforcement Learning
Conventional multi-agent reinforcement learning (MARL) methods rely on time-triggered execution, where agents sample and communicate actions at fixed intervals. This approach is often computationally expensive and communication-intensive. To address this limitation, we propose ET-MAPG (Event-Triggered Multi-Agent Policy Gradient reinforcement learning), a framework that jointly learns an agent's control policy and its event-triggering policy. Unlike prior work that decouples these mechanisms, ET-MAPG integrates them into a unified learning process, enabling agents to learn not only what action to take but also when to execute it. For scenarios with inter-agent communication, we introduce AET-MAPG, an attention-based variant that leverages a self-attention mechanism to learn selective communication patterns. AET-MAPG empowers agents to determine not only when to trigger an action but also with whom to communicate and what information to exchange, thereby optimizing coordination. Both methods can be integrated with any policy gradient MARL algorithm. Extensive experiments across diverse MARL benchmarks demonstrate that our approaches achieve performance comparable to state-of-the-art, time-triggered baselines while significantly reducing both computational load and communication overhead.
On Robustness of Consensus over Pseudo-Undirected Path Graphs
Consensus over networked agents is typically studied using undirected or directed communication graphs. Undirected graphs enforce symmetry in information exchange, leading to convergence to the average of initial states, while directed graphs permit asymmetry but make consensus dependent on root nodes and their influence. Both paradigms impose inherent restrictions on achievable consensus values and network robustness. This paper introduces a theoretical framework for achieving consensus over a class of network topologies, termed pseudo-undirected graphs, which retains bidirectional connectivity between node pairs but allows the corresponding edge weights to differ, including the possibility of negative values under bounded conditions. The resulting Laplacian is generally non-symmetric, yet it guarantees consensus under connectivity assumptions, to expand the solution space, which enables the system to achieve a stable consensus value that can lie outside the convex hull of the initial state set. We derive admissibility bounds for negative weights for a pseudo-undirected path graph, and show an application in the simultaneous interception of a moving target.
Choose Your Battles: Distributed Learning Over Multiple Tug of War Games
Consider N players and K games taking place simultaneously. Each of these games is modeled as a Tug-of-War (ToW) game where increasing the action of one player decreases the reward for all other players. Each player participates in only one game at any given time. At each time step, a player decides the game in which they wish to participate in and the action they take in that game. Their reward depends on the actions of all players that are in the same game. This system of K games is termed `Meta Tug-of-War' (Meta-ToW) game. These games can model scenarios such as power control, distributed task allocation, and activation in sensor networks. We propose the Meta Tug-of-Peace algorithm, a distributed algorithm where the action updates are done using a simple stochastic approximation algorithm, and the decision to switch games is made using an infrequent 1-bit communication between the players. We prove that in Meta-ToW games, our algorithm converges to an equilibrium that satisfies a target Quality of Service reward vector for the players. We then demonstrate the efficacy of our algorithm through simulations for the scenarios mentioned above.
comment: Submitted to IEEE TAC
RadAgents: Multimodal Agentic Reasoning for Chest X-ray Interpretation with Radiologist-like Workflows
Agentic systems offer a potential path to solve complex clinical tasks through collaboration among specialized agents, augmented by tool use and external knowledge bases. Nevertheless, for chest X-ray (CXR) interpretation, prevailing methods remain limited: (i) reasoning is frequently neither clinically interpretable nor aligned with guidelines, reflecting mere aggregation of tool outputs; (ii) multimodal evidence is insufficiently fused, yielding text-only rationales that are not visually grounded; and (iii) systems rarely detect or resolve cross-tool inconsistencies and provide no principled verification mechanisms. To bridge the above gaps, we present RadAgents, a multi-agent framework for CXR interpretation that couples clinical priors with task-aware multimodal reasoning. In addition, we integrate grounding and multimodal retrieval-augmentation to verify and resolve context conflicts, resulting in outputs that are more reliable, transparent, and consistent with clinical practice.
comment: In progress
Structuring Collective Action with LLM-Guided Evolution: From Ill-Structured Problems to Executable Heuristics
Collective action problems, which require aligning individual incentives with collective goals, are classic examples of Ill-Structured Problems (ISPs). For an individual agent, the causal links between local actions and global outcomes are unclear, stakeholder objectives often conflict, and no single, clear algorithm can bridge micro-level choices with macro-level welfare. We present ECHO-MIMIC, a computational framework that converts this global complexity into a tractable, Well-Structured Problem (WSP) for each agent by discovering compact, executable heuristics and persuasive rationales. The framework operates in two stages: ECHO (Evolutionary Crafting of Heuristics from Outcomes) evolves snippets of Python code that encode candidate behavioral policies, while MIMIC (Mechanism Inference & Messaging for Individual-to-Collective Alignment) evolves companion natural language messages that motivate agents to adopt those policies. Both phases employ a large-language-model-driven evolutionary search: the LLM proposes diverse and context-aware code or text variants, while population-level selection retains those that maximize collective performance in a simulated environment. We demonstrate this framework on a canonical ISP in agricultural landscape management, where local farming decisions impact global ecological connectivity. Results show that ECHO-MIMIC discovers high-performing heuristics compared to baselines and crafts tailored messages that successfully align simulated farmer behavior with landscape-level ecological goals. By coupling algorithmic rule discovery with tailored communication, ECHO-MIMIC transforms the cognitive burden of collective action into a simple set of agent-level instructions, making previously ill-structured problems solvable in practice and opening a new path toward scalable, adaptive policy design.
Learning to Lead Themselves: Agentic AI in MAS using MARL
As autonomous systems move from prototypes to real deployments, the ability of multiple agents to make decentralized, cooperative decisions becomes a core requirement. This paper examines how agentic artificial intelligence, agents that act independently, adaptively and proactively can improve task allocation and coordination in multi-agent systems, with primary emphasis on drone delivery and secondary relevance to warehouse automation. We formulate the problem in a cooperative multi-agent reinforcement learning setting and implement a lightweight multi-agent Proximal Policy Optimization, called IPPO, approach in PyTorch under a centralized-training, decentralized-execution paradigm. Experiments are conducted in PettingZoo environment, where multiple homogeneous drones or agents must self-organize to cover distinct targets without explicit communication.
comment: Exploring foundational behaviours of agentic ai using MARL 39 pages - 25 minute read, 5 tables, 24 equation, 9 figures
PromptSculptor: Multi-Agent Based Text-to-Image Prompt Optimization EMNLP 2025
The rapid advancement of generative AI has democratized access to powerful tools such as Text-to-Image models. However, to generate high-quality images, users must still craft detailed prompts specifying scene, style, and context-often through multiple rounds of refinement. We propose PromptSculptor, a novel multi-agent framework that automates this iterative prompt optimization process. Our system decomposes the task into four specialized agents that work collaboratively to transform a short, vague user prompt into a comprehensive, refined prompt. By leveraging Chain-of-Thought reasoning, our framework effectively infers hidden context and enriches scene and background details. To iteratively refine the prompt, a self-evaluation agent aligns the modified prompt with the original input, while a feedback-tuning agent incorporates user feedback for further refinement. Experimental results demonstrate that PromptSculptor significantly enhances output quality and reduces the number of iterations needed for user satisfaction. Moreover, its model-agnostic design allows seamless integration with various T2I models, paving the way for industrial applications.
comment: Accepted to EMNLP 2025 System Demonstration Track
Optimal Multi-agent Path Finding in Continuous Time
Continuous-time Conflict Based-Search (CCBS) has long been viewed as the standard optimal baseline for multi-agent path finding in continuous time (MAPFR), yet recent critiques show that the theoretically described CCBS can fail to terminate on solvable MAPFR problems while the publicly available reference implementation can return sub-optimal solutions. This work presents an analytical framework that yields simple and sufficient conditions under which any CCBS-style algorithm is both sound and solution complete. Investigating the reference CCBS implementation reveals that it violates our sufficient conditions for soundness, with counterexamples demonstrating sub-optimality. Leveraging the framework, we introduce a branching rule ($\delta$-BR) and prove it restores soundness and termination guarantees. Consequently, the resulting CCBS variant is both sound and solution complete. To our knowledge, this is the first MAPFR solver matching the guarantees of the discrete-time CBS. On a constructed example, CCBS with $\delta$-BR improves sum-of-costs from 10.707 to 9.000 ($\approx$ 16% lower) compared to the reference CCBS implementation. Across benchmarks, the reference CCBS implementation is generally able to find solutions faster than CCBS with $\delta$-BR due to its more aggressive pruning. However, this comes at the cost of occasional sub-optimality and potential non-termination when all solutions are pruned, whereas $\delta$-BR preserves optimality and guarantees termination by design. Because $\delta$-BR largely only affects the branching step, it can be adopted as a drop-in replacement in existing codebases. Beyond CCBS, the analytical framework and termination criterion provide a systematic way to evaluate other CCBS-like MAPFR solvers and future extensions, thereby offering tools for rigorous analysis of next-generation MAPFR algorithms.
comment: 35 pages
Homotopy-Aware Multi-Agent Path Planning on Plane
We propose an efficient framework using Dynnikov coordinates for homotopy-aware multi-agent path planning in planar domains that may contain obstacles. We developed a method for generating multiple homotopically distinct solutions for the multi-agent path planning problem in planar domains by combining our framework with revised prioritized planning and proved its completeness under specific assumptions. Experimentally, we demonstrated that our method is significantly faster than a method without Dynnikov coordinates. We also confirmed experimentally that homotopy-aware planning contributes to avoiding locally optimal solutions when searching for low-cost trajectories for a swarm of agents in a continuous environment.
comment: 23 pages with 5 pages of references and appendices, 25 figures
Systems and Control (CS)
Approximately Optimal Toll Design for Efficiency and Equity in Arc-Based Traffic Assignment Models
Congestion pricing policies have emerged as promising traffic management tools to alleviate traffic congestion caused by travelers' selfish routing behaviors. The core principle behind deploying tolls is to impose monetary costs on frequently overcrowded routes, to incentivize self-interested travelers to select less easily congested routes. Recent literature has focused on toll design based on arc-based traffic assignment models (TAMs), which characterize commuters as traveling through a traffic network by successively selecting an outgoing arc from every intermediate node along their journey. However, existing tolling mechanisms predicated on arc-based TAMs often target the design of a single congestion-minimizing toll, ignoring crucial fairness considerations, such as the financial impact of high congestion fees on low-income travelers. To address these shortcomings, in this paper, we pose the dual considerations of efficiency and equity in traffic routing as bilevel optimization problems. Since such problems are in general computationally intractable to solve precisely, we construct a linear program approximation by introducing a polytope approximation for the set of all tolls that induce congestion-minimizing traffic flow patterns. Finally, we provide numerical results that validate our theoretical conclusions.
Adaptive Event-Triggered Policy Gradient for Multi-Agent Reinforcement Learning
Conventional multi-agent reinforcement learning (MARL) methods rely on time-triggered execution, where agents sample and communicate actions at fixed intervals. This approach is often computationally expensive and communication-intensive. To address this limitation, we propose ET-MAPG (Event-Triggered Multi-Agent Policy Gradient reinforcement learning), a framework that jointly learns an agent's control policy and its event-triggering policy. Unlike prior work that decouples these mechanisms, ET-MAPG integrates them into a unified learning process, enabling agents to learn not only what action to take but also when to execute it. For scenarios with inter-agent communication, we introduce AET-MAPG, an attention-based variant that leverages a self-attention mechanism to learn selective communication patterns. AET-MAPG empowers agents to determine not only when to trigger an action but also with whom to communicate and what information to exchange, thereby optimizing coordination. Both methods can be integrated with any policy gradient MARL algorithm. Extensive experiments across diverse MARL benchmarks demonstrate that our approaches achieve performance comparable to state-of-the-art, time-triggered baselines while significantly reducing both computational load and communication overhead.
Adversarial Pursuits in Cislunar Space
Cislunar space is becoming a critical domain for future lunar and interplanetary missions, yet its remoteness, sparse infrastructure, and unstable dynamics create single points of failure. Adversaries in cislunar orbits can exploit these vulnerabilities to pursue and jam co-located communication relays, potentially severing communications between lunar missions and the Earth. We study a pursuit-evasion scenario between two spacecraft in a cislunar orbit, where the evader must avoid a pursuer-jammer while remaining close to its nominal trajectory. We model the evader-pursuer interaction as a zero-sum adversarial differential game cast in the circular restricted three-body problem. This formulation incorporates critical aspects of cislunar orbital dynamics, including autonomous adjustment of the reference orbit phasing to enable aggressive evading maneuvers, and shaping of the evader's cost with the orbit's stable and unstable manifolds. We solve the resulting nonlinear game locally using a continuous-time differential dynamic programming variant, which iteratively applies linear-quadratic approximations to the Hamilton-Jacobi-Isaacs equation. We simulate the evader's behavior against both a worst-case and a linear-quadratic pursuer. Our results pave the way for securing future missions in cislunar space against emerging cyber threats.
comment: 17 pages, 9 figures
On Robustness of Consensus over Pseudo-Undirected Path Graphs
Consensus over networked agents is typically studied using undirected or directed communication graphs. Undirected graphs enforce symmetry in information exchange, leading to convergence to the average of initial states, while directed graphs permit asymmetry but make consensus dependent on root nodes and their influence. Both paradigms impose inherent restrictions on achievable consensus values and network robustness. This paper introduces a theoretical framework for achieving consensus over a class of network topologies, termed pseudo-undirected graphs, which retains bidirectional connectivity between node pairs but allows the corresponding edge weights to differ, including the possibility of negative values under bounded conditions. The resulting Laplacian is generally non-symmetric, yet it guarantees consensus under connectivity assumptions, to expand the solution space, which enables the system to achieve a stable consensus value that can lie outside the convex hull of the initial state set. We derive admissibility bounds for negative weights for a pseudo-undirected path graph, and show an application in the simultaneous interception of a moving target.
Certified Learning-Enabled Noise-Aware Motion Planning for Urban Air Mobility
Urban Air Mobility (UAM) has emerged as a promising solution to alleviate urban congestion and transportation challenges. Nevertheless, the noise generated by eVTOL aircrafts poses a significant barrier to public acceptance and regulatory approval, potentially limiting the operational scope and scalability of UAM systems. Hence, the successful adoption of UAM systems hinges on the ability to predict generated noise levels, and further develop motion planning strategies that comply with community-level noise regulations while maintaining operational efficiency. To this end, this paper proposes a novel noise-aware motion planning framework for UAM systems that ensures compliance with noise regulations. We first develop a certifiable neural network model to accurately predict eVTOL noise propagation patterns in urban environments, providing provable bounds on its correctness. To achieve a desired level of accuracy, we propose an active sampling strategy to efficiently build the dataset used to train and test the noise model. Next, we develop a noise-aware motion planning algorithm that utilizes the noise model to generate eVTOL trajectories that guarantee compliance with community noise regulations. The algorithm exploits the monotonic structure of the noise model to efficiently sample the configuration space, ensuring that the generated trajectories are both noise-compliant and operationally efficient. We demonstrate the effectiveness of the proposed framework through a number of experiments for Vahana eVTOLs. The results show that the framework can generate noise-compliant flight plans for a fleet of eVTOLs that adhere to community noise regulations while optimizing operational efficiency.
comment: 16 pages, 10 figures
Choose Your Battles: Distributed Learning Over Multiple Tug of War Games
Consider N players and K games taking place simultaneously. Each of these games is modeled as a Tug-of-War (ToW) game where increasing the action of one player decreases the reward for all other players. Each player participates in only one game at any given time. At each time step, a player decides the game in which they wish to participate in and the action they take in that game. Their reward depends on the actions of all players that are in the same game. This system of K games is termed `Meta Tug-of-War' (Meta-ToW) game. These games can model scenarios such as power control, distributed task allocation, and activation in sensor networks. We propose the Meta Tug-of-Peace algorithm, a distributed algorithm where the action updates are done using a simple stochastic approximation algorithm, and the decision to switch games is made using an infrequent 1-bit communication between the players. We prove that in Meta-ToW games, our algorithm converges to an equilibrium that satisfies a target Quality of Service reward vector for the players. We then demonstrate the efficacy of our algorithm through simulations for the scenarios mentioned above.
comment: Submitted to IEEE TAC
Koopman-Operator-Based Model Predictive Control for Drag-free Satellite
This paper presents a data-driven modelling method for nonlinear dynamics of drag-free satellite based on Koopman operator theory, and a model predictive controller is designed based on the identified model. The nonlinear dynamics of drag-free satellite are identified and controlled based on Sparse Identification of Nonlinear Dynamics (SINDy). Using the manually constructed nonlinear function dictionary as observables, the system approximation is obtained by SINDy algorithm, and a linear Model Predictive Control (MPC) controller is designed for test mass capture based on the SINDy model. Finally, the effectiveness of MPC control is verified by numerical examples.
Orbital Stabilization and Time Synchronization of Unstable Periodic Motions in Underactuated Robots
This paper presents a control methodology for achieving orbital stabilization with simultaneous time synchronization of periodic trajectories in underactuated robotic systems. The proposed approach extends the classical transverse linearization framework to explicitly incorporate time-desynchronization dynamics. To stabilize the resulting extended transverse dynamics, we employ a combination of time-varying LQR and sliding-mode control. The theoretical results are validated experimentally through the implementation of both centralized and decentralized control strategies on a group of six Butterfly robots.
Distributed Koopman Operator Learning from Sequential Observations
This paper presents a distributed Koopman operator learning framework for modeling unknown nonlinear dynamics using sequential observations from multiple agents. Each agent estimates a local Koopman approximation based on lifted data and collaborates over a communication graph to reach exponential consensus on a consistent distributed approximation. The approach supports distributed computation under asynchronous and resource-constrained sensing. Its performance is demonstrated through simulation results, validating convergence and predictive accuracy under sensing-constrained scenarios and limited communication.
Voltage-sensitive distribution factors for contingency analysis and topology optimization
Topology optimization is a promising approach for mitigating congestion and managing changing grid conditions, but it is computationally challenging and requires approximations. Conventional distribution factors like PTDFs and LODFs, based on DC power flow, fail to capture voltage variations, reactive power, and losses - limiting their use in detailed optimization tasks such as busbar splitting. This paper introduces generalized distribution factors derived from a voltage-sensitive linearization of the full AC power flow equations. The proposed formulation accurately reflects reactive power flows, Ohmic losses, and voltage deviations while remaining computationally efficient. We derive and evaluate generalized PTDFs, LODFs, and topology modification factors using matrix identities. We discuss potential applications including voltage-aware N-1 security analysis, and topology optimization with a focus on busbar splitting. Numerical experiments demonstrate close agreement with full AC solutions, significantly outperforming the traditional DC approximation.
comment: 7 pages, 4 figures
Control and Navigation of a 2-D Electric Rocket
This work addresses the control and navigation of a simulated two-dimensional electric rocket. The model provides a simplified framework that neglects actuator dynamics and aerodynamic effects while capturing the complexities of underactuation and state coupling. Trajectory tracking is achieved through a modularized and layered control architecture, with employement of a Linear Quadratic Regulator (LQR) and Lyapunov theory. Full-state estimation is achieved through Kalman filtering techniques, part of the navigation module. The solutions are thoroughly evaluated in a custom-built MATLAB/Simulink testbed, simulating real-world conditions while maintaining a simplified setup. The results reveal limitations along the lateral axis, whose resolution is suggested for future work.
Modeling and Control of Deep Sign-Definite Dynamics with Application to Hybrid Powertrain Control
Deep learning is increasingly used for complex, large-scale systems where first-principles modeling is difficult. However, standard deep learning models often fail to enforce physical structure or preserve convexity in downstream control, leading to physically inconsistent predictions and discontinuous inputs owing to nonconvexity. We introduce sign constraints--sign restrictions on Jacobian entries--that unify monotonicity, positivity, and sign-definiteness; additionally, we develop model-construction methods that enforce them, together with a control-synthesis procedure. In particular, we design exactly linearizable deep models satisfying these constraints and formulate model predictive control as a convex quadratic program, which yields a unique optimizer and a Lipschitz continuous control law. On a two-tank system and a hybrid powertrain, the proposed approach improves prediction accuracy and produces smoother control inputs than existing methods.
comment: Submitted to Automatica
Scalable and Approximation-free Symbolic Control for Unknown Euler-Lagrange Systems
We propose a novel symbolic control framework for enforcing temporal logic specifications in Euler-Lagrange systems that addresses the key limitations of traditional abstraction-based approaches. Unlike existing methods that require exact system models and provide guarantees only at discrete sampling instants, our approach relies only on bounds on system parameters and input constraints, and ensures correctness for the full continuous-time trajectory. The framework combines scalable abstraction of a simplified virtual system with a closed-form, model-free controller that guarantees trajectories satisfy the original specification while respecting input bounds and remaining robust to unknown but bounded disturbances. We provide feasibility conditions for the construction of confinement regions and analyze the trade-off between efficiency and conservatism. Case studies on pendulum dynamics, a two-link manipulator, and multi-agent systems, including hardware experiments, demonstrate that the proposed approach ensures both correctness and safety while significantly reducing computational time and memory requirements. These results highlight its scalability and practicality for real-world robotic systems where precise models are unavailable and continuous-time guarantees are essential.
CollaPipe: Adaptive Segment-Optimized Pipeline Parallelism for Collaborative LLM Training in Heterogeneous Edge Networks
The increasing demand for intelligent mobile applications has made multi-agent collaboration with Transformer-based large language models (LLMs) essential in mobile edge computing (MEC) networks. However, training LLMs in such environments remains challenging due to heavy computation, high end-to-end latency, and limited model generalization. We introduce CollaPipe, a hybrid distributed learning framework that integrates collaborative pipeline parallelism with federated aggregation to support self-evolving intelligent networks. In CollaPipe, the encoder part is adaptively partitioned into variable-sized segments and deployed across mobile devices for pipeline-parallel training, while the decoder is deployed on edge servers to handle generative tasks. Then we perform global model update via federated aggregation. To enhance training efficiency, we formulate a joint optimization problem that adaptively allocates model segments, micro-batches, bandwidth, and transmission power. We derive and use a closed-form convergence bound to design an Dynamic Segment Scheduling and Resource Allocation (DSSDA) algorithm based on Lyapunov optimization, ensuring system stability under long-term constraints. Extensive experiments on downstream tasks with Transformer and BERT models show that CollaPipe improves computation efficiency by up to 15.09%, reduces end-to-end latency by at least 48.98%, and cuts single device memory usage by more than half, enabling online learning in heterogeneous and dynamic communication environments.
comment: Submitted to IEEE for review
An early termination strategy for the distributed biased min-consensus protocol under disturbances SC 2025
The distributed biased min-consensus (DBMC) protocol is an iterative scheme that solves the shortest path problem asymptotically, requiring only local information exchange between neighboring nodes. By appropriately designing the gain function, prior work [1] proposed a DBMC-based system that ensures convergence within a pre-specified time interval. However, this guarantee assumes the absence of disturbances. In this paper, we study the DBMC-based system under disturbances affecting the edge weights. We first establish rigorous error bounds on the resulting state estimates. Building on this analysis, we then propose a practical early termination strategy to prevent potential singularities, specifically, unbounded gain, that may arise in the presence of disturbances, while still ensuring that the shortest paths are correctly identified.Simulations are performed to validate and illustrate the theoretical results.
comment: paper accepted to IEEE ICNSC 2025
Zonotope-Based Elastic Tube Model Predictive Control
Tube-based Model Predictive Control (MPC) is a widely adopted robust control framework for constrained linear systems under additive disturbance. The paper is focused on reducing the numerical complexity associated with the tube parameterization, described as a sequence of elastically-scaled zonotopic sets. A new class of scaled-zonotope inclusion conditions is proposed, alleviating the need for a priori specification of certain set-containment constraints and achieving significant reductions in complexity. A comprehensive complexity analysis is provided for both the polyhedral and the zonotopic setting, illustrating the trade-off between an enlarged domain of attraction and the required computational effort. The proposed approach is validated through extensive numerical experiments.
Dispersion Formation Control: from Geometry to Distribution
We introduce and develop the concept of dispersion formation control, bridging a gap between shape-assembly studies in physics and biology and formation control theory. In current formation control studies, the control objectives typically focus on achieving desired local geometric properties, such as inter-agent distances, bearings, or relative positions. In contrast, our dispersion formation control approach enables agents to directly regulate the dispersion of their spatial distribution, a global variable associated with a covariance matrix. Specifically, we introduce the notion of covariance similarity to define the target spatial dispersion of agents. Building on this framework, we propose two control strategies: a centralized approach to illustrate the key ideas, and a distributed approach that enables agents to control the global dispersion but using only local information. Our stability analysis demonstrates that both strategies ensure exponential convergence of the agents' distribution to the desired dispersion. Notably, controlling a global variable rather than multiple local ones enhances the resiliency of the system, particularly against malfunctioning agents. Simulations validate the effectiveness of the proposed dispersion formation control.
comment: 8 pages, 4 figures
Formal Safety Verification and Refinement for Generative Motion Planners via Certified Local Stabilization
We present a method for formal safety verification of learning-based generative motion planners. Generative motion planners (GMPs) offer advantages over traditional planners, but verifying the safety and dynamic feasibility of their outputs is difficult since neural network verification (NNV) tools scale only to a few hundred neurons, while GMPs often contain millions. To preserve GMP expressiveness while enabling verification, our key insight is to imitate the GMP by stabilizing references sampled from the GMP with a small neural tracking controller and then applying NNV to the closed-loop dynamics. This yields reachable sets that rigorously certify closed-loop safety, while the controller enforces dynamic feasibility. Building on this, we construct a library of verified GMP references and deploy them online in a way that imitates the original GMP distribution whenever it is safe to do so, improving safety without retraining. We evaluate across diverse planners, including diffusion, flow matching, and vision-language models, improving safety in simulation (on ground robots and quadcopters) and on hardware (differential-drive robot).
comment: 10 pages, 12 figures
SoK: A Systematic Review of Malware Ontologies and Taxonomies and Implications for the Quantum Era
The threat of quantum malware is real and a growing security concern that will have catastrophic scientific and technological impacts, if not addressed early. If weaponised or exploited especially by the wrong hands, malware will undermine highly sophisticated critical systems supported by next-generation quantum architectures, for example, in defence, communications, energy, and space. This paper explores the fundamental nature and implications of quantum malware to enable the future development of appropriate mitigations and defences, thereby protecting critical infrastructure. By conducting a systematic literature review (SLR) that draws on knowledge frameworks such as ontologies and taxonomies to explore malware, this provides insights into how malicious behaviours can be translated into attacks on quantum technologies, thereby providing a lens to analyse the severity of malware against quantum technologies. This study employs the European Competency Framework for Quantum Technologies (CFQT) as a guide to map malware behaviour to several competency layers, creating a foundation in this emerging field.
comment: 40 pages, 9 figures, 5 tables
Training Task Reasoning LLM Agents for Multi-turn Task Planning via Single-turn Reinforcement Learning
Large Language Models (LLMs) have demonstrated remarkable capabilities in knowledge acquisition, reasoning, and tool use, making them promising candidates for autonomous agent applications. However, training LLM agents for complex multi-turn task planning faces significant challenges, including sparse episode-wise rewards, credit assignment across long horizons, and the computational overhead of reinforcement learning in multi-turn interaction settings. To this end, this paper introduces a novel approach that transforms multi-turn task planning into single-turn task reasoning problems, enabling efficient policy optimization through Group Relative Policy Optimization (GRPO) with dense and verifiable reward from expert trajectories. Our theoretical analysis shows that GRPO improvement on single-turn task reasoning results in higher multi-turn success probability under the minimal turns, as well as the generalization to subtasks with shorter horizons. Experimental evaluation on the complex task planning benchmark demonstrates that our 1.5B parameter model trained with single-turn GRPO achieves superior performance compared to larger baseline models up to 14B parameters, with success rates of 70% for long-horizon planning tasks with over 30 steps. We also theoretically and empirically validate the strong cross-task generalizability that the models trained on complex tasks can lead to the successful completion of all simpler subtasks.
Data-Driven State Observers for Measure-Preserving Systems
The increasing use of data-driven control strategies gives rise to the problem of learning-based state observation. Motivated by this need, the present work proposes a data-driven approach for the synthesis of state observers for discrete-time nonlinear systems with measure-preserving dynamics. To this end, Kazantzis-Kravaris/Luenburger (KKL) observers are shown to be well-defined, where the observer design boils down to determining a nonlinear injective mapping of states and its pseudo-inverse. For its learning-based construction, the KKL observer is related to the Koopman and Perron-Frobenius operators, defined on a Sobolev-type reproducing kernel Hilbert space (RKHS) on which they are shown to be normal operators and thus have a spectral resolution. Hence, observer synthesis algorithms, based on kernel interpolation/regression routines for the desired injective mapping in the observer and its pseudo-inverse, have been proposed in various settings of available dataset -- (i) many orbits, (ii) single long orbit, and (iii) snapshots. Theoretical error analyses are provided, and numerical studies on a chaotic Lorenz system are demonstrated.
comment: 40 pages, 11 figures, submitted to Journal of Nonlinear Science
PIRF: Physics-Informed Reward Fine-Tuning for Diffusion Models NeurIPS 2025
Diffusion models have demonstrated strong generative capabilities across scientific domains, but often produce outputs that violate physical laws. We propose a new perspective by framing physics-informed generation as a sparse reward optimization problem, where adherence to physical constraints is treated as a reward signal. This formulation unifies prior approaches under a reward-based paradigm and reveals a shared bottleneck: reliance on diffusion posterior sampling (DPS)-style value function approximations, which introduce non-negligible errors and lead to training instability and inference inefficiency. To overcome this, we introduce Physics-Informed Reward Fine-tuning (PIRF), a method that bypasses value approximation by computing trajectory-level rewards and backpropagating their gradients directly. However, a naive implementation suffers from low sample efficiency and compromised data fidelity. PIRF mitigates these issues through two key strategies: (1) a layer-wise truncated backpropagation method that leverages the spatiotemporally localized nature of physics-based rewards, and (2) a weight-based regularization scheme that improves efficiency over traditional distillation-based methods. Across five PDE benchmarks, PIRF consistently achieves superior physical enforcement under efficient sampling regimes, highlighting the potential of reward fine-tuning for advancing scientific generative modeling.
comment: 18 pages, 6 figures; NeurIPS 2025 AI for science workshop
Adaptive Altitude Control of a Tethered Multirotor Autogyro under Varying Wind Speeds using Differential Rotor Braking
A tethered multirotor autogyro can function as an unmanned aerial vehicle for energy-efficient and prolonged deployment, as it uses the available wind energy to sustain flight. This article presents an adaptive altitude control strategy for such a device. At a constant wind speed, the equilibrium altitude can be approximated by a quadratic function of the pitch angle. The proposed adaptive control estimates the coefficients of this quadratic function. The estimates are used for altitude control and to attain the maximum altitude (and minimum horizontal drift) for a given wind speed. A feedback controller based on regenerative differential rotor braking is used as the actuation to modulate the autogyro's pitch angle. Implementation of the controller using a control-oriented, higher-order dynamic model demonstrates the controller's capability to regulate the altitude and maintain stable flights under varying wind speeds. Based on the system's maximum altitude tracking performance, the adaptive control is adjusted to improve performance under substantial changes in wind speeds.
A Crime/S.I.R. optimal control problem
This paper presents and discusses a mathematical model inspired by control theory to derive optimal public policies for minimizing costs associated with the reduction and control of criminal activity in a population. Specifically, we analyze the optimal control problem \begin{equation*} \min G(u_1, u_2, u_3) = \int_{0}^{t_{\text{F}}} \left( I(t) - R(t) + \frac{B_1}{2} u_1^2(t) + \frac{B_2}{2} u_2^2(t) + \frac{B_3}{2} u_3^2(t) \right) \, dt. \end{equation*} where $I=I(t)$ and $R=R(t)$ satisfies the system of equations \begin{equation*} \left\{ \begin{aligned} \dot{S} &= \Lambda - (1-u_1)SI - \mu S + ((1+u_3)\gamma_2)I + \rho \Omega R,\\ \dot{I} &= (1-u_1)SI - (\mu + \delta_1)I - ((1+u_2)\gamma_1)I - ((1+u_3)\gamma_2)I + (1-\Omega)\rho R,\\ \dot{R} &= ((1+u_2)\gamma_1)I - (\mu + \delta_2 + \rho)R. \end{aligned} \right. \end{equation*} Our approach assumes that the social and economic effects of criminal behavior can be modeled by a dynamic SIR-type system, which serves as a constraint on a cost functional associated with the strategies implemented by government and law enforcement authorities to reduce criminal behavior. Using optimal control theory, the proposed controls, i.e., preventive policies (such as community and social cohesion programs), are expected to have a significant and positive impact on crime reduction, generating opportunities for the most disadvantaged sectors of Cali society and contributing to long-term security. Given that resources to address this problem are limited, this research aims to determine an optimal combination of public interventions and policies that minimize criminality at the lowest possible economic cost, using an SIR model, tools from variational calculus, and optimal control theory.
Probability-One Optimization of Generalized Rayleigh Quotient Sum For Multi-Source Generalized Total Least-Squares
This paper addresses the global optimization of the sum of the Rayleigh quotient and the generalized Rayleigh quotient on the unit sphere. While various methods have been proposed for this problem, they fail to reliably converge to the global maximizer. To overcome this limitation, we propose an extension of the Riemannian Trust Region algorithm based on the probability-one homotopy optimization method, which enhances convergence to a global maximizer and, under certain conditions, ensures convergence to the global maximizer. In addition to the proposed method, existing state-of-the-art approaches are also presented, along with an explanation of their limitations and their connection to the proposed method. The proposed method is evaluated alongside the state-of-the-art approaches through numerical experiments, assessing convergence speed, success in reaching the global maximizer, and scalability with increasing problem dimension. Furthermore, we demonstrate how this ties in with the multi-source Bayesian Generalized Total Least-Squares (B-GTLS) problem, illustrating its applicability.
comment: This is the preprint version prior to peer review
Data-fused Model Predictive Control with Guarantees: Application to Flying Humanoid Robots
This paper introduces a Data-Fused Model Predictive Control (DFMPC) framework that combines physics-based models with data-driven representations of unknown dynamics. Leveraging Willems' Fundamental Lemma and an artificial equilibrium formulation, the method enables tracking of changing, potentially unreachable setpoints while explicitly handling measurement noise through slack variables and regularization. We provide guarantees of recursive feasibility and practical stability under input-output constraints for a specific class of reference signals. The approach is validated on the iRonCub flying humanoid robot, integrating analytical momentum models with data-driven turbine dynamics. Simulations show improved tracking and robustness compared to a purely model-based MPC, while maintaining real-time feasibility.
comment: 8 pages, 3 figures
MAPPO for Edge Server Monitoring
In this paper, we consider a goal-oriented communication problem for edge server monitoring, where jobs arrive intermittently at multiple dispatchers and must be assigned to shared edge servers with finite queues and time-varying availability. Accurate knowledge of server status is critical for sustaining high throughput, yet remains challenging under dynamic workloads and partial observability. To address this challenge, each dispatcher maintains server knowledge through two complementary mechanisms: (i) active status queries that provide instantaneous updates at a communication cost, and (ii) job execution feedback that reveals server conditions opportunistically. We formulate a cooperative multi-agent distributed decision-making problem in which dispatchers jointly optimize query scheduling to balance throughput against communication overhead. To solve this problem, we propose a Multi-Agent Proximal Policy Optimization (MAPPO)-based algorithm that leverages centralized training with decentralized execution (CTDE) to learn distributed query-and-dispatch policies under partial and stale observations. Numerical evaluations show that MAPPO achieves superior throughput-cost tradeoffs and significantly outperforms baseline strategies, achieving on average a 30% improvement over the closest baseline.
comment: 6 pages, 4 figures. Accepted to IEEE MILCOM 2025
Unlocking Transmission Flexibility under Uncertainty: Getting Dynamic Line Ratings into Electricity Markets
Static transmission line ratings may lead to underutilization of line capacity due to overly conservative assumptions. Grid-enhancing technologies (GETs) such as dynamic line ratings (DLRs), which adjust line capacity based on real-time conditions, are a techno-economically viable alternative to increase the utilization of existing power lines. Nonetheless, their adoption has been slow, partly due to the absence of operational tools that effectively account for simultaneous impacts on dispatch and pricing. In this paper, we represent transmission capacity with DLRs as a stock-like resource with time-variant interdependency, which is modeled via an approximation of line temperature evolution process, decoupling the impacts of ambient weather conditions and power flow on transmission line temperature and thus capacity. We integrate DLRs into a multi-period DC optimal power flow problem, with chance constrains addressing correlated uncertainty in DLRs and renewable generation. This yields non-convex problems that we transform into a tractable convex form by linearization. We derive locational marginal energy and ancillary services prices consistent with a competitive equilibrium. Numerical experiments on the 11-zone and 1814-node NYISO systems demonstrate its performance, including impacts on dispatch, pricing, and marginal carbon emissions.
Minimum Time Consensus of Multi-agent System under Fuel Constraints
This work addresses the problem of finding minimum time consensus point in the state space for a set of $N$ identical double integrator agents with bounded inputs and fixed fuel budget constraint. To address the problem, characterization of the attainable set for each agent subject to bounded inputs and fixed fuel budget constraints is done. Such attainable set is shown to be a convex set. The minimum time to consensus is the least time when the attainable sets of all agents intersect and the corresponding consensus state is the point of intersection. Using Helly's theorem, it is shown that the intersection is not empty at the time when all triplets of agents exhibit a non-empty intersection. Thus, a closed-form expression for the minimum time to consensus for a triplet of agents is obtained. The calculation of minimum time consensus for each of the triplets is performed independently and is distributed evenly among $N$ agents. The overall minimum time to consensus of $N$ agents is then given by the triplet that has the greatest minimum time to consensus. In the process, the set of initial conditions for agents from which consensus is possible under these input and fuel budget constraints is also characterized.
Coordination Control of Discrete Event Systems under Cyber Attacks
In this paper, coordination control of discrete event systems under joint sensor and actuator attacks is investigated. Sensor attacks are described by a set of attack languages using a proposed ALTER model. Several local supervisors are used to control the system. The goal is to design local supervisors to ensure safety of the system even under cyber attacks (CA). The necessary and sufficient conditions for the existence of such supervisors are derived in terms of conditional decomposability, CA-controllability and CA-observability. A method is developed to calculate local state estimates under sensor attacks. Two methods are also developed to design local supervisors, one for discrete event systems satisfying conditional decomposability, CA-controllability and CA-observability, and one for discrete event systems satisfying conditional decomposability only. The approach works for both stealthy and non-stealthy attacks. A practical example is given to illustrate the results.
comment: 23 pages, attack model was redefined, description was improved
Exploring Noncommutative Polynomial Equation Methods for Discrete-Time Quaternionic Control
We present new polynomial-based methods for discrete-time quaternionic systems, highlighting how noncommutative multiplication modifies classical control approaches. Defining quaternionic polynomials via a backward-shift operator, we examine left and right fraction representations of transfer functions, showing that right zeros correspond to similarity classes of quaternionic matrix right eigenvalues. We then propose a feedback design procedure that generalizes pole placement to quaternions - a first approach using a genuine quaternionic polynomial equation.
comment: Published in: 2025 33rd Mediterranean Conference on Control and Automation (MED), Tangier, Morocco, 2025. DOI: 10.1109/MED64031.2025.11073531. \c{opyright} 2025 IEEE. Personal use permitted; all other uses by permission of IEEE. Published version: https://doi.org/10.1109/MED64031.2025.11073531
TuneS: Patient-specific model-based optimization of contact configuration in deep brain stimulation
Objective: The objective of this study is to develop and evaluate a systematic approach to optimize Deep Brain Stimulation (DBS) parameters, addressing the challenge of identifying patient-specific settings and optimal stimulation targets for various neurological and mental disorders. Methods: TuneS, a novel pipeline to predict clinically optimal DBS contact configurations based on predefined targets and constraints, is introduced. The method relies upon patient-specific models of stimulation spread and extends optimization beyond traditional neural structures to include automated, model-based targeting of streamlines. Results: Initial findings show that both the STN motor subdivision and STN motor streamlines are consistently engaged under clinical settings, while regions of avoidance receive minimal stimulation. Given these findings, the value of model-based contact predictions for assessing stimulation targets while observing anatomical constraints is demonstrated at the example of a small cohort of Parkinson's disease patients. The predicted settings were generally found to achieve higher target coverages while providing a better trade-off between maximizing target coverage and minimizing stimulation of regions associated with side effects. Conclusion: TuneS shows promise as a research tool, enabling systematic assessment of DBS target effectiveness and facilitating constraint-aware optimization of stimulation parameters. Significance: The presented pipeline offers a pathway to improve patient-specific DBS therapies and contributes to the broader understanding of effective DBS targeting strategies.
comment: 8 pages, 9 figures, under review for IEEE Transactions on Biomedical Engineering
Scalable Two-Stage Stochastic Optimal Power Flow via Separable Approximation
This paper proposes a Separable Projective Approximation Routine-Optimal Power Flow (SPAR-OPF) framework for solving two-stage stochastic optimization problems in power systems. The framework utilizes a separable piecewise linear approximation of the value function and learns the function based on sample sub-gradient information. We present two formulations to model the learned value function, and compare their effectiveness. Additionally, an efficient statistical method is introduced to assess the quality of the obtained solutions. The effectiveness of the proposed framework is validated using distributed generation siting and sizing problem in three-phase unbalanced power distribution systems as an example. Results show that the framework approximates the value function with over 98% accuracy and provides high-quality solutions with an optimality gap of less than 1%. The framework scales efficiently with system size, generating high-quality solutions in a short time when applied to a 9500-node distribution system with 1200 scenarios, while the extensive formulations and progressive hedging failed to solve the problem.
Explicit Steady-State Approximations for Parallel Server Systems with Heterogeneous Servers
We study the steady-state performance of parallel-server systems under an immediate routing architecture with two sources of heterogeneity: servers and job classes, subject to compatibility constraints. We focus on the weighted-workload-task-allocation (WWTA) policy, a load-balancing scheme known to be throughput-optimal for such systems. Under a relaxed complete-resource-pooling (CRP) condition, we prove a "strong form" of state-space collapse in heavy traffic and that the scaled workload of each server converges in distribution to an exponential random variable, whose parameter is explicitly given by system primitives. Our analysis yields three main insights. First, the conventional heavy-traffic requirement of a unique static allocation plan can be dropped; a relaxed CRP condition suffices. Second, the limiting workload distribution is shown to be independent of local scheduling policy on server side, allowing substantial flexibility. Third, the inefficient (non-basic) activities prescribed by static allocation plan is proved to receive an asymptotically negligible fraction of routing and service, even though WWTA has no prior knowledge of which activities are basic, highlighting its robustness to changing arrival rates.
Universal Dynamics with Globally Controlled Analog Quantum Simulators
Analog quantum simulators with global control fields have emerged as powerful platforms for exploring complex quantum phenomena. Recent breakthroughs, such as the coherent control of thousands of atoms, highlight the growing potential for quantum applications at scale. Despite these advances, a fundamental theoretical question remains unresolved: to what extent can such systems realize universal quantum dynamics under global control? Here we establish a necessary and sufficient condition for universal quantum computation using only global pulse control, proving that a broad class of analog quantum simulators is, in fact, universal. We further extend this framework to fermionic and bosonic systems, including modern platforms such as ultracold atoms in optical superlattices. Crucially, to connect the theoretical possibility with experimental reality, we introduce a new control technique into the experiment - direct quantum optimal control. This method enables the synthesis of complex effective Hamiltonians and allows us to incorporate realistic hardware constraints. To show its practical power, we experimentally engineer three-body interactions outside the blockade regime and demonstrate topological dynamics on a Rydberg atom array. Using the new control framework, we overcome key experimental challenges, including hardware limitations and atom position fluctuations in the non-blockade regime, by identifying smooth, short-duration pulses that achieve high-fidelity dynamics. Experimental measurements reveal dynamical signatures of symmetry-protected-topological edge modes, confirming both the expressivity and feasibility of our approach. Our work opens a new avenue for quantum simulation beyond native hardware Hamiltonians, enabling the engineering of effective multi-body interactions and advancing the frontier of quantum information processing with globally-controlled analog platforms.
comment: 10 pages, 5 figures with Methods. HYH, AMG, and LC contributed equally to this work. Updated acknowledgement
Systematic Constraint Formulation and Collision-Free Trajectory Planning Using Space-Time Graphs of Convex Sets
In this paper, we create optimal, collision-free, time-dependent trajectories through cluttered dynamic environments. The many spatial and temporal constraints make finding an initial guess for a numerical solver difficult. Graphs of Convex Sets (GCS) and the recently developed Space-Time Graphs of Convex Sets (ST-GCS) enable us to generate minimum distance collision-free trajectories without providing an initial guess to the solver. We also explore the derivation of general GCS-compatible constraints and document an intuitive strategy for adapting general constraints to the framework. We show that ST-GCS produces equivalent trajectories to the standard GCS formulation when the environment is static, as well as globally optimal trajectories in cluttered dynamic environments.
comment: 16 pages with references, 20 figures
MESS+: Dynamically Learned Inference-Time LLM Routing in Model Zoos with Service Level Guarantees NeurIPS 2025
Open-weight large language model (LLM) zoos provide access to numerous high-quality models, but selecting the appropriate model for specific tasks remains challenging and requires technical expertise. Most users simply want factually correct, safe, and satisfying responses without concerning themselves with model technicalities, while inference service providers prioritize minimizing operating costs. These competing interests are typically mediated through service level agreements (SLAs) that guarantee minimum service quality. We introduce MESS+, a stochastic optimization algorithm for cost-optimal LLM request routing while providing rigorous SLA compliance guarantees. MESS+ learns request satisfaction probabilities of LLMs in real-time as users interact with the system, based on which model selection decisions are made by solving a per-request optimization problem. Our algorithm includes a novel combination of virtual queues and request satisfaction prediction, along with a theoretical analysis of cost optimality and constraint satisfaction. Across a wide range of state-of-the-art LLM benchmarks, MESS+ achieves an average of $2\times$ cost savings compared to existing LLM routing techniques.
comment: NeurIPS 2025. Code: https://github.com/laminair/mess-plus
Adaptive Policy Learning to Additional Tasks
This paper develops a policy learning method for tuning a pre-trained policy to adapt to additional tasks without altering the original task. A method named Adaptive Policy Gradient (APG) is proposed in this paper, which combines Bellman's principle of optimality with the policy gradient approach to improve the convergence rate. This paper provides theoretical analysis which guarantees the convergence rate and sample complexity of $\mathcal{O}(1/T)$ and $\mathcal{O}(1/\epsilon)$, respectively, where $T$ denotes the number of iterations and $\epsilon$ denotes the accuracy of the resulting stationary policy. Furthermore, several challenging numerical simulations, including cartpole, lunar lander, and robot arm, are provided to show that APG obtains similar performance compared to existing deterministic policy gradient methods while utilizing much less data and converging at a faster rate.
comment: The uploaded paper has technique issues, and we decide to withdraw it
Systems and Control (EESS)
Approximately Optimal Toll Design for Efficiency and Equity in Arc-Based Traffic Assignment Models
Congestion pricing policies have emerged as promising traffic management tools to alleviate traffic congestion caused by travelers' selfish routing behaviors. The core principle behind deploying tolls is to impose monetary costs on frequently overcrowded routes, to incentivize self-interested travelers to select less easily congested routes. Recent literature has focused on toll design based on arc-based traffic assignment models (TAMs), which characterize commuters as traveling through a traffic network by successively selecting an outgoing arc from every intermediate node along their journey. However, existing tolling mechanisms predicated on arc-based TAMs often target the design of a single congestion-minimizing toll, ignoring crucial fairness considerations, such as the financial impact of high congestion fees on low-income travelers. To address these shortcomings, in this paper, we pose the dual considerations of efficiency and equity in traffic routing as bilevel optimization problems. Since such problems are in general computationally intractable to solve precisely, we construct a linear program approximation by introducing a polytope approximation for the set of all tolls that induce congestion-minimizing traffic flow patterns. Finally, we provide numerical results that validate our theoretical conclusions.
Adaptive Event-Triggered Policy Gradient for Multi-Agent Reinforcement Learning
Conventional multi-agent reinforcement learning (MARL) methods rely on time-triggered execution, where agents sample and communicate actions at fixed intervals. This approach is often computationally expensive and communication-intensive. To address this limitation, we propose ET-MAPG (Event-Triggered Multi-Agent Policy Gradient reinforcement learning), a framework that jointly learns an agent's control policy and its event-triggering policy. Unlike prior work that decouples these mechanisms, ET-MAPG integrates them into a unified learning process, enabling agents to learn not only what action to take but also when to execute it. For scenarios with inter-agent communication, we introduce AET-MAPG, an attention-based variant that leverages a self-attention mechanism to learn selective communication patterns. AET-MAPG empowers agents to determine not only when to trigger an action but also with whom to communicate and what information to exchange, thereby optimizing coordination. Both methods can be integrated with any policy gradient MARL algorithm. Extensive experiments across diverse MARL benchmarks demonstrate that our approaches achieve performance comparable to state-of-the-art, time-triggered baselines while significantly reducing both computational load and communication overhead.
Adversarial Pursuits in Cislunar Space
Cislunar space is becoming a critical domain for future lunar and interplanetary missions, yet its remoteness, sparse infrastructure, and unstable dynamics create single points of failure. Adversaries in cislunar orbits can exploit these vulnerabilities to pursue and jam co-located communication relays, potentially severing communications between lunar missions and the Earth. We study a pursuit-evasion scenario between two spacecraft in a cislunar orbit, where the evader must avoid a pursuer-jammer while remaining close to its nominal trajectory. We model the evader-pursuer interaction as a zero-sum adversarial differential game cast in the circular restricted three-body problem. This formulation incorporates critical aspects of cislunar orbital dynamics, including autonomous adjustment of the reference orbit phasing to enable aggressive evading maneuvers, and shaping of the evader's cost with the orbit's stable and unstable manifolds. We solve the resulting nonlinear game locally using a continuous-time differential dynamic programming variant, which iteratively applies linear-quadratic approximations to the Hamilton-Jacobi-Isaacs equation. We simulate the evader's behavior against both a worst-case and a linear-quadratic pursuer. Our results pave the way for securing future missions in cislunar space against emerging cyber threats.
comment: 17 pages, 9 figures
On Robustness of Consensus over Pseudo-Undirected Path Graphs
Consensus over networked agents is typically studied using undirected or directed communication graphs. Undirected graphs enforce symmetry in information exchange, leading to convergence to the average of initial states, while directed graphs permit asymmetry but make consensus dependent on root nodes and their influence. Both paradigms impose inherent restrictions on achievable consensus values and network robustness. This paper introduces a theoretical framework for achieving consensus over a class of network topologies, termed pseudo-undirected graphs, which retains bidirectional connectivity between node pairs but allows the corresponding edge weights to differ, including the possibility of negative values under bounded conditions. The resulting Laplacian is generally non-symmetric, yet it guarantees consensus under connectivity assumptions, to expand the solution space, which enables the system to achieve a stable consensus value that can lie outside the convex hull of the initial state set. We derive admissibility bounds for negative weights for a pseudo-undirected path graph, and show an application in the simultaneous interception of a moving target.
Certified Learning-Enabled Noise-Aware Motion Planning for Urban Air Mobility
Urban Air Mobility (UAM) has emerged as a promising solution to alleviate urban congestion and transportation challenges. Nevertheless, the noise generated by eVTOL aircrafts poses a significant barrier to public acceptance and regulatory approval, potentially limiting the operational scope and scalability of UAM systems. Hence, the successful adoption of UAM systems hinges on the ability to predict generated noise levels, and further develop motion planning strategies that comply with community-level noise regulations while maintaining operational efficiency. To this end, this paper proposes a novel noise-aware motion planning framework for UAM systems that ensures compliance with noise regulations. We first develop a certifiable neural network model to accurately predict eVTOL noise propagation patterns in urban environments, providing provable bounds on its correctness. To achieve a desired level of accuracy, we propose an active sampling strategy to efficiently build the dataset used to train and test the noise model. Next, we develop a noise-aware motion planning algorithm that utilizes the noise model to generate eVTOL trajectories that guarantee compliance with community noise regulations. The algorithm exploits the monotonic structure of the noise model to efficiently sample the configuration space, ensuring that the generated trajectories are both noise-compliant and operationally efficient. We demonstrate the effectiveness of the proposed framework through a number of experiments for Vahana eVTOLs. The results show that the framework can generate noise-compliant flight plans for a fleet of eVTOLs that adhere to community noise regulations while optimizing operational efficiency.
comment: 16 pages, 10 figures
Choose Your Battles: Distributed Learning Over Multiple Tug of War Games
Consider N players and K games taking place simultaneously. Each of these games is modeled as a Tug-of-War (ToW) game where increasing the action of one player decreases the reward for all other players. Each player participates in only one game at any given time. At each time step, a player decides the game in which they wish to participate in and the action they take in that game. Their reward depends on the actions of all players that are in the same game. This system of K games is termed `Meta Tug-of-War' (Meta-ToW) game. These games can model scenarios such as power control, distributed task allocation, and activation in sensor networks. We propose the Meta Tug-of-Peace algorithm, a distributed algorithm where the action updates are done using a simple stochastic approximation algorithm, and the decision to switch games is made using an infrequent 1-bit communication between the players. We prove that in Meta-ToW games, our algorithm converges to an equilibrium that satisfies a target Quality of Service reward vector for the players. We then demonstrate the efficacy of our algorithm through simulations for the scenarios mentioned above.
comment: Submitted to IEEE TAC
Koopman-Operator-Based Model Predictive Control for Drag-free Satellite
This paper presents a data-driven modelling method for nonlinear dynamics of drag-free satellite based on Koopman operator theory, and a model predictive controller is designed based on the identified model. The nonlinear dynamics of drag-free satellite are identified and controlled based on Sparse Identification of Nonlinear Dynamics (SINDy). Using the manually constructed nonlinear function dictionary as observables, the system approximation is obtained by SINDy algorithm, and a linear Model Predictive Control (MPC) controller is designed for test mass capture based on the SINDy model. Finally, the effectiveness of MPC control is verified by numerical examples.
Orbital Stabilization and Time Synchronization of Unstable Periodic Motions in Underactuated Robots
This paper presents a control methodology for achieving orbital stabilization with simultaneous time synchronization of periodic trajectories in underactuated robotic systems. The proposed approach extends the classical transverse linearization framework to explicitly incorporate time-desynchronization dynamics. To stabilize the resulting extended transverse dynamics, we employ a combination of time-varying LQR and sliding-mode control. The theoretical results are validated experimentally through the implementation of both centralized and decentralized control strategies on a group of six Butterfly robots.
Distributed Koopman Operator Learning from Sequential Observations
This paper presents a distributed Koopman operator learning framework for modeling unknown nonlinear dynamics using sequential observations from multiple agents. Each agent estimates a local Koopman approximation based on lifted data and collaborates over a communication graph to reach exponential consensus on a consistent distributed approximation. The approach supports distributed computation under asynchronous and resource-constrained sensing. Its performance is demonstrated through simulation results, validating convergence and predictive accuracy under sensing-constrained scenarios and limited communication.
Voltage-sensitive distribution factors for contingency analysis and topology optimization
Topology optimization is a promising approach for mitigating congestion and managing changing grid conditions, but it is computationally challenging and requires approximations. Conventional distribution factors like PTDFs and LODFs, based on DC power flow, fail to capture voltage variations, reactive power, and losses - limiting their use in detailed optimization tasks such as busbar splitting. This paper introduces generalized distribution factors derived from a voltage-sensitive linearization of the full AC power flow equations. The proposed formulation accurately reflects reactive power flows, Ohmic losses, and voltage deviations while remaining computationally efficient. We derive and evaluate generalized PTDFs, LODFs, and topology modification factors using matrix identities. We discuss potential applications including voltage-aware N-1 security analysis, and topology optimization with a focus on busbar splitting. Numerical experiments demonstrate close agreement with full AC solutions, significantly outperforming the traditional DC approximation.
comment: 7 pages, 4 figures
Control and Navigation of a 2-D Electric Rocket
This work addresses the control and navigation of a simulated two-dimensional electric rocket. The model provides a simplified framework that neglects actuator dynamics and aerodynamic effects while capturing the complexities of underactuation and state coupling. Trajectory tracking is achieved through a modularized and layered control architecture, with employement of a Linear Quadratic Regulator (LQR) and Lyapunov theory. Full-state estimation is achieved through Kalman filtering techniques, part of the navigation module. The solutions are thoroughly evaluated in a custom-built MATLAB/Simulink testbed, simulating real-world conditions while maintaining a simplified setup. The results reveal limitations along the lateral axis, whose resolution is suggested for future work.
Modeling and Control of Deep Sign-Definite Dynamics with Application to Hybrid Powertrain Control
Deep learning is increasingly used for complex, large-scale systems where first-principles modeling is difficult. However, standard deep learning models often fail to enforce physical structure or preserve convexity in downstream control, leading to physically inconsistent predictions and discontinuous inputs owing to nonconvexity. We introduce sign constraints--sign restrictions on Jacobian entries--that unify monotonicity, positivity, and sign-definiteness; additionally, we develop model-construction methods that enforce them, together with a control-synthesis procedure. In particular, we design exactly linearizable deep models satisfying these constraints and formulate model predictive control as a convex quadratic program, which yields a unique optimizer and a Lipschitz continuous control law. On a two-tank system and a hybrid powertrain, the proposed approach improves prediction accuracy and produces smoother control inputs than existing methods.
comment: Submitted to Automatica
Scalable and Approximation-free Symbolic Control for Unknown Euler-Lagrange Systems
We propose a novel symbolic control framework for enforcing temporal logic specifications in Euler-Lagrange systems that addresses the key limitations of traditional abstraction-based approaches. Unlike existing methods that require exact system models and provide guarantees only at discrete sampling instants, our approach relies only on bounds on system parameters and input constraints, and ensures correctness for the full continuous-time trajectory. The framework combines scalable abstraction of a simplified virtual system with a closed-form, model-free controller that guarantees trajectories satisfy the original specification while respecting input bounds and remaining robust to unknown but bounded disturbances. We provide feasibility conditions for the construction of confinement regions and analyze the trade-off between efficiency and conservatism. Case studies on pendulum dynamics, a two-link manipulator, and multi-agent systems, including hardware experiments, demonstrate that the proposed approach ensures both correctness and safety while significantly reducing computational time and memory requirements. These results highlight its scalability and practicality for real-world robotic systems where precise models are unavailable and continuous-time guarantees are essential.
CollaPipe: Adaptive Segment-Optimized Pipeline Parallelism for Collaborative LLM Training in Heterogeneous Edge Networks
The increasing demand for intelligent mobile applications has made multi-agent collaboration with Transformer-based large language models (LLMs) essential in mobile edge computing (MEC) networks. However, training LLMs in such environments remains challenging due to heavy computation, high end-to-end latency, and limited model generalization. We introduce CollaPipe, a hybrid distributed learning framework that integrates collaborative pipeline parallelism with federated aggregation to support self-evolving intelligent networks. In CollaPipe, the encoder part is adaptively partitioned into variable-sized segments and deployed across mobile devices for pipeline-parallel training, while the decoder is deployed on edge servers to handle generative tasks. Then we perform global model update via federated aggregation. To enhance training efficiency, we formulate a joint optimization problem that adaptively allocates model segments, micro-batches, bandwidth, and transmission power. We derive and use a closed-form convergence bound to design an Dynamic Segment Scheduling and Resource Allocation (DSSDA) algorithm based on Lyapunov optimization, ensuring system stability under long-term constraints. Extensive experiments on downstream tasks with Transformer and BERT models show that CollaPipe improves computation efficiency by up to 15.09%, reduces end-to-end latency by at least 48.98%, and cuts single device memory usage by more than half, enabling online learning in heterogeneous and dynamic communication environments.
comment: Submitted to IEEE for review
An early termination strategy for the distributed biased min-consensus protocol under disturbances SC 2025
The distributed biased min-consensus (DBMC) protocol is an iterative scheme that solves the shortest path problem asymptotically, requiring only local information exchange between neighboring nodes. By appropriately designing the gain function, prior work [1] proposed a DBMC-based system that ensures convergence within a pre-specified time interval. However, this guarantee assumes the absence of disturbances. In this paper, we study the DBMC-based system under disturbances affecting the edge weights. We first establish rigorous error bounds on the resulting state estimates. Building on this analysis, we then propose a practical early termination strategy to prevent potential singularities, specifically, unbounded gain, that may arise in the presence of disturbances, while still ensuring that the shortest paths are correctly identified.Simulations are performed to validate and illustrate the theoretical results.
comment: paper accepted to IEEE ICNSC 2025
Zonotope-Based Elastic Tube Model Predictive Control
Tube-based Model Predictive Control (MPC) is a widely adopted robust control framework for constrained linear systems under additive disturbance. The paper is focused on reducing the numerical complexity associated with the tube parameterization, described as a sequence of elastically-scaled zonotopic sets. A new class of scaled-zonotope inclusion conditions is proposed, alleviating the need for a priori specification of certain set-containment constraints and achieving significant reductions in complexity. A comprehensive complexity analysis is provided for both the polyhedral and the zonotopic setting, illustrating the trade-off between an enlarged domain of attraction and the required computational effort. The proposed approach is validated through extensive numerical experiments.
Dispersion Formation Control: from Geometry to Distribution
We introduce and develop the concept of dispersion formation control, bridging a gap between shape-assembly studies in physics and biology and formation control theory. In current formation control studies, the control objectives typically focus on achieving desired local geometric properties, such as inter-agent distances, bearings, or relative positions. In contrast, our dispersion formation control approach enables agents to directly regulate the dispersion of their spatial distribution, a global variable associated with a covariance matrix. Specifically, we introduce the notion of covariance similarity to define the target spatial dispersion of agents. Building on this framework, we propose two control strategies: a centralized approach to illustrate the key ideas, and a distributed approach that enables agents to control the global dispersion but using only local information. Our stability analysis demonstrates that both strategies ensure exponential convergence of the agents' distribution to the desired dispersion. Notably, controlling a global variable rather than multiple local ones enhances the resiliency of the system, particularly against malfunctioning agents. Simulations validate the effectiveness of the proposed dispersion formation control.
comment: 8 pages, 4 figures
Formal Safety Verification and Refinement for Generative Motion Planners via Certified Local Stabilization
We present a method for formal safety verification of learning-based generative motion planners. Generative motion planners (GMPs) offer advantages over traditional planners, but verifying the safety and dynamic feasibility of their outputs is difficult since neural network verification (NNV) tools scale only to a few hundred neurons, while GMPs often contain millions. To preserve GMP expressiveness while enabling verification, our key insight is to imitate the GMP by stabilizing references sampled from the GMP with a small neural tracking controller and then applying NNV to the closed-loop dynamics. This yields reachable sets that rigorously certify closed-loop safety, while the controller enforces dynamic feasibility. Building on this, we construct a library of verified GMP references and deploy them online in a way that imitates the original GMP distribution whenever it is safe to do so, improving safety without retraining. We evaluate across diverse planners, including diffusion, flow matching, and vision-language models, improving safety in simulation (on ground robots and quadcopters) and on hardware (differential-drive robot).
comment: 10 pages, 12 figures
SoK: A Systematic Review of Malware Ontologies and Taxonomies and Implications for the Quantum Era
The threat of quantum malware is real and a growing security concern that will have catastrophic scientific and technological impacts, if not addressed early. If weaponised or exploited especially by the wrong hands, malware will undermine highly sophisticated critical systems supported by next-generation quantum architectures, for example, in defence, communications, energy, and space. This paper explores the fundamental nature and implications of quantum malware to enable the future development of appropriate mitigations and defences, thereby protecting critical infrastructure. By conducting a systematic literature review (SLR) that draws on knowledge frameworks such as ontologies and taxonomies to explore malware, this provides insights into how malicious behaviours can be translated into attacks on quantum technologies, thereby providing a lens to analyse the severity of malware against quantum technologies. This study employs the European Competency Framework for Quantum Technologies (CFQT) as a guide to map malware behaviour to several competency layers, creating a foundation in this emerging field.
comment: 40 pages, 9 figures, 5 tables
Training Task Reasoning LLM Agents for Multi-turn Task Planning via Single-turn Reinforcement Learning
Large Language Models (LLMs) have demonstrated remarkable capabilities in knowledge acquisition, reasoning, and tool use, making them promising candidates for autonomous agent applications. However, training LLM agents for complex multi-turn task planning faces significant challenges, including sparse episode-wise rewards, credit assignment across long horizons, and the computational overhead of reinforcement learning in multi-turn interaction settings. To this end, this paper introduces a novel approach that transforms multi-turn task planning into single-turn task reasoning problems, enabling efficient policy optimization through Group Relative Policy Optimization (GRPO) with dense and verifiable reward from expert trajectories. Our theoretical analysis shows that GRPO improvement on single-turn task reasoning results in higher multi-turn success probability under the minimal turns, as well as the generalization to subtasks with shorter horizons. Experimental evaluation on the complex task planning benchmark demonstrates that our 1.5B parameter model trained with single-turn GRPO achieves superior performance compared to larger baseline models up to 14B parameters, with success rates of 70% for long-horizon planning tasks with over 30 steps. We also theoretically and empirically validate the strong cross-task generalizability that the models trained on complex tasks can lead to the successful completion of all simpler subtasks.
Data-Driven State Observers for Measure-Preserving Systems
The increasing use of data-driven control strategies gives rise to the problem of learning-based state observation. Motivated by this need, the present work proposes a data-driven approach for the synthesis of state observers for discrete-time nonlinear systems with measure-preserving dynamics. To this end, Kazantzis-Kravaris/Luenburger (KKL) observers are shown to be well-defined, where the observer design boils down to determining a nonlinear injective mapping of states and its pseudo-inverse. For its learning-based construction, the KKL observer is related to the Koopman and Perron-Frobenius operators, defined on a Sobolev-type reproducing kernel Hilbert space (RKHS) on which they are shown to be normal operators and thus have a spectral resolution. Hence, observer synthesis algorithms, based on kernel interpolation/regression routines for the desired injective mapping in the observer and its pseudo-inverse, have been proposed in various settings of available dataset -- (i) many orbits, (ii) single long orbit, and (iii) snapshots. Theoretical error analyses are provided, and numerical studies on a chaotic Lorenz system are demonstrated.
comment: 40 pages, 11 figures, submitted to Journal of Nonlinear Science
PIRF: Physics-Informed Reward Fine-Tuning for Diffusion Models NeurIPS 2025
Diffusion models have demonstrated strong generative capabilities across scientific domains, but often produce outputs that violate physical laws. We propose a new perspective by framing physics-informed generation as a sparse reward optimization problem, where adherence to physical constraints is treated as a reward signal. This formulation unifies prior approaches under a reward-based paradigm and reveals a shared bottleneck: reliance on diffusion posterior sampling (DPS)-style value function approximations, which introduce non-negligible errors and lead to training instability and inference inefficiency. To overcome this, we introduce Physics-Informed Reward Fine-tuning (PIRF), a method that bypasses value approximation by computing trajectory-level rewards and backpropagating their gradients directly. However, a naive implementation suffers from low sample efficiency and compromised data fidelity. PIRF mitigates these issues through two key strategies: (1) a layer-wise truncated backpropagation method that leverages the spatiotemporally localized nature of physics-based rewards, and (2) a weight-based regularization scheme that improves efficiency over traditional distillation-based methods. Across five PDE benchmarks, PIRF consistently achieves superior physical enforcement under efficient sampling regimes, highlighting the potential of reward fine-tuning for advancing scientific generative modeling.
comment: 18 pages, 6 figures; NeurIPS 2025 AI for science workshop
Adaptive Altitude Control of a Tethered Multirotor Autogyro under Varying Wind Speeds using Differential Rotor Braking
A tethered multirotor autogyro can function as an unmanned aerial vehicle for energy-efficient and prolonged deployment, as it uses the available wind energy to sustain flight. This article presents an adaptive altitude control strategy for such a device. At a constant wind speed, the equilibrium altitude can be approximated by a quadratic function of the pitch angle. The proposed adaptive control estimates the coefficients of this quadratic function. The estimates are used for altitude control and to attain the maximum altitude (and minimum horizontal drift) for a given wind speed. A feedback controller based on regenerative differential rotor braking is used as the actuation to modulate the autogyro's pitch angle. Implementation of the controller using a control-oriented, higher-order dynamic model demonstrates the controller's capability to regulate the altitude and maintain stable flights under varying wind speeds. Based on the system's maximum altitude tracking performance, the adaptive control is adjusted to improve performance under substantial changes in wind speeds.
A Crime/S.I.R. optimal control problem
This paper presents and discusses a mathematical model inspired by control theory to derive optimal public policies for minimizing costs associated with the reduction and control of criminal activity in a population. Specifically, we analyze the optimal control problem \begin{equation*} \min G(u_1, u_2, u_3) = \int_{0}^{t_{\text{F}}} \left( I(t) - R(t) + \frac{B_1}{2} u_1^2(t) + \frac{B_2}{2} u_2^2(t) + \frac{B_3}{2} u_3^2(t) \right) \, dt. \end{equation*} where $I=I(t)$ and $R=R(t)$ satisfies the system of equations \begin{equation*} \left\{ \begin{aligned} \dot{S} &= \Lambda - (1-u_1)SI - \mu S + ((1+u_3)\gamma_2)I + \rho \Omega R,\\ \dot{I} &= (1-u_1)SI - (\mu + \delta_1)I - ((1+u_2)\gamma_1)I - ((1+u_3)\gamma_2)I + (1-\Omega)\rho R,\\ \dot{R} &= ((1+u_2)\gamma_1)I - (\mu + \delta_2 + \rho)R. \end{aligned} \right. \end{equation*} Our approach assumes that the social and economic effects of criminal behavior can be modeled by a dynamic SIR-type system, which serves as a constraint on a cost functional associated with the strategies implemented by government and law enforcement authorities to reduce criminal behavior. Using optimal control theory, the proposed controls, i.e., preventive policies (such as community and social cohesion programs), are expected to have a significant and positive impact on crime reduction, generating opportunities for the most disadvantaged sectors of Cali society and contributing to long-term security. Given that resources to address this problem are limited, this research aims to determine an optimal combination of public interventions and policies that minimize criminality at the lowest possible economic cost, using an SIR model, tools from variational calculus, and optimal control theory.
Probability-One Optimization of Generalized Rayleigh Quotient Sum For Multi-Source Generalized Total Least-Squares
This paper addresses the global optimization of the sum of the Rayleigh quotient and the generalized Rayleigh quotient on the unit sphere. While various methods have been proposed for this problem, they fail to reliably converge to the global maximizer. To overcome this limitation, we propose an extension of the Riemannian Trust Region algorithm based on the probability-one homotopy optimization method, which enhances convergence to a global maximizer and, under certain conditions, ensures convergence to the global maximizer. In addition to the proposed method, existing state-of-the-art approaches are also presented, along with an explanation of their limitations and their connection to the proposed method. The proposed method is evaluated alongside the state-of-the-art approaches through numerical experiments, assessing convergence speed, success in reaching the global maximizer, and scalability with increasing problem dimension. Furthermore, we demonstrate how this ties in with the multi-source Bayesian Generalized Total Least-Squares (B-GTLS) problem, illustrating its applicability.
comment: This is the preprint version prior to peer review
Data-fused Model Predictive Control with Guarantees: Application to Flying Humanoid Robots
This paper introduces a Data-Fused Model Predictive Control (DFMPC) framework that combines physics-based models with data-driven representations of unknown dynamics. Leveraging Willems' Fundamental Lemma and an artificial equilibrium formulation, the method enables tracking of changing, potentially unreachable setpoints while explicitly handling measurement noise through slack variables and regularization. We provide guarantees of recursive feasibility and practical stability under input-output constraints for a specific class of reference signals. The approach is validated on the iRonCub flying humanoid robot, integrating analytical momentum models with data-driven turbine dynamics. Simulations show improved tracking and robustness compared to a purely model-based MPC, while maintaining real-time feasibility.
comment: 8 pages, 3 figures
MAPPO for Edge Server Monitoring
In this paper, we consider a goal-oriented communication problem for edge server monitoring, where jobs arrive intermittently at multiple dispatchers and must be assigned to shared edge servers with finite queues and time-varying availability. Accurate knowledge of server status is critical for sustaining high throughput, yet remains challenging under dynamic workloads and partial observability. To address this challenge, each dispatcher maintains server knowledge through two complementary mechanisms: (i) active status queries that provide instantaneous updates at a communication cost, and (ii) job execution feedback that reveals server conditions opportunistically. We formulate a cooperative multi-agent distributed decision-making problem in which dispatchers jointly optimize query scheduling to balance throughput against communication overhead. To solve this problem, we propose a Multi-Agent Proximal Policy Optimization (MAPPO)-based algorithm that leverages centralized training with decentralized execution (CTDE) to learn distributed query-and-dispatch policies under partial and stale observations. Numerical evaluations show that MAPPO achieves superior throughput-cost tradeoffs and significantly outperforms baseline strategies, achieving on average a 30% improvement over the closest baseline.
comment: 6 pages, 4 figures. Accepted to IEEE MILCOM 2025
Unlocking Transmission Flexibility under Uncertainty: Getting Dynamic Line Ratings into Electricity Markets
Static transmission line ratings may lead to underutilization of line capacity due to overly conservative assumptions. Grid-enhancing technologies (GETs) such as dynamic line ratings (DLRs), which adjust line capacity based on real-time conditions, are a techno-economically viable alternative to increase the utilization of existing power lines. Nonetheless, their adoption has been slow, partly due to the absence of operational tools that effectively account for simultaneous impacts on dispatch and pricing. In this paper, we represent transmission capacity with DLRs as a stock-like resource with time-variant interdependency, which is modeled via an approximation of line temperature evolution process, decoupling the impacts of ambient weather conditions and power flow on transmission line temperature and thus capacity. We integrate DLRs into a multi-period DC optimal power flow problem, with chance constrains addressing correlated uncertainty in DLRs and renewable generation. This yields non-convex problems that we transform into a tractable convex form by linearization. We derive locational marginal energy and ancillary services prices consistent with a competitive equilibrium. Numerical experiments on the 11-zone and 1814-node NYISO systems demonstrate its performance, including impacts on dispatch, pricing, and marginal carbon emissions.
Minimum Time Consensus of Multi-agent System under Fuel Constraints
This work addresses the problem of finding minimum time consensus point in the state space for a set of $N$ identical double integrator agents with bounded inputs and fixed fuel budget constraint. To address the problem, characterization of the attainable set for each agent subject to bounded inputs and fixed fuel budget constraints is done. Such attainable set is shown to be a convex set. The minimum time to consensus is the least time when the attainable sets of all agents intersect and the corresponding consensus state is the point of intersection. Using Helly's theorem, it is shown that the intersection is not empty at the time when all triplets of agents exhibit a non-empty intersection. Thus, a closed-form expression for the minimum time to consensus for a triplet of agents is obtained. The calculation of minimum time consensus for each of the triplets is performed independently and is distributed evenly among $N$ agents. The overall minimum time to consensus of $N$ agents is then given by the triplet that has the greatest minimum time to consensus. In the process, the set of initial conditions for agents from which consensus is possible under these input and fuel budget constraints is also characterized.
Coordination Control of Discrete Event Systems under Cyber Attacks
In this paper, coordination control of discrete event systems under joint sensor and actuator attacks is investigated. Sensor attacks are described by a set of attack languages using a proposed ALTER model. Several local supervisors are used to control the system. The goal is to design local supervisors to ensure safety of the system even under cyber attacks (CA). The necessary and sufficient conditions for the existence of such supervisors are derived in terms of conditional decomposability, CA-controllability and CA-observability. A method is developed to calculate local state estimates under sensor attacks. Two methods are also developed to design local supervisors, one for discrete event systems satisfying conditional decomposability, CA-controllability and CA-observability, and one for discrete event systems satisfying conditional decomposability only. The approach works for both stealthy and non-stealthy attacks. A practical example is given to illustrate the results.
comment: 23 pages, attack model was redefined, description was improved
Exploring Noncommutative Polynomial Equation Methods for Discrete-Time Quaternionic Control
We present new polynomial-based methods for discrete-time quaternionic systems, highlighting how noncommutative multiplication modifies classical control approaches. Defining quaternionic polynomials via a backward-shift operator, we examine left and right fraction representations of transfer functions, showing that right zeros correspond to similarity classes of quaternionic matrix right eigenvalues. We then propose a feedback design procedure that generalizes pole placement to quaternions - a first approach using a genuine quaternionic polynomial equation.
comment: Published in: 2025 33rd Mediterranean Conference on Control and Automation (MED), Tangier, Morocco, 2025. DOI: 10.1109/MED64031.2025.11073531. \c{opyright} 2025 IEEE. Personal use permitted; all other uses by permission of IEEE. Published version: https://doi.org/10.1109/MED64031.2025.11073531
TuneS: Patient-specific model-based optimization of contact configuration in deep brain stimulation
Objective: The objective of this study is to develop and evaluate a systematic approach to optimize Deep Brain Stimulation (DBS) parameters, addressing the challenge of identifying patient-specific settings and optimal stimulation targets for various neurological and mental disorders. Methods: TuneS, a novel pipeline to predict clinically optimal DBS contact configurations based on predefined targets and constraints, is introduced. The method relies upon patient-specific models of stimulation spread and extends optimization beyond traditional neural structures to include automated, model-based targeting of streamlines. Results: Initial findings show that both the STN motor subdivision and STN motor streamlines are consistently engaged under clinical settings, while regions of avoidance receive minimal stimulation. Given these findings, the value of model-based contact predictions for assessing stimulation targets while observing anatomical constraints is demonstrated at the example of a small cohort of Parkinson's disease patients. The predicted settings were generally found to achieve higher target coverages while providing a better trade-off between maximizing target coverage and minimizing stimulation of regions associated with side effects. Conclusion: TuneS shows promise as a research tool, enabling systematic assessment of DBS target effectiveness and facilitating constraint-aware optimization of stimulation parameters. Significance: The presented pipeline offers a pathway to improve patient-specific DBS therapies and contributes to the broader understanding of effective DBS targeting strategies.
comment: 8 pages, 9 figures, under review for IEEE Transactions on Biomedical Engineering
Scalable Two-Stage Stochastic Optimal Power Flow via Separable Approximation
This paper proposes a Separable Projective Approximation Routine-Optimal Power Flow (SPAR-OPF) framework for solving two-stage stochastic optimization problems in power systems. The framework utilizes a separable piecewise linear approximation of the value function and learns the function based on sample sub-gradient information. We present two formulations to model the learned value function, and compare their effectiveness. Additionally, an efficient statistical method is introduced to assess the quality of the obtained solutions. The effectiveness of the proposed framework is validated using distributed generation siting and sizing problem in three-phase unbalanced power distribution systems as an example. Results show that the framework approximates the value function with over 98% accuracy and provides high-quality solutions with an optimality gap of less than 1%. The framework scales efficiently with system size, generating high-quality solutions in a short time when applied to a 9500-node distribution system with 1200 scenarios, while the extensive formulations and progressive hedging failed to solve the problem.
Explicit Steady-State Approximations for Parallel Server Systems with Heterogeneous Servers
We study the steady-state performance of parallel-server systems under an immediate routing architecture with two sources of heterogeneity: servers and job classes, subject to compatibility constraints. We focus on the weighted-workload-task-allocation (WWTA) policy, a load-balancing scheme known to be throughput-optimal for such systems. Under a relaxed complete-resource-pooling (CRP) condition, we prove a "strong form" of state-space collapse in heavy traffic and that the scaled workload of each server converges in distribution to an exponential random variable, whose parameter is explicitly given by system primitives. Our analysis yields three main insights. First, the conventional heavy-traffic requirement of a unique static allocation plan can be dropped; a relaxed CRP condition suffices. Second, the limiting workload distribution is shown to be independent of local scheduling policy on server side, allowing substantial flexibility. Third, the inefficient (non-basic) activities prescribed by static allocation plan is proved to receive an asymptotically negligible fraction of routing and service, even though WWTA has no prior knowledge of which activities are basic, highlighting its robustness to changing arrival rates.
Universal Dynamics with Globally Controlled Analog Quantum Simulators
Analog quantum simulators with global control fields have emerged as powerful platforms for exploring complex quantum phenomena. Recent breakthroughs, such as the coherent control of thousands of atoms, highlight the growing potential for quantum applications at scale. Despite these advances, a fundamental theoretical question remains unresolved: to what extent can such systems realize universal quantum dynamics under global control? Here we establish a necessary and sufficient condition for universal quantum computation using only global pulse control, proving that a broad class of analog quantum simulators is, in fact, universal. We further extend this framework to fermionic and bosonic systems, including modern platforms such as ultracold atoms in optical superlattices. Crucially, to connect the theoretical possibility with experimental reality, we introduce a new control technique into the experiment - direct quantum optimal control. This method enables the synthesis of complex effective Hamiltonians and allows us to incorporate realistic hardware constraints. To show its practical power, we experimentally engineer three-body interactions outside the blockade regime and demonstrate topological dynamics on a Rydberg atom array. Using the new control framework, we overcome key experimental challenges, including hardware limitations and atom position fluctuations in the non-blockade regime, by identifying smooth, short-duration pulses that achieve high-fidelity dynamics. Experimental measurements reveal dynamical signatures of symmetry-protected-topological edge modes, confirming both the expressivity and feasibility of our approach. Our work opens a new avenue for quantum simulation beyond native hardware Hamiltonians, enabling the engineering of effective multi-body interactions and advancing the frontier of quantum information processing with globally-controlled analog platforms.
comment: 10 pages, 5 figures with Methods. HYH, AMG, and LC contributed equally to this work. Updated acknowledgement
Systematic Constraint Formulation and Collision-Free Trajectory Planning Using Space-Time Graphs of Convex Sets
In this paper, we create optimal, collision-free, time-dependent trajectories through cluttered dynamic environments. The many spatial and temporal constraints make finding an initial guess for a numerical solver difficult. Graphs of Convex Sets (GCS) and the recently developed Space-Time Graphs of Convex Sets (ST-GCS) enable us to generate minimum distance collision-free trajectories without providing an initial guess to the solver. We also explore the derivation of general GCS-compatible constraints and document an intuitive strategy for adapting general constraints to the framework. We show that ST-GCS produces equivalent trajectories to the standard GCS formulation when the environment is static, as well as globally optimal trajectories in cluttered dynamic environments.
comment: 16 pages with references, 20 figures
MESS+: Dynamically Learned Inference-Time LLM Routing in Model Zoos with Service Level Guarantees NeurIPS 2025
Open-weight large language model (LLM) zoos provide access to numerous high-quality models, but selecting the appropriate model for specific tasks remains challenging and requires technical expertise. Most users simply want factually correct, safe, and satisfying responses without concerning themselves with model technicalities, while inference service providers prioritize minimizing operating costs. These competing interests are typically mediated through service level agreements (SLAs) that guarantee minimum service quality. We introduce MESS+, a stochastic optimization algorithm for cost-optimal LLM request routing while providing rigorous SLA compliance guarantees. MESS+ learns request satisfaction probabilities of LLMs in real-time as users interact with the system, based on which model selection decisions are made by solving a per-request optimization problem. Our algorithm includes a novel combination of virtual queues and request satisfaction prediction, along with a theoretical analysis of cost optimality and constraint satisfaction. Across a wide range of state-of-the-art LLM benchmarks, MESS+ achieves an average of $2\times$ cost savings compared to existing LLM routing techniques.
comment: NeurIPS 2025. Code: https://github.com/laminair/mess-plus
Adaptive Policy Learning to Additional Tasks
This paper develops a policy learning method for tuning a pre-trained policy to adapt to additional tasks without altering the original task. A method named Adaptive Policy Gradient (APG) is proposed in this paper, which combines Bellman's principle of optimality with the policy gradient approach to improve the convergence rate. This paper provides theoretical analysis which guarantees the convergence rate and sample complexity of $\mathcal{O}(1/T)$ and $\mathcal{O}(1/\epsilon)$, respectively, where $T$ denotes the number of iterations and $\epsilon$ denotes the accuracy of the resulting stationary policy. Furthermore, several challenging numerical simulations, including cartpole, lunar lander, and robot arm, are provided to show that APG obtains similar performance compared to existing deterministic policy gradient methods while utilizing much less data and converging at a faster rate.
comment: The uploaded paper has technique issues, and we decide to withdraw it
Robotics
Residual Off-Policy RL for Finetuning Behavior Cloning Policies
Recent advances in behavior cloning (BC) have enabled impressive visuomotor control policies. However, these approaches are limited by the quality of human demonstrations, the manual effort required for data collection, and the diminishing returns from increasing offline data. In comparison, reinforcement learning (RL) trains an agent through autonomous interaction with the environment and has shown remarkable success in various domains. Still, training RL policies directly on real-world robots remains challenging due to sample inefficiency, safety concerns, and the difficulty of learning from sparse rewards for long-horizon tasks, especially for high-degree-of-freedom (DoF) systems. We present a recipe that combines the benefits of BC and RL through a residual learning framework. Our approach leverages BC policies as black-box bases and learns lightweight per-step residual corrections via sample-efficient off-policy RL. We demonstrate that our method requires only sparse binary reward signals and can effectively improve manipulation policies on high-degree-of-freedom (DoF) systems in both simulation and the real world. In particular, we demonstrate, to the best of our knowledge, the first successful real-world RL training on a humanoid robot with dexterous hands. Our results demonstrate state-of-the-art performance in various vision-based tasks, pointing towards a practical pathway for deploying RL in the real world. Project website: https://residual-offpolicy-rl.github.io
SOE: Sample-Efficient Robot Policy Self-Improvement via On-Manifold Exploration
Intelligent agents progress by continually refining their capabilities through actively exploring environments. Yet robot policies often lack sufficient exploration capability due to action mode collapse. Existing methods that encourage exploration typically rely on random perturbations, which are unsafe and induce unstable, erratic behaviors, thereby limiting their effectiveness. We propose Self-Improvement via On-Manifold Exploration (SOE), a framework that enhances policy exploration and improvement in robotic manipulation. SOE learns a compact latent representation of task-relevant factors and constrains exploration to the manifold of valid actions, ensuring safety, diversity, and effectiveness. It can be seamlessly integrated with arbitrary policy models as a plug-in module, augmenting exploration without degrading the base policy performance. Moreover, the structured latent space enables human-guided exploration, further improving efficiency and controllability. Extensive experiments in both simulation and real-world tasks demonstrate that SOE consistently outperforms prior methods, achieving higher task success rates, smoother and safer exploration, and superior sample efficiency. These results establish on-manifold exploration as a principled approach to sample-efficient policy self-improvement. Project website: https://ericjin2002.github.io/SOE
Imitation-Guided Bimanual Planning for Stable Manipulation under Changing External Forces
Robotic manipulation in dynamic environments often requires seamless transitions between different grasp types to maintain stability and efficiency. However, achieving smooth and adaptive grasp transitions remains a challenge, particularly when dealing with external forces and complex motion constraints. Existing grasp transition strategies often fail to account for varying external forces and do not optimize motion performance effectively. In this work, we propose an Imitation-Guided Bimanual Planning Framework that integrates efficient grasp transition strategies and motion performance optimization to enhance stability and dexterity in robotic manipulation. Our approach introduces Strategies for Sampling Stable Intersections in Grasp Manifolds for seamless transitions between uni-manual and bi-manual grasps, reducing computational costs and regrasping inefficiencies. Additionally, a Hierarchical Dual-Stage Motion Architecture combines an Imitation Learning-based Global Path Generator with a Quadratic Programming-driven Local Planner to ensure real-time motion feasibility, obstacle avoidance, and superior manipulability. The proposed method is evaluated through a series of force-intensive tasks, demonstrating significant improvements in grasp transition efficiency and motion performance. A video demonstrating our simulation results can be viewed at \href{https://youtu.be/3DhbUsv4eDo}{\textcolor{blue}{https://youtu.be/3DhbUsv4eDo}}.
Proactive-reactive detection and mitigation of intermittent faults in robot swarms
Intermittent faults are transient errors that sporadically appear and disappear. Although intermittent faults pose substantial challenges to reliability and coordination, existing studies of fault tolerance in robot swarms focus instead on permanent faults. One reason for this is that intermittent faults are prohibitively difficult to detect in the fully self-organized ad-hoc networks typical of robot swarms, as their network topologies are transient and often unpredictable. However, in the recently introduced self-organizing nervous systems (SoNS) approach, robot swarms are able to self-organize persistent network structures for the first time, easing the problem of detecting intermittent faults. To address intermittent faults in robot swarms that have persistent networks, we propose a novel proactive-reactive strategy to detection and mitigation, based on self-organized backup layers and distributed consensus in a multiplex network. Proactively, the robots self-organize dynamic backup paths before faults occur, adapting to changes in the primary network topology and the robots' relative positions. Reactively, robots use one-shot likelihood ratio tests to compare information received along different paths in the multiplex network, enabling early fault detection. Upon detection, communication is temporarily rerouted in a self-organized way, until the detected fault resolves. We validate the approach in representative scenarios of faulty positional data occurring during formation control, demonstrating that intermittent faults are prevented from disrupting convergence to desired formations, with high fault detection accuracy and low rates of false positives.
MagiClaw: A Dual-Use, Vision-Based Soft Gripper for Bridging the Human Demonstration to Robotic Deployment Gap
The transfer of manipulation skills from human demonstration to robotic execution is often hindered by a "domain gap" in sensing and morphology. This paper introduces MagiClaw, a versatile two-finger end-effector designed to bridge this gap. MagiClaw functions interchangeably as both a handheld tool for intuitive data collection and a robotic end-effector for policy deployment, ensuring hardware consistency and reliability. Each finger incorporates a Soft Polyhedral Network (SPN) with an embedded camera, enabling vision-based estimation of 6-DoF forces and contact deformation. This proprioceptive data is fused with exteroceptive environmental sensing from an integrated iPhone, which provides 6D pose, RGB video, and LiDAR-based depth maps. Through a custom iOS application, MagiClaw streams synchronized, multi-modal data for real-time teleoperation, offline policy learning, and immersive control via mixed-reality interfaces. We demonstrate how this unified system architecture lowers the barrier to collecting high-fidelity, contact-rich datasets and accelerates the development of generalizable manipulation policies. Please refer to the iOS app at https://apps.apple.com/cn/app/magiclaw/id6661033548 for further details.
comment: 8 pages, 4 figures, accepted to Data@CoRL2025 Workshop
A Multimodal Stochastic Planning Approach for Navigation and Multi-Robot Coordination
In this paper, we present a receding-horizon, sampling-based planner capable of reasoning over multimodal policy distributions. By using the cross-entropy method to optimize a multimodal policy under a common cost function, our approach increases robustness against local minima and promotes effective exploration of the solution space. We show that our approach naturally extends to multi-robot collision-free planning, enables agents to share diverse candidate policies to avoid deadlocks, and allows teams to minimize a global objective without incurring the computational complexity of centralized optimization. Numerical simulations demonstrate that employing multiple modes significantly improves success rates in trap environments and in multi-robot collision avoidance. Hardware experiments further validate the approach's real-time feasibility and practical performance.
comment: 8 Pages, 7 Figures
BiGraspFormer: End-to-End Bimanual Grasp Transformer
Bimanual grasping is essential for robots to handle large and complex objects. However, existing methods either focus solely on single-arm grasping or employ separate grasp generation and bimanual evaluation stages, leading to coordination problems including collision risks and unbalanced force distribution. To address these limitations, we propose BiGraspFormer, a unified end-to-end transformer framework that directly generates coordinated bimanual grasps from object point clouds. Our key idea is the Single-Guided Bimanual (SGB) strategy, which first generates diverse single grasp candidates using a transformer decoder, then leverages their learned features through specialized attention mechanisms to jointly predict bimanual poses and quality scores. This conditioning strategy reduces the complexity of the 12-DoF search space while ensuring coordinated bimanual manipulation. Comprehensive simulation experiments and real-world validation demonstrate that BiGraspFormer consistently outperforms existing methods while maintaining efficient inference speed (<0.05s), confirming the effectiveness of our framework. Code and supplementary materials are available at https://sites.google.com/bigraspformer
comment: 8 pages, 5 figures
A Fast Initialization Method for Neural Network Controllers: A Case Study of Image-based Visual Servoing Control for the multicopter Interception
Reinforcement learning-based controller design methods often require substantial data in the initial training phase. Moreover, the training process tends to exhibit strong randomness and slow convergence. It often requires considerable time or high computational resources. Another class of learning-based method incorporates Lyapunov stability theory to obtain a control policy with stability guarantees. However, these methods generally require an initially stable neural network control policy at the beginning of training. Evidently, a stable neural network controller can not only serve as an initial policy for reinforcement learning, allowing the training to focus on improving controller performance, but also act as an initial state for learning-based Lyapunov control methods. Although stable controllers can be designed using traditional control theory, designers still need to have a great deal of control design knowledge to address increasingly complicated control problems. The proposed neural network rapid initialization method in this paper achieves the initial training of the neural network control policy by constructing datasets that conform to the stability conditions based on the system model. Furthermore, using the image-based visual servoing control for multicopter interception as a case study, simulations and experiments were conducted to validate the effectiveness and practical performance of the proposed method. In the experiment, the trained control policy attains a final interception velocity of 15 m/s.
Spectral Signature Mapping from RGB Imagery for Terrain-Aware Navigation
Successful navigation in outdoor environments requires accurate prediction of the physical interactions between the robot and the terrain. To this end, several methods rely on geometric or semantic labels to classify traversable surfaces. However, such labels cannot distinguish visually similar surfaces that differ in material properties. Spectral sensors enable inference of material composition from surface reflectance measured across multiple wavelength bands. Although spectral sensing is gaining traction in robotics, widespread deployment remains constrained by the need for custom hardware integration, high sensor costs, and compute-intensive processing pipelines. In this paper, we present RGB Image to Spectral Signature Neural Network (RS-Net), a deep neural network designed to bridge the gap between the accessibility of RGB sensing and the rich material information provided by spectral data. RS-Net predicts spectral signatures from RGB patches, which we map to terrain labels and friction coefficients. The resulting terrain classifications are integrated into a sampling-based motion planner for a wheeled robot operating in outdoor environments. Likewise, the friction estimates are incorporated into a contact-force-based MPC for a quadruped robot navigating slippery surfaces. Thus, we introduce a framework that learns the task-relevant physical property once during training and thereafter relies solely on RGB sensing at test time. The code is available at https://github.com/prajapatisarvesh/RS-Net.
comment: 8 pages, 10 figures, submitted to Robotic Computing & Communication
FUNCanon: Learning Pose-Aware Action Primitives via Functional Object Canonicalization for Generalizable Robotic Manipulation
General-purpose robotic skills from end-to-end demonstrations often leads to task-specific policies that fail to generalize beyond the training distribution. Therefore, we introduce FunCanon, a framework that converts long-horizon manipulation tasks into sequences of action chunks, each defined by an actor, verb, and object. These chunks focus policy learning on the actions themselves, rather than isolated tasks, enabling compositionality and reuse. To make policies pose-aware and category-general, we perform functional object canonicalization for functional alignment and automatic manipulation trajectory transfer, mapping objects into shared functional frames using affordance cues from large vision language models. An object centric and action centric diffusion policy FuncDiffuser trained on this aligned data naturally respects object affordances and poses, simplifying learning and improving generalization ability. Experiments on simulated and real-world benchmarks demonstrate category-level generalization, cross-task behavior reuse, and robust sim2real deployment, showing that functional canonicalization provides a strong inductive bias for scalable imitation learning in complex manipulation domains. Details of the demo and supplemental material are available on our project website https://sites.google.com/view/funcanon.
comment: project website: https://sites.google.com/view/funcanon, 11 pages
World4RL: Diffusion World Models for Policy Refinement with Reinforcement Learning for Robotic Manipulation
Robotic manipulation policies are commonly initialized through imitation learning, but their performance is limited by the scarcity and narrow coverage of expert data. Reinforcement learning can refine polices to alleviate this limitation, yet real-robot training is costly and unsafe, while training in simulators suffers from the sim-to-real gap. Recent advances in generative models have demonstrated remarkable capabilities in real-world simulation, with diffusion models in particular excelling at generation. This raises the question of how diffusion model-based world models can be combined to enhance pre-trained policies in robotic manipulation. In this work, we propose World4RL, a framework that employs diffusion-based world models as high-fidelity simulators to refine pre-trained policies entirely in imagined environments for robotic manipulation. Unlike prior works that primarily employ world models for planning, our framework enables direct end-to-end policy optimization. World4RL is designed around two principles: pre-training a diffusion world model that captures diverse dynamics on multi-task datasets and refining policies entirely within a frozen world model to avoid online real-world interactions. We further design a two-hot action encoding scheme tailored for robotic manipulation and adopt diffusion backbones to improve modeling fidelity. Extensive simulation and real-world experiments demonstrate that World4RL provides high-fidelity environment modeling and enables consistent policy refinement, yielding significantly higher success rates compared to imitation learning and other baselines. More visualization results are available at https://world4rl.github.io/.
SlicerROS2: A Research and Development Module for Image-Guided Robotic Interventions
Image-guided robotic interventions involve the use of medical imaging in tandem with robotics. SlicerROS2 is a software module that combines 3D Slicer and robot operating system (ROS) in pursuit of a standard integration approach for medical robotics research. The first release of SlicerROS2 demonstrated the feasibility of using the C++ API from 3D Slicer and ROS to load and visualize robots in real time. Since this initial release, we've rewritten and redesigned the module to offer greater modularity, access to low-level features, access to 3D Slicer's Python API, and better data transfer protocols. In this paper, we introduce this new design as well as four applications that leverage the core functionalities of SlicerROS2 in realistic image-guided robotics scenarios.
ManipForce: Force-Guided Policy Learning with Frequency-Aware Representation for Contact-Rich Manipulation
Contact-rich manipulation tasks such as precision assembly require precise control of interaction forces, yet existing imitation learning methods rely mainly on vision-only demonstrations. We propose ManipForce, a handheld system designed to capture high-frequency force-torque (F/T) and RGB data during natural human demonstrations for contact-rich manipulation. Building on these demonstrations, we introduce the Frequency-Aware Multimodal Transformer (FMT). FMT encodes asynchronous RGB and F/T signals using frequency- and modality-aware embeddings and fuses them via bi-directional cross-attention within a transformer diffusion policy. Through extensive experiments on six real-world contact-rich manipulation tasks - such as gear assembly, box flipping, and battery insertion - FMT trained on ManipForce demonstrations achieves robust performance with an average success rate of 83% across all tasks, substantially outperforming RGB-only baselines. Ablation and sampling-frequency analyses further confirm that incorporating high-frequency F/T data and cross-modal integration improves policy performance, especially in tasks demanding high precision and stable contact.
comment: 9 pages, 9 figures
TacEva: A Performance Evaluation Framework For Vision-Based Tactile Sensors
Vision-Based Tactile Sensors (VBTSs) are widely used in robotic tasks because of the high spatial resolution they offer and their relatively low manufacturing costs. However, variations in their sensing mechanisms, structural dimension, and other parameters lead to significant performance disparities between existing VBTSs. This makes it challenging to optimize them for specific tasks, as both the initial choice and subsequent fine-tuning are hindered by the lack of standardized metrics. To address this issue, TacEva is introduced as a comprehensive evaluation framework for the quantitative analysis of VBTS performance. The framework defines a set of performance metrics that capture key characteristics in typical application scenarios. For each metric, a structured experimental pipeline is designed to ensure consistent and repeatable quantification. The framework is applied to multiple VBTSs with distinct sensing mechanisms, and the results demonstrate its ability to provide a thorough evaluation of each design and quantitative indicators for each performance dimension. This enables researchers to pre-select the most appropriate VBTS on a task by task basis, while also offering performance-guided insights into the optimization of VBTS design. A list of existing VBTS evaluation methods and additional evaluations can be found on our website: https://stevenoh2003.github.io/TacEva/
comment: 14 pages, 8 figures. Equal contribution: Qingzheng Cong, Steven Oh, Wen Fan. Corresponding author: Dandan Zhang (d.zhang17@imperial.ac.uk). Additional resources at http://stevenoh2003.github.io/TacEva/
Reduced-Order Model-Guided Reinforcement Learning for Demonstration-Free Humanoid Locomotion
We introduce Reduced-Order Model-Guided Reinforcement Learning (ROM-GRL), a two-stage reinforcement learning framework for humanoid walking that requires no motion capture data or elaborate reward shaping. In the first stage, a compact 4-DOF (four-degree-of-freedom) reduced-order model (ROM) is trained via Proximal Policy Optimization. This generates energy-efficient gait templates. In the second stage, those dynamically consistent trajectories guide a full-body policy trained with Soft Actor--Critic augmented by an adversarial discriminator, ensuring the student's five-dimensional gait feature distribution matches the ROM's demonstrations. Experiments at 1 meter-per-second and 4 meter-per-second show that ROM-GRL produces stable, symmetric gaits with substantially lower tracking error than a pure-reward baseline. By distilling lightweight ROM guidance into high-dimensional policies, ROM-GRL bridges the gap between reward-only and imitation-based locomotion methods, enabling versatile, naturalistic humanoid behaviors without any human demonstrations.
comment: 11 pages, 5 figures, 1 table, Computational Science Graduate Project
Pure Vision Language Action (VLA) Models: A Comprehensive Survey
The emergence of Vision Language Action (VLA) models marks a paradigm shift from traditional policy-based control to generalized robotics, reframing Vision Language Models (VLMs) from passive sequence generators into active agents for manipulation and decision-making in complex, dynamic environments. This survey delves into advanced VLA methods, aiming to provide a clear taxonomy and a systematic, comprehensive review of existing research. It presents a comprehensive analysis of VLA applications across different scenarios and classifies VLA approaches into several paradigms: autoregression-based, diffusion-based, reinforcement-based, hybrid, and specialized methods; while examining their motivations, core strategies, and implementations in detail. In addition, foundational datasets, benchmarks, and simulation platforms are introduced. Building on the current VLA landscape, the review further proposes perspectives on key challenges and future directions to advance research in VLA models and generalizable robotics. By synthesizing insights from over three hundred recent studies, this survey maps the contours of this rapidly evolving field and highlights the opportunities and challenges that will shape the development of scalable, general-purpose VLA methods.
Category-Level Object Shape and Pose Estimation in Less Than a Millisecond
Object shape and pose estimation is a foundational robotics problem, supporting tasks from manipulation to scene understanding and navigation. We present a fast local solver for shape and pose estimation which requires only category-level object priors and admits an efficient certificate of global optimality. Given an RGB-D image of an object, we use a learned front-end to detect sparse, category-level semantic keypoints on the target object. We represent the target object's unknown shape using a linear active shape model and pose a maximum a posteriori optimization problem to solve for position, orientation, and shape simultaneously. Expressed in unit quaternions, this problem admits first-order optimality conditions in the form of an eigenvalue problem with eigenvector nonlinearities. Our primary contribution is to solve this problem efficiently with self-consistent field iteration, which only requires computing a 4-by-4 matrix and finding its minimum eigenvalue-vector pair at each iterate. Solving a linear system for the corresponding Lagrange multipliers gives a simple global optimality certificate. One iteration of our solver runs in about 100 microseconds, enabling fast outlier rejection. We test our method on synthetic data and a variety of real-world settings, including two public datasets and a drone tracking scenario. Code is released at https://github.com/MIT-SPARK/Fast-ShapeAndPose.
Towards Robust LiDAR Localization: Deep Learning-based Uncertainty Estimation
LiDAR-based localization and SLAM often rely on iterative matching algorithms, particularly the Iterative Closest Point (ICP) algorithm, to align sensor data with pre-existing maps or previous scans. However, ICP is prone to errors in featureless environments and dynamic scenes, leading to inaccurate pose estimation. Accurately predicting the uncertainty associated with ICP is crucial for robust state estimation but remains challenging, as existing approaches often rely on handcrafted models or simplified assumptions. Moreover, a few deep learning-based methods for localizability estimation either depend on a pre-built map, which may not always be available, or provide a binary classification of localizable versus non-localizable, which fails to properly model uncertainty. In this work, we propose a data-driven framework that leverages deep learning to estimate the registration error covariance of ICP before matching, even in the absence of a reference map. By associating each LiDAR scan with a reliable 6-DoF error covariance estimate, our method enables seamless integration of ICP within Kalman filtering, enhancing localization accuracy and robustness. Extensive experiments on the KITTI dataset demonstrate the effectiveness of our approach, showing that it accurately predicts covariance and, when applied to localization using a pre-built map or SLAM, reduces localization errors and improves robustness.
Eva-VLA: Evaluating Vision-Language-Action Models' Robustness Under Real-World Physical Variations
Vision-Language-Action (VLA) models have emerged as promising solutions for robotic manipulation, yet their robustness to real-world physical variations remains critically underexplored. To bridge this gap, we propose Eva-VLA, the first unified framework that systematically evaluates the robustness of VLA models by transforming discrete physical variations into continuous optimization problems. However, comprehensively assessing VLA robustness presents two key challenges: (1) how to systematically characterize diverse physical variations encountered in real-world deployments while maintaining evaluation reproducibility, and (2) how to discover worst-case scenarios without prohibitive real-world data collection costs efficiently. To address the first challenge, we decompose real-world variations into three critical domains: object 3D transformations that affect spatial reasoning, illumination variations that challenge visual perception, and adversarial patches that disrupt scene understanding. For the second challenge, we introduce a continuous black-box optimization framework that transforms discrete physical variations into parameter optimization, enabling systematic exploration of worst-case scenarios. Extensive experiments on state-of-the-art OpenVLA models across multiple benchmarks reveal alarming vulnerabilities: all variation types trigger failure rates exceeding 60%, with object transformations causing up to 97.8% failure in long-horizon tasks. Our findings expose critical gaps between controlled laboratory success and unpredictable deployment readiness, while the Eva-VLA framework provides a practical pathway for hardening VLA-based robotic manipulation models against real-world deployment challenges.
Lang2Morph: Language-Driven Morphological Design of Robotic Hands
Designing robotic hand morphologies for diverse manipulation tasks requires balancing dexterity, manufacturability, and task-specific functionality. While open-source frameworks and parametric tools support reproducible design, they still rely on expert heuristics and manual tuning. Automated methods using optimization are often compute-intensive, simulation-dependent, and rarely target dexterous hands. Large language models (LLMs), with their broad knowledge of human-object interactions and strong generative capabilities, offer a promising alternative for zero-shot design reasoning. In this paper, we present Lang2Morph, a language-driven pipeline for robotic hand design. It uses LLMs to translate natural-language task descriptions into symbolic structures and OPH-compatible parameters, enabling 3D-printable task-specific morphologies. The pipeline consists of: (i) Morphology Design, which maps tasks into semantic tags, structural grammars, and OPH-compatible parameters; and (ii) Selection and Refinement, which evaluates design candidates based on semantic alignment and size compatibility, and optionally applies LLM-guided refinement when needed. We evaluate Lang2Morph across varied tasks, and results show that our approach can generate diverse, task-relevant morphologies. To our knowledge, this is the first attempt to develop an LLM-based framework for task-conditioned robotic hand design.
Bi-VLA: Bilateral Control-Based Imitation Learning via Vision-Language Fusion for Action Generation
We propose Bilateral Control-Based Imitation Learning via Vision-Language Fusion for Action Generation (Bi-VLA), a novel framework that extends bilateral control-based imitation learning to handle more than one task within a single model. Conventional bilateral control methods exploit joint angle, velocity, torque, and vision for precise manipulation but require task-specific models, limiting their generality. Bi-VLA overcomes this limitation by utilizing robot joint angle, velocity, and torque data from leader-follower bilateral control with visual features and natural language instructions through SigLIP and FiLM-based fusion. We validated Bi-VLA on two task types: one requiring supplementary language cues and another distinguishable solely by vision. Real-robot experiments showed that Bi-VLA successfully interprets vision-language combinations and improves task success rates compared to conventional bilateral control-based imitation learning. Our Bi-VLA addresses the single-task limitation of prior bilateral approaches and provides empirical evidence that combining vision and language significantly enhances versatility. Experimental results validate the effectiveness of Bi-VLA in real-world tasks. For additional material, please visit the website: https://mertcookimg.github.io/bi-vla/
DexSkin: High-Coverage Conformable Robotic Skin for Learning Contact-Rich Manipulation
Human skin provides a rich tactile sensing stream, localizing intentional and unintentional contact events over a large and contoured region. Replicating these tactile sensing capabilities for dexterous robotic manipulation systems remains a longstanding challenge. In this work, we take a step towards this goal by introducing DexSkin. DexSkin is a soft, conformable capacitive electronic skin that enables sensitive, localized, and calibratable tactile sensing, and can be tailored to varying geometries. We demonstrate its efficacy for learning downstream robotic manipulation by sensorizing a pair of parallel jaw gripper fingers, providing tactile coverage across almost the entire finger surfaces. We empirically evaluate DexSkin's capabilities in learning challenging manipulation tasks that require sensing coverage across the entire surface of the fingers, such as reorienting objects in hand and wrapping elastic bands around boxes, in a learning-from-demonstration framework. We then show that, critically for data-driven approaches, DexSkin can be calibrated to enable model transfer across sensor instances, and demonstrate its applicability to online reinforcement learning on real robots. Our results highlight DexSkin's suitability and practicality for learning real-world, contact-rich manipulation. Please see our project webpage for videos and visualizations: https://dex-skin.github.io/.
comment: Accepted to CoRL 2025
Application Management in C-ITS: Orchestrating Demand-Driven Deployments and Reconfigurations SC 2025
Vehicles are becoming increasingly automated and interconnected, enabling the formation of cooperative intelligent transport systems (C-ITS) and the use of offboard services. As a result, cloud-native techniques, such as microservices and container orchestration, play an increasingly important role in their operation. However, orchestrating applications in a large-scale C-ITS poses unique challenges due to the dynamic nature of the environment and the need for efficient resource utilization. In this paper, we present a demand-driven application management approach that leverages cloud-native techniques - specifically Kubernetes - to address these challenges. Taking into account the demands originating from different entities within the C-ITS, the approach enables the automation of processes, such as deployment, reconfiguration, update, upgrade, and scaling of microservices. Executing these processes on demand can, for example, reduce computing resource consumption and network traffic. A demand may include a request for provisioning an external supporting service, such as a collective environment model. The approach handles changing and new demands by dynamically reconciling them through our proposed application management framework built on Kubernetes and the Robot Operating System (ROS 2). We demonstrate the operation of our framework in the C-ITS use case of collective environment perception and make the source code of the prototypical framework publicly available at https://github.com/ika-rwth-aachen/application_manager .
comment: 7 pages, 2 figures, 2 tables; Accepted to be published as part of the 2025 IEEE International Conference on Intelligent Transportation Systems (ITSC 2025), Gold Coast, Australia, November 18-21, 2025
Human-Interpretable Uncertainty Explanations for Point Cloud Registration
In this paper, we address the point cloud registration problem, where well-known methods like ICP fail under uncertainty arising from sensor noise, pose-estimation errors, and partial overlap due to occlusion. We develop a novel approach, Gaussian Process Concept Attribution (GP-CA), which not only quantifies registration uncertainty but also explains it by attributing uncertainty to well-known sources of errors in registration problems. Our approach leverages active learning to discover new uncertainty sources in the wild by querying informative instances. We validate GP-CA on three publicly available datasets and in our real-world robot experiment. Extensive ablations substantiate our design choices. Our approach outperforms other state-of-the-art methods in terms of runtime, high sample-efficiency with active learning, and high accuracy. Our real-world experiment clearly demonstrates its applicability. Our video also demonstrates that GP-CA enables effective failure-recovery behaviors, yielding more robust robotic perception.
VGGT-DP: Generalizable Robot Control via Vision Foundation Models AAAI 2026
Visual imitation learning frameworks allow robots to learn manipulation skills from expert demonstrations. While existing approaches mainly focus on policy design, they often neglect the structure and capacity of visual encoders, limiting spatial understanding and generalization. Inspired by biological vision systems, which rely on both visual and proprioceptive cues for robust control, we propose VGGT-DP, a visuomotor policy framework that integrates geometric priors from a pretrained 3D perception model with proprioceptive feedback. We adopt the Visual Geometry Grounded Transformer (VGGT) as the visual encoder and introduce a proprioception-guided visual learning strategy to align perception with internal robot states, improving spatial grounding and closed-loop control. To reduce inference latency, we design a frame-wise token reuse mechanism that compacts multi-view tokens into an efficient spatial representation. We further apply random token pruning to enhance policy robustness and reduce overfitting. Experiments on challenging MetaWorld tasks show that VGGT-DP significantly outperforms strong baselines such as DP and DP3, particularly in precision-critical and long-horizon scenarios.
comment: submitted to AAAI 2026
Guaranteed Robust Nonlinear MPC via Disturbance Feedback
Robots must satisfy safety-critical state and input constraints despite disturbances and model mismatch. We introduce a robust model predictive control (RMPC) formulation that is fast, scalable, and compatible with real-time implementation. Our formulation guarantees robust constraint satisfaction, input-to-state stability (ISS) and recursive feasibility. The key idea is to decompose the uncertain nonlinear system into (i) a nominal nonlinear dynamic model, (ii) disturbance-feedback controllers, and (iii) bounds on the model error. These components are optimized jointly using sequential convex programming. The resulting convex subproblems are solved efficiently using a recent disturbance-feedback MPC solver. The approach is validated across multiple dynamics, including a rocket-landing problem with steerable thrust. An open-source implementation is available at https://github.com/antoineleeman/robust-nonlinear-mpc.
comment: Code: https://github.com/antoineleeman/robust-nonlinear-mpc
MV-UMI: A Scalable Multi-View Interface for Cross-Embodiment Learning
Recent advances in imitation learning have shown great promise for developing robust robot manipulation policies from demonstrations. However, this promise is contingent on the availability of diverse, high-quality datasets, which are not only challenging and costly to collect but are often constrained to a specific robot embodiment. Portable handheld grippers have recently emerged as intuitive and scalable alternatives to traditional robotic teleoperation methods for data collection. However, their reliance solely on first-person view wrist-mounted cameras often creates limitations in capturing sufficient scene contexts. In this paper, we present MV-UMI (Multi-View Universal Manipulation Interface), a framework that integrates a third-person perspective with the egocentric camera to overcome this limitation. This integration mitigates domain shifts between human demonstration and robot deployment, preserving the cross-embodiment advantages of handheld data-collection devices. Our experimental results, including an ablation study, demonstrate that our MV-UMI framework improves performance in sub-tasks requiring broad scene understanding by approximately 47% across 3 tasks, confirming the effectiveness of our approach in expanding the range of feasible manipulation tasks that can be learned using handheld gripper systems, without compromising the cross-embodiment advantages inherent to such systems.
comment: For project website and videos, see https https://mv-umi.github.io
An Extended Kalman Filter for Systems with Infinite-Dimensional Measurements
This article examines state estimation in discrete-time nonlinear stochastic systems with finite-dimensional states and infinite-dimensional measurements, motivated by real-world applications such as vision-based localization and tracking. We develop an extended Kalman filter (EKF) for real-time state estimation, with the measurement noise modeled as an infinite-dimensional random field. When applied to vision-based state estimation, the measurement Jacobians required to implement the EKF are shown to correspond to image gradients. This result provides a novel system-theoretic justification for the use of image gradients as features for vision-based state estimation, contrasting with their (often heuristic) introduction in many computer-vision pipelines. We demonstrate the practical utility of the EKF on a public real-world dataset involving the localization of an aerial drone using video from a downward-facing monocular camera. The EKF is shown to outperform VINS-MONO, an established visual-inertial odometry algorithm, in some cases achieving mean squared error reductions of up to an order of magnitude.
comment: 8 pages
Learning Obstacle Avoidance using Double DQN for Quadcopter Navigation
One of the challenges faced by Autonomous Aerial Vehicles is reliable navigation through urban environments. Factors like reduction in precision of Global Positioning System (GPS), narrow spaces and dynamically moving obstacles make the path planning of an aerial robot a complicated task. One of the skills required for the agent to effectively navigate through such an environment is to develop an ability to avoid collisions using information from onboard depth sensors. In this paper, we propose Reinforcement Learning of a virtual quadcopter robot agent equipped with a Depth Camera to navigate through a simulated urban environment.
Dual Iterative Learning Control for Multiple-Input Multiple-Output Dynamics with Validation in Robotic Systems
Solving motion tasks autonomously and accurately is a core ability for intelligent real-world systems. To achieve genuine autonomy across multiple systems and tasks, key challenges include coping with unknown dynamics and overcoming the need for manual parameter tuning, which is especially crucial in complex Multiple-Input Multiple-Output (MIMO) systems. This paper presents MIMO Dual Iterative Learning Control (DILC), a novel data-driven iterative learning scheme for simultaneous tracking control and model learning, without requiring any prior system knowledge or manual parameter tuning. The method is designed for repetitive MIMO systems and integrates seamlessly with established iterative learning control methods. We provide monotonic convergence conditions for both reference tracking error and model error in linear time-invariant systems. The DILC scheme -- rapidly and autonomously -- solves various motion tasks in high-fidelity simulations of an industrial robot and in multiple nonlinear real-world MIMO systems, without requiring model knowledge or manually tuning the algorithm. In our experiments, many reference tracking tasks are solved within 10-20 trials, and even complex motions are learned in less than 100 iterations. We believe that, because of its rapid and autonomous learning capabilities, DILC has the potential to serve as an efficient building block within complex learning frameworks for intelligent real-world systems.
comment: 11 pages, 4 figures
Query-Centric Diffusion Policy for Generalizable Robotic Assembly
The robotic assembly task poses a key challenge in building generalist robots due to the intrinsic complexity of part interactions and the sensitivity to noise perturbations in contact-rich settings. The assembly agent is typically designed in a hierarchical manner: high-level multi-part reasoning and low-level precise control. However, implementing such a hierarchical policy is challenging in practice due to the mismatch between high-level skill queries and low-level execution. To address this, we propose the Query-centric Diffusion Policy (QDP), a hierarchical framework that bridges high-level planning and low-level control by utilizing queries comprising objects, contact points, and skill information. QDP introduces a query-centric mechanism that identifies task-relevant components and uses them to guide low-level policies, leveraging point cloud observations to improve the policy's robustness. We conduct comprehensive experiments on the FurnitureBench in both simulation and real-world settings, demonstrating improved performance in skill precision and long-horizon success rate. In the challenging insertion and screwing tasks, QDP improves the skill-wise success rate by over 50% compared to baselines without structured queries.
comment: 8 pages, 7 figures
3D Flow Diffusion Policy: Visuomotor Policy Learning via Generating Flow in 3D Space
Learning robust visuomotor policies that generalize across diverse objects and interaction dynamics remains a central challenge in robotic manipulation. Most existing approaches rely on direct observation-to-action mappings or compress perceptual inputs into global or object-centric features, which often overlook localized motion cues critical for precise and contact-rich manipulation. We present 3D Flow Diffusion Policy (3D FDP), a novel framework that leverages scene-level 3D flow as a structured intermediate representation to capture fine-grained local motion cues. Our approach predicts the temporal trajectories of sampled query points and conditions action generation on these interaction-aware flows, implemented jointly within a unified diffusion architecture. This design grounds manipulation in localized dynamics while enabling the policy to reason about broader scene-level consequences of actions. Extensive experiments on the MetaWorld benchmark show that 3D FDP achieves state-of-the-art performance across 50 tasks, particularly excelling on medium and hard settings. Beyond simulation, we validate our method on eight real-robot tasks, where it consistently outperforms prior baselines in contact-rich and non-prehensile scenarios. These results highlight 3D flow as a powerful structural prior for learning generalizable visuomotor policies, supporting the development of more robust and versatile robotic manipulation. Robot demonstrations, additional results, and code can be found at https://sites.google.com/view/3dfdp/home.
comment: 7 main scripts + 2 reference pages
N2M: Bridging Navigation and Manipulation by Learning Pose Preference from Rollout
In mobile manipulation, the manipulation policy has strong preferences for initial poses where it is executed. However, the navigation module focuses solely on reaching the task area, without considering which initial pose is preferable for downstream manipulation. To address this misalignment, we introduce N2M, a transition module that guides the robot to a preferable initial pose after reaching the task area, thereby substantially improving task success rates. N2M features five key advantages: (1) reliance solely on ego-centric observation without requiring global or historical information; (2) real-time adaptation to environmental changes; (3) reliable prediction with high viewpoint robustness; (4) broad applicability across diverse tasks, manipulation policies, and robot hardware; and (5) remarkable data efficiency and generalizability. We demonstrate the effectiveness of N2M through extensive simulation and real-world experiments. In the PnPCounterToCab task, N2M improves the averaged success rate from 3% with the reachability-based baseline to 54%. Furthermore, in the Toybox Handover task, N2M provides reliable predictions even in unseen environments with only 15 data samples, showing remarkable data efficiency and generalizability.
Distributionally Robust Safe Motion Planning with Contextual Information
We present a distributionally robust approach for collision avoidance by incorporating contextual information. Specifically, we embed the conditional distribution of future trajectory of the obstacle conditioned on the motion of the ego agent in a reproducing kernel Hilbert space (RKHS) via the conditional kernel mean embedding operator. Then, we define an ambiguity set containing all distributions whose embedding in the RKHS is within a certain distance from the empirical estimate of conditional mean embedding learnt from past data. Consequently, a distributionally robust collision avoidance constraint is formulated, and included in the receding horizon based motion planning formulation of the ego agent. Simulation results show that the proposed approach is more successful in avoiding collision compared to approaches that do not include contextual information and/or distributional robustness in their formulation in several challenging scenarios.
SPiDR: A Simple Approach for Zero-Shot Safety in Sim-to-Real Transfer
Safety remains a major concern for deploying reinforcement learning (RL) in real-world applications. Simulators provide safe, scalable training environments, but the inevitable sim-to-real gap introduces additional safety concerns, as policies must satisfy constraints in real-world conditions that differ from simulation. To address this challenge, robust safe RL techniques offer principled methods, but are often incompatible with standard scalable training pipelines. In contrast, domain randomization, a simple and popular sim-to-real technique, stands out as a promising alternative, although it often results in unsafe behaviors in practice. We present SPiDR, short for Sim-to-real via Pessimistic Domain Randomization -- a scalable algorithm with provable guarantees for safe sim-to-real transfer. SPiDR uses domain randomization to incorporate the uncertainty about the sim-to-real gap into the safety constraints, making it versatile and highly compatible with existing training pipelines. Through extensive experiments on sim-to-sim benchmarks and two distinct real-world robotic platforms, we demonstrate that SPiDR effectively ensures safety despite the sim-to-real gap while maintaining strong performance.
Do You Need Proprioceptive States in Visuomotor Policies?
Imitation-learning-based visuomotor policies have been widely used in robot manipulation, where both visual observations and proprioceptive states are typically adopted together for precise control. However, in this study, we find that this common practice makes the policy overly reliant on the proprioceptive state input, which causes overfitting to the training trajectories and results in poor spatial generalization. On the contrary, we propose the State-free Policy, removing the proprioceptive state input and predicting actions only conditioned on visual observations. The State-free Policy is built in the relative end-effector action space, and should ensure the full task-relevant visual observations, here provided by dual wide-angle wrist cameras. Empirical results demonstrate that the State-free policy achieves significantly stronger spatial generalization than the state-based policy: in real-world tasks such as pick-and-place, challenging shirt-folding, and complex whole-body manipulation, spanning multiple robot embodiments, the average success rate improves from 0\% to 85\% in height generalization and from 6\% to 64\% in horizontal generalization. Furthermore, they also show advantages in data efficiency and cross-embodiment adaptation, enhancing their practicality for real-world deployment.
comment: Project page: https://statefreepolicy.github.io
Number Adaptive Formation Flight Planning via Affine Deformable Guidance in Narrow Environments
Formation maintenance with varying number of drones in narrow environments hinders the convergence of planning to the desired configurations. To address this challenge, this paper proposes a formation planning method guided by Deformable Virtual Structures (DVS) with continuous spatiotemporal transformation. Firstly, to satisfy swarm safety distance and preserve formation shape filling integrity for irregular formation geometries, we employ Lloyd algorithm for uniform $\underline{PA}$rtitioning and Hungarian algorithm for $\underline{AS}$signment (PAAS) in DVS. Subsequently, a spatiotemporal trajectory involving DVS is planned using primitive-based path search and nonlinear trajectory optimization. The DVS trajectory achieves adaptive transitions with respect to a varying number of drones while ensuring adaptability to narrow environments through affine transformation. Finally, each agent conducts distributed trajectory planning guided by desired spatiotemporal positions within the DVS, while incorporating collision avoidance and dynamic feasibility requirements. Our method enables up to 15\% of swarm numbers to join or leave in cluttered environments while rapidly restoring the desired formation shape in simulation. Compared to cutting-edge formation planning method, we demonstrate rapid formation recovery capacity and environmental adaptability. Real-world experiments validate the effectiveness and resilience of our formation planning method.
Generalizable Domain Adaptation for Sim-and-Real Policy Co-Training
Behavior cloning has shown promise for robot manipulation, but real-world demonstrations are costly to acquire at scale. While simulated data offers a scalable alternative, particularly with advances in automated demonstration generation, transferring policies to the real world is hampered by various simulation and real domain gaps. In this work, we propose a unified sim-and-real co-training framework for learning generalizable manipulation policies that primarily leverages simulation and only requires a few real-world demonstrations. Central to our approach is learning a domain-invariant, task-relevant feature space. Our key insight is that aligning the joint distributions of observations and their corresponding actions across domains provides a richer signal than aligning observations (marginals) alone. We achieve this by embedding an Optimal Transport (OT)-inspired loss within the co-training framework, and extend this to an Unbalanced OT framework to handle the imbalance between abundant simulation data and limited real-world examples. We validate our method on challenging manipulation tasks, showing it can leverage abundant simulation data to achieve up to a 30% improvement in the real-world success rate and even generalize to scenarios seen only in simulation.
The Case for Negative Data: From Crash Reports to Counterfactuals for Reasonable Driving
Learning-based autonomous driving systems are trained mostly on incident-free data, offering little guidance near safety-performance boundaries. Real crash reports contain precisely the contrastive evidence needed, but they are hard to use: narratives are unstructured, third-person, and poorly grounded to sensor views. We address these challenges by normalizing crash narratives to ego-centric language and converting both logs and crashes into a unified scene-action representation suitable for retrieval. At decision time, our system adjudicates proposed actions by retrieving relevant precedents from this unified index; an agentic counterfactual extension proposes plausible alternatives, retrieves for each, and reasons across outcomes before deciding. On a nuScenes benchmark, precedent retrieval substantially improves calibration, with recall on contextually preferred actions rising from 24% to 53%. The counterfactual variant preserves these gains while sharpening decisions near risk.
comment: 8 pages, 5 figures
SINGER: An Onboard Generalist Vision-Language Navigation Policy for Drones
Large vision-language models have driven remarkable progress in open-vocabulary robot policies, e.g., generalist robot manipulation policies, that enable robots to complete complex tasks specified in natural language. Despite these successes, open-vocabulary autonomous drone navigation remains an unsolved challenge due to the scarcity of large-scale demonstrations, real-time control demands of drones for stabilization, and lack of reliable external pose estimation modules. In this work, we present SINGER for language-guided autonomous drone navigation in the open world using only onboard sensing and compute. To train robust, open-vocabulary navigation policies, SINGER leverages three central components: (i) a photorealistic language-embedded flight simulator with minimal sim-to-real gap using Gaussian Splatting for efficient data generation, (ii) an RRT-inspired multi-trajectory generation expert for collision-free navigation demonstrations, and these are used to train (iii) a lightweight end-to-end visuomotor policy for real-time closed-loop control. Through extensive hardware flight experiments, we demonstrate superior zero-shot sim-to-real transfer of our policy to unseen environments and unseen language-conditioned goal objects. When trained on ~700k-1M observation action pairs of language conditioned visuomotor data and deployed on hardware, SINGER outperforms a velocity-controlled semantic guidance baseline by reaching the query 23.33% more on average, and maintains the query in the field of view 16.67% more on average, with 10% fewer collisions.
PIE: Perception and Interaction Enhanced End-to-End Motion Planning for Autonomous Driving
End-to-end motion planning is promising for simplifying complex autonomous driving pipelines. However, challenges such as scene understanding and effective prediction for decision-making continue to present substantial obstacles to its large-scale deployment. In this paper, we present PIE, a pioneering framework that integrates advanced perception, reasoning, and intention modeling to dynamically capture interactions between the ego vehicle and surrounding agents. It incorporates a bidirectional Mamba fusion that addresses data compression losses in multimodal fusion of camera and LiDAR inputs, alongside a novel reasoning-enhanced decoder integrating Mamba and Mixture-of-Experts to facilitate scene-compliant anchor selection and optimize adaptive trajectory inference. PIE adopts an action-motion interaction module to effectively utilize state predictions of surrounding agents to refine ego planning. The proposed framework is thoroughly validated on the NAVSIM benchmark. PIE, without using any ensemble and data augmentation techniques, achieves an 88.9 PDM score and 85.6 EPDM score, surpassing the performance of prior state-of-the-art methods. Comprehensive quantitative and qualitative analyses demonstrate that PIE is capable of reliably generating feasible and high-quality ego trajectories.
End-to-End Crop Row Navigation via LiDAR-Based Deep Reinforcement Learning
Reliable navigation in under-canopy agricultural environments remains a challenge due to GNSS unreliability, cluttered rows, and variable lighting. To address these limitations, we present an end-to-end learning-based navigation system that maps raw 3D LiDAR data directly to control commands using a deep reinforcement learning policy trained entirely in simulation. Our method includes a voxel-based downsampling strategy that reduces LiDAR input size by 95.83%, enabling efficient policy learning without relying on labeled datasets or manually designed control interfaces. The policy was validated in simulation, achieving a 100% success rate in straight-row plantations and showing a gradual decline in performance as row curvature increased, tested across varying sinusoidal frequencies and amplitudes.
comment: Accepted to the 22nd International Conference on Advanced Robotics (ICAR 2025). 7 pages
Growing with Your Embodied Agent: A Human-in-the-Loop Lifelong Code Generation Framework for Long-Horizon Manipulation Skills
Large language models (LLMs)-based code generation for robotic manipulation has recently shown promise by directly translating human instructions into executable code, but existing methods remain noisy, constrained by fixed primitives and limited context windows, and struggle with long-horizon tasks. While closed-loop feedback has been explored, corrected knowledge is often stored in improper formats, restricting generalization and causing catastrophic forgetting, which highlights the need for learning reusable skills. Moreover, approaches that rely solely on LLM guidance frequently fail in extremely long-horizon scenarios due to LLMs' limited reasoning capability in the robotic domain, where such issues are often straightforward for humans to identify. To address these challenges, we propose a human-in-the-loop framework that encodes corrections into reusable skills, supported by external memory and Retrieval-Augmented Generation with a hint mechanism for dynamic reuse. Experiments on Ravens, Franka Kitchen, and MetaWorld, as well as real-world settings, show that our framework achieves a 0.93 success rate (up to 27% higher than baselines) and a 42% efficiency improvement in correction rounds. It can robustly solve extremely long-horizon tasks such as "build a house", which requires planning over 20 primitives.
comment: upload 9 main page - v1
VLN-Zero: Rapid Exploration and Cache-Enabled Neurosymbolic Vision-Language Planning for Zero-Shot Transfer in Robot Navigation
Rapid adaptation in unseen environments is essential for scalable real-world autonomy, yet existing approaches rely on exhaustive exploration or rigid navigation policies that fail to generalize. We present VLN-Zero, a two-phase vision-language navigation framework that leverages vision-language models to efficiently construct symbolic scene graphs and enable zero-shot neurosymbolic navigation. In the exploration phase, structured prompts guide VLM-based search toward informative and diverse trajectories, yielding compact scene graph representations. In the deployment phase, a neurosymbolic planner reasons over the scene graph and environmental observations to generate executable plans, while a cache-enabled execution module accelerates adaptation by reusing previously computed task-location trajectories. By combining rapid exploration, symbolic reasoning, and cache-enabled execution, the proposed framework overcomes the computational inefficiency and poor generalization of prior vision-language navigation methods, enabling robust and scalable decision-making in unseen environments. VLN-Zero achieves 2x higher success rate compared to state-of-the-art zero-shot models, outperforms most fine-tuned baselines, and reaches goal locations in half the time with 55% fewer VLM calls on average compared to state-of-the-art models across diverse environments. Codebase, datasets, and videos for VLN-Zero are available at: https://vln-zero.github.io/.
comment: Codebase, datasets, and videos for VLN-Zero are available at: https://vln-zero.github.io/
LCMF: Lightweight Cross-Modality Mambaformer for Embodied Robotics VQA
Multimodal semantic learning plays a critical role in embodied intelligence, especially when robots perceive their surroundings, understand human instructions, and make intelligent decisions. However, the field faces technical challenges such as effective fusion of heterogeneous data and computational efficiency in resource-constrained environments. To address these challenges, this study proposes the lightweight LCMF cascaded attention framework, introducing a multi-level cross-modal parameter sharing mechanism into the Mamba module. By integrating the advantages of Cross-Attention and Selective parameter-sharing State Space Models (SSMs), the framework achieves efficient fusion of heterogeneous modalities and semantic complementary alignment. Experimental results show that LCMF surpasses existing multimodal baselines with an accuracy of 74.29% in VQA tasks and achieves competitive mid-tier performance within the distribution cluster of Large Language Model Agents (LLM Agents) in EQA video tasks. Its lightweight design achieves a 4.35-fold reduction in FLOPs relative to the average of comparable baselines while using only 166.51M parameters (image-text) and 219M parameters (video-text), providing an efficient solution for Human-Robot Interaction (HRI) applications in resource-constrained scenarios with strong multimodal decision generalization capabilities.
Event-guided 3D Gaussian Splatting for Dynamic Human and Scene Reconstruction
Reconstructing dynamic humans together with static scenes from monocular videos remains difficult, especially under fast motion, where RGB frames suffer from motion blur. Event cameras exhibit distinct advantages, e.g., microsecond temporal resolution, making them a superior sensing choice for dynamic human reconstruction. Accordingly, we present a novel event-guided human-scene reconstruction framework that jointly models human and scene from a single monocular event camera via 3D Gaussian Splatting. Specifically, a unified set of 3D Gaussians carries a learnable semantic attribute; only Gaussians classified as human undergo deformation for animation, while scene Gaussians stay static. To combat blur, we propose an event-guided loss that matches simulated brightness changes between consecutive renderings with the event stream, improving local fidelity in fast-moving regions. Our approach removes the need for external human masks and simplifies managing separate Gaussian sets. On two benchmark datasets, ZJU-MoCap-Blur and MMHPSD-Blur, it delivers state-of-the-art human-scene reconstruction, with notable gains over strong baselines in PSNR/SSIM and reduced LPIPS, especially for high-speed subjects.
Spatial Envelope MPC: High Performance Driving without a Reference
This paper presents a novel envelope based model predictive control (MPC) framework designed to enable autonomous vehicles to handle high performance driving across a wide range of scenarios without a predefined reference. In high performance autonomous driving, safe operation at the vehicle's dynamic limits requires a real time planning and control framework capable of accounting for key vehicle dynamics and environmental constraints when following a predefined reference trajectory is suboptimal or even infeasible. State of the art planning and control frameworks, however, are predominantly reference based, which limits their performance in such situations. To address this gap, this work first introduces a computationally efficient vehicle dynamics model tailored for optimization based control and a continuously differentiable mathematical formulation that accurately captures the entire drivable envelope. This novel model and formulation allow for the direct integration of dynamic feasibility and safety constraints into a unified planning and control framework, thereby removing the necessity for predefined references. The challenge of envelope planning, which refers to maximally approximating the safe drivable area, is tackled by combining reinforcement learning with optimization techniques. The framework is validated through both simulations and real world experiments, demonstrating its high performance across a variety of tasks, including racing, emergency collision avoidance and off road navigation. These results highlight the framework's scalability and broad applicability across a diverse set of scenarios.
The Impact of 2D Segmentation Backbones on Point Cloud Predictions Using 4D Radar
LiDAR's dense, sharp point cloud (PC) representations of the surrounding environment enable accurate perception and significantly improve road safety by offering greater scene awareness and understanding. However, LiDAR's high cost continues to restrict the broad adoption of high-level Autonomous Driving (AD) systems in commercially available vehicles. Prior research has shown progress towards circumventing the need for LiDAR by training a neural network, using LiDAR point clouds as ground truth (GT), to produce LiDAR-like 3D point clouds using only 4D Radars. One of the best examples is a neural network created to train a more efficient radar target detector with a modular 2D convolutional neural network (CNN) backbone and a temporal coherence network at its core that uses the RaDelft dataset for training (see arXiv:2406.04723). In this work, we investigate the impact of higher-capacity segmentation backbones on the quality of the produced point clouds. Our results show that while very high-capacity models may actually hurt performance, an optimal segmentation backbone can provide a 23.7% improvement over the state-of-the-art (SOTA).
Minimalistic Autonomous Stack for High-Speed Time-Trial Racing
Autonomous racing has seen significant advancements, driven by competitions such as the Indy Autonomous Challenge (IAC) and the Abu Dhabi Autonomous Racing League (A2RL). However, developing an autonomous racing stack for a full-scale car is often constrained by limited access to dedicated test tracks, restricting opportunities for real-world validation. While previous work typically requires extended development cycles and significant track time, this paper introduces a minimalistic autonomous racing stack for high-speed time-trial racing that emphasizes rapid deployment and efficient system integration with minimal on-track testing. The proposed stack was validated on real speedways, achieving a top speed of 206 km/h within just 11 hours' practice run on the track with 325 km in total. Additionally, we present the system performance analysis, including tracking accuracy, vehicle dynamics, and safety considerations, offering insights for teams seeking to rapidly develop and deploy an autonomous racing stack with limited track access.
comment: The data associated with this paper is available at https://doi.org/10.5281/zenodo.17187680
EgoBridge: Domain Adaptation for Generalizable Imitation from Egocentric Human Data NeurIPS 2025
Egocentric human experience data presents a vast resource for scaling up end-to-end imitation learning for robotic manipulation. However, significant domain gaps in visual appearance, sensor modalities, and kinematics between human and robot impede knowledge transfer. This paper presents EgoBridge, a unified co-training framework that explicitly aligns the policy latent spaces between human and robot data using domain adaptation. Through a measure of discrepancy on the joint policy latent features and actions based on Optimal Transport (OT), we learn observation representations that not only align between the human and robot domain but also preserve the action-relevant information critical for policy learning. EgoBridge achieves a significant absolute policy success rate improvement by 44% over human-augmented cross-embodiment baselines in three real-world single-arm and bimanual manipulation tasks. EgoBridge also generalizes to new objects, scenes, and tasks seen only in human data, where baselines fail entirely. Videos and additional information can be found at https://ego-bridge.github.io
comment: Accepted at 39th Conference on Neural Information Processing Systems (NeurIPS 2025) and Oral at Conference on Robot Learning (CoRL 2025)
Look as You Leap: Planning Simultaneous Motion and Perception for High-DOF Robots
In this work, we address the problem of planning robot motions for a high-degree-of-freedom (DoF) robot that effectively achieves a given perception task while the robot and the perception target move in a dynamic environment. Achieving navigation and perception tasks simultaneously is challenging, as these objectives often impose conflicting requirements. Existing methods that compute motion under perception constraints fail to account for obstacles, are designed for low-DoF robots, or rely on simplified models of perception. Furthermore, in dynamic real-world environments, robots must replan and react quickly to changes and directly evaluating the quality of perception (e.g., object detection confidence) is often expensive or infeasible at runtime. This problem is especially important in human-centered environments such as homes and hospitals, where effective perception is essential for safe and reliable operation. To address these challenges, we propose a GPU-parallelized perception-score-guided probabilistic roadmap planner with a neural surrogate model (PS-PRM). The planner explicitly incorporates the estimated quality of a perception task into motion planning for high-DoF robots. Our method uses a learned model to approximate perception scores and leverages GPU parallelism to enable efficient online replanning in dynamic settings. We demonstrate that our planner, evaluated on high-DoF robots, outperforms baseline methods in both static and dynamic environments in both simulation and real-robot experiments.
comment: 16 pages, 10 figures, under review
From Space to Time: Enabling Adaptive Safety with Learned Value Functions via Disturbance Recasting
The widespread deployment of autonomous systems in safety-critical environments such as urban air mobility hinges on ensuring reliable, performant, and safe operation under varying environmental conditions. One such approach, value function-based safety filters, minimally modifies a nominal controller to ensure safety. Recent advances leverage offline learned value functions to scale these safety filters to high-dimensional systems. However, these methods assume detailed priors on all possible sources of model mismatch, in the form of disturbances in the environment -- information that is rarely available in real world settings. Even in well-mapped environments like urban canyons or industrial sites, drones encounter complex, spatially-varying disturbances arising from payload-drone interaction, turbulent airflow, and other environmental factors. We introduce SPACE2TIME, which enables safe and adaptive deployment of offline-learned safety filters under unknown, spatially-varying disturbances. The key idea is to reparameterize spatial variations in disturbance as temporal variations, enabling the use of precomputed value functions during online operation. We validate SPACE2TIME on a quadcopter through extensive simulations and hardware experiments, demonstrating significant improvement over baselines.
comment: The first three authors contributed equally. This work has been accepted for publication at the Conference on Robot Learning
Terra: Hierarchical Terrain-Aware 3D Scene Graph for Task-Agnostic Outdoor Mapping
Outdoor intelligent autonomous robotic operation relies on a sufficiently expressive map of the environment. Classical geometric mapping methods retain essential structural environment information, but lack a semantic understanding and organization to allow high-level robotic reasoning. 3D scene graphs (3DSGs) address this limitation by integrating geometric, topological, and semantic relationships into a multi-level graph-based map. Outdoor autonomous operations commonly rely on terrain information either due to task-dependence or the traversability of the robotic platform. We propose a novel approach that combines indoor 3DSG techniques with standard outdoor geometric mapping and terrain-aware reasoning, producing terrain-aware place nodes and hierarchically organized regions for outdoor environments. Our method generates a task-agnostic metric-semantic sparse map and constructs a 3DSG from this map for downstream planning tasks, all while remaining lightweight for autonomous robotic operation. Our thorough evaluation demonstrates our 3DSG method performs on par with state-of-the-art camera-based 3DSG methods in object retrieval and surpasses them in region classification while remaining memory efficient. We demonstrate its effectiveness in diverse robotic tasks of object retrieval and region monitoring in both simulation and real-world environments.
Chasing Stability: Humanoid Running via Control Lyapunov Function Guided Reinforcement Learning ICRA 2026
Achieving highly dynamic behaviors on humanoid robots, such as running, requires controllers that are both robust and precise, and hence difficult to design. Classical control methods offer valuable insight into how such systems can stabilize themselves, but synthesizing real-time controllers for nonlinear and hybrid dynamics remains challenging. Recently, reinforcement learning (RL) has gained popularity for locomotion control due to its ability to handle these complex dynamics. In this work, we embed ideas from nonlinear control theory, specifically control Lyapunov functions (CLFs), along with optimized dynamic reference trajectories into the reinforcement learning training process to shape the reward. This approach, CLF-RL, eliminates the need to handcraft and tune heuristic reward terms, while simultaneously encouraging certifiable stability and providing meaningful intermediate rewards to guide learning. By grounding policy learning in dynamically feasible trajectories, we expand the robot's dynamic capabilities and enable running that includes both flight and single support phases. The resulting policy operates reliably on a treadmill and in outdoor environments, demonstrating robustness to disturbances applied to the torso and feet. Moreover, it achieves accurate global reference tracking utilizing only on-board sensors, making a critical step toward integrating these dynamic motions into a full autonomy stack.
comment: Submitted to ICRA 2026
Agentic Scene Policies: Unifying Space, Semantics, and Affordances for Robot Action
Executing open-ended natural language queries is a core problem in robotics. While recent advances in imitation learning and vision-language-actions models (VLAs) have enabled promising end-to-end policies, these models struggle when faced with complex instructions and new scenes. An alternative is to design an explicit scene representation as a queryable interface between the robot and the world, using query results to guide downstream motion planning. In this work, we present Agentic Scene Policies (ASP), an agentic framework that leverages the advanced semantic, spatial, and affordance-based querying capabilities of modern scene representations to implement a capable language-conditioned robot policy. ASP can execute open-vocabulary queries in a zero-shot manner by explicitly reasoning about object affordances in the case of more complex skills. Through extensive experiments, we compare ASP with VLAs on tabletop manipulation problems and showcase how ASP can tackle room-level queries through affordance-guided navigation, and a scaled-up scene representation. (Project page: https://montrealrobotics.ca/agentic-scene-policies.github.io/)
comment: Project page: https://montrealrobotics.ca/agentic-scene-policies.github.io/
AnySafe: Adapting Latent Safety Filters at Runtime via Safety Constraint Parameterization in the Latent Space
Recent works have shown that foundational safe control methods, such as Hamilton-Jacobi (HJ) reachability analysis, can be applied in the latent space of world models. While this enables the synthesis of latent safety filters for hard-to-model vision-based tasks, they assume that the safety constraint is known a priori and remains fixed during deployment, limiting the safety filter's adaptability across scenarios. To address this, we propose constraint-parameterized latent safety filters that can adapt to user-specified safety constraints at runtime. Our key idea is to define safety constraints by conditioning on an encoding of an image that represents a constraint, using a latent-space similarity measure. The notion of similarity to failure is aligned in a principled way through conformal calibration, which controls how closely the system may approach the constraint representation. The parameterized safety filter is trained entirely within the world model's imagination, treating any image seen by the model as a potential test-time constraint, thereby enabling runtime adaptation to arbitrary safety constraints. In simulation and hardware experiments on vision-based control tasks with a Franka manipulator, we show that our method adapts at runtime by conditioning on the encoding of user-specified constraint images, without sacrificing performance. Video results can be found on https://any-safe.github.io
RoMoCo: Robotic Motion Control Toolbox for Reduced-Order Model-Based Locomotion on Bipedal and Humanoid Robots
We present RoMoCo, an open-source C++ toolbox for the synthesis and evaluation of reduced-order model-based planners and whole-body controllers for bipedal and humanoid robots. RoMoCo's modular architecture unifies state-of-the-art planners and whole-body locomotion controllers under a consistent API, enabling rapid prototyping and reproducible benchmarking. By leveraging reduced-order models for platform-agnostic gait generation, RoMoCo enables flexible controller design across diverse robots. We demonstrate its versatility and performance through extensive simulations on the Cassie, Unitree H1, and G1 robots, and validate its real-world efficacy with hardware experiments on the Cassie and G1 humanoids.
Autonomous Elemental Characterization Enabled by a Low Cost Robotic Platform Built Upon a Generalized Software Architecture
Despite the rapidly growing applications of robots in industry, the use of robots to automate tasks in scientific laboratories is less prolific due to lack of generalized methodologies and high cost of hardware. This paper focuses on the automation of characterization tasks necessary for reducing cost while maintaining generalization, and proposes a software architecture for building robotic systems in scientific laboratory environment. A dual-layer (Socket.IO and ROS) action server design is the basic building block, which facilitates the implementation of a web-based front end for user-friendly operations and the use of ROS Behavior Tree for convenient task planning and execution. A robotic platform for automating mineral and material sample characterization is built upon the architecture, with an open source, low-cost three-axis computer numerical control gantry system serving as the main robot. A handheld laser induced breakdown spectroscopy (LIBS) analyzer is integrated with a 3D printed adapter, enabling automated 2D chemical mapping. We demonstrate the utility of automated chemical mapping by scanning of the surface of a spodumene-bearing pegmatite core sample with a 1071-point dense hyperspectral map acquired at a rate of 1520 bits per second. Automated LIBS scanning enables controlled chemical quantification in the laboratory that complements field-based measurements acquired with the same handheld device, linking resource exploration and processing steps in the supply chain for lithium-based battery materials.
Real-Time Reinforcement Learning for Dynamic Tasks with a Parallel Soft Robot IROS
Closed-loop control remains an open challenge in soft robotics. The nonlinear responses of soft actuators under dynamic loading conditions limit the use of analytic models for soft robot control. Traditional methods of controlling soft robots underutilize their configuration spaces to avoid nonlinearity, hysteresis, large deformations, and the risk of actuator damage. Furthermore, episodic data-driven control approaches such as reinforcement learning (RL) are traditionally limited by sample efficiency and inconsistency across initializations. In this work, we demonstrate RL for reliably learning control policies for dynamic balancing tasks in real-time single-shot hardware deployments. We use a deformable Stewart platform constructed using parallel, 3D-printed soft actuators based on motorized handed shearing auxetic (HSA) structures. By introducing a curriculum learning approach based on expanding neighborhoods of a known equilibrium, we achieve reliable single-deployment balancing at arbitrary coordinates. In addition to benchmarking the performance of model-based and model-free methods, we demonstrate that in a single deployment, Maximum Diffusion RL is capable of learning dynamic balancing after half of the actuators are effectively disabled, by inducing buckling and by breaking actuators with bolt cutters. Training occurs with no prior data, in as fast as 15 minutes, with performance nearly identical to the fully-intact platform. Single-shot learning on hardware facilitates soft robotic systems reliably learning in the real world and will enable more diverse and capable soft robots.
comment: Published at IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2025
Score the Steps, Not Just the Goal: VLM-Based Subgoal Evaluation for Robotic Manipulation
Robot learning papers typically report a single binary success rate (SR), which obscures where a policy succeeds or fails along a multi-step manipulation task. We argue that subgoal-level reporting should become routine: for each trajectory, a vector of per-subgoal SRs that makes partial competence visible (e.g., grasp vs. pour). We propose a blueprint for StepEval, a cost-aware plug-in evaluation framework that utilizes vision-language models (VLMs) as automated judges of subgoal outcomes from recorded images or videos. Rather than proposing new benchmarks or APIs, our contribution is to outline design principles for a scalable, community-driven open-source project. In StepEval, the primary artifact for policy evaluation is the per-subgoal SR vector; however, other quantities (e.g., latency or cost estimates) are also considered for framework-optimization diagnostics to help the community tune evaluation efficiency and accuracy when ground-truth subgoal success labels are available. We discuss how such a framework can remain model-agnostic, support single- or multi-view inputs, and be lightweight enough to adopt across labs. The intended contribution is a shared direction: a minimal, extensible seed that invites open-source contributions, so that scoring the steps, not just the final goal, becomes a standard and reproducible practice.
comment: Accepted to the CoRL 2025 Eval&Deploy Workshop
Bioinspired SLAM Approach for Unmanned Surface Vehicle
This paper presents OpenRatSLAM2, a new version of OpenRatSLAM - a bioinspired SLAM framework based on computational models of the rodent hippocampus. OpenRatSLAM2 delivers low-computation-cost visual-inertial based SLAM, suitable for GPS-denied environments. Our contributions include a ROS2-based architecture, experimental results on new waterway datasets, and insights into system parameter tuning. This work represents the first known application of RatSLAM on USVs. The estimated trajectory was compared with ground truth data using the Hausdorff distance. The results show that the algorithm can generate a semimetric map with an error margin acceptable for most robotic applications.
A Bimanual Gesture Interface for ROS-Based Mobile Manipulators Using TinyML and Sensor Fusion
Gesture-based control for mobile manipulators faces persistent challenges in reliability, efficiency, and intuitiveness. This paper presents a dual-hand gesture interface that integrates TinyML, spectral analysis, and sensor fusion within a ROS framework to address these limitations. The system uses left-hand tilt and finger flexion, captured using accelerometer and flex sensors, for mobile base navigation, while right-hand IMU signals are processed through spectral analysis and classified by a lightweight neural network. This pipeline enables TinyML-based gesture recognition to control a 7-DOF Kinova Gen3 manipulator. By supporting simultaneous navigation and manipulation, the framework improves efficiency and coordination compared to sequential methods. Key contributions include a bimanual control architecture, real-time low-power gesture recognition, robust multimodal sensor fusion, and a scalable ROS-based implementation. The proposed approach advances Human-Robot Interaction (HRI) for industrial automation, assistive robotics, and hazardous environments, offering a cost-effective, open-source solution with strong potential for real-world deployment and further optimization.
comment: 12 pages, 11 figures
Supercomputing for High-speed Avoidance and Reactive Planning in Robots
This paper presents SHARP (Supercomputing for High-speed Avoidance and Reactive Planning), a proof-of-concept study demonstrating how high-performance computing (HPC) can enable millisecond-scale responsiveness in robotic control. While modern robots face increasing demands for reactivity in human--robot shared workspaces, onboard processors are constrained by size, power, and cost. Offloading to HPC offers massive parallelism for trajectory planning, but its feasibility for real-time robotics remains uncertain due to network latency and jitter. We evaluate SHARP in a stress-test scenario where a 7-DOF manipulator must dodge high-speed foam projectiles. Using a parallelized multi-goal A* search implemented with MPI on both local and remote HPC clusters, the system achieves mean planning latencies of 22.9 ms (local) and 30.0 ms (remote, ~300 km away), with avoidance success rates of 84% and 88%, respectively. These results show that when round-trip latency remains within the tens-of-milliseconds regime, HPC-side computation is no longer the bottleneck, enabling avoidance well below human reaction times. The SHARP results motivate hybrid control architectures: low-level reflexes remain onboard for safety, while bursty, high-throughput planning tasks are offloaded to HPC for scalability. By reporting per-stage timing and success rates, this study provides a reproducible template for assessing real-time feasibility of HPC-driven robotics. Collectively, SHARP reframes HPC offloading as a viable pathway toward dependable, reactive robots in dynamic environments.
comment: 8 pages, 3 figures
OmniVLA: An Omni-Modal Vision-Language-Action Model for Robot Navigation
Humans can flexibly interpret and compose different goal specifications, such as language instructions, spatial coordinates, or visual references, when navigating to a destination. In contrast, most existing robotic navigation policies are trained on a single modality, limiting their adaptability to real-world scenarios where different forms of goal specification are natural and complementary. In this work, we present a training framework for robotic foundation models that enables omni-modal goal conditioning for vision-based navigation. Our approach leverages a high-capacity vision-language-action (VLA) backbone and trains with three primary goal modalities: 2D poses, egocentric images, and natural language, as well as their combinations, through a randomized modality fusion strategy. This design not only expands the pool of usable datasets but also encourages the policy to develop richer geometric, semantic, and visual representations. The resulting model, OmniVLA, achieves strong generalization to unseen environments, robustness to scarce modalities, and the ability to follow novel natural language instructions. We demonstrate that OmniVLA outperforms specialist baselines across modalities and offers a flexible foundation for fine-tuning to new modalities and tasks. We believe OmniVLA provides a step toward broadly generalizable and flexible navigation policies, and a scalable path for building omni-modal robotic foundation models. We present videos showcasing OmniVLA performance and will release its checkpoints and training code on our project page.
comment: 9 pages, 7 figures, 6 tables
Robust Near-Optimal Nonlinear Target Enclosing Guidance
This paper proposes a nonlinear optimal guidance law that enables a pursuer to enclose a target within arbitrary geometric patterns, which extends beyond conventional circular encirclement. The design operates using only relative state measurements and formulates a target enclosing guidance law in which the vehicle's lateral acceleration serves as the steering control, making it well-suited for aerial vehicles with turning constraints. Our approach generalizes and extends existing guidance strategies that are limited to target encirclement and provides a degree of optimality. At the same time, the exact information of the target's maneuver is unnecessary during the design. The guidance law is developed within the framework of a state-dependent Riccati equation (SDRE), thereby providing a systematic way to handle nonlinear dynamics through a pseudo-linear representation to design locally optimal feedback guidance commands through state-dependent weighting matrices. While SDRE ensures near-optimal performance in the absence of strong disturbances, we further augment the design to incorporate an integral sliding mode manifold to compensate when disturbances push the system away from the nominal trajectory, and demonstrate that the design provides flexibility in the sense that the (possibly time-varying) stand-off curvature could also be treated as unknown. Simulations demonstrate the efficacy of the proposed approach.
Crater Observing Bio-inspired Rolling Articulator (COBRA)
NASA aims to establish a sustainable human basecamp on the Moon as a stepping stone for future missions to Mars and beyond. The discovery of water ice on the Moon's craters located in permanently shadowed regions, which can provide drinking water, oxygen, and rocket fuel, is therefore of critical importance. However, current methods to access lunar ice deposits are limited. While rovers have been used to explore the lunar surface for decades, they face significant challenges in navigating harsh terrains, such as permanently shadowed craters, due to the high risk of immobilization. This report introduces COBRA (Crater Observing Bio-inspired Rolling Articulator), a multi-modal snake-style robot designed to overcome mobility challenges in Shackleton Crater's rugged environment. COBRA combines slithering and tumbling locomotion to adapt to various crater terrains. In snake mode, it uses sidewinding to traverse flat or low inclined surfaces, while in tumbling mode, it forms a circular barrel by linking its head and tail, enabling rapid movement with minimal energy on steep slopes. Equipped with an onboard computer, stereo camera, inertial measurement unit, and joint encoders, COBRA facilitates real-time data collection and autonomous operation. This paper highlights COBRAs robustness and efficiency in navigating extreme terrains through both simulations and experimental validation.
CU-Multi: A Dataset for Multi-Robot Collaborative Perception
A central challenge for multi-robot systems is fusing independently gathered perception data into a unified representation. Despite progress in Collaborative SLAM (C-SLAM), benchmarking remains hindered by the scarcity of dedicated multi-robot datasets. Many evaluations instead partition single-robot trajectories, a practice that may only partially reflect true multi-robot operations and, more critically, lacks standardization, leading to results that are difficult to interpret or compare across studies. While several multi-robot datasets have recently been introduced, they mostly contain short trajectories with limited inter-robot overlap and sparse intra-robot loop closures. To overcome these limitations, we introduce CU-Multi, a dataset collected over multiple days at two large outdoor sites on the University of Colorado Boulder campus. CU-Multi comprises four synchronized runs with aligned start times and controlled trajectory overlap, replicating the distinct perspectives of a robot team. It includes RGB-D sensing, RTK GPS, semantic LiDAR, and refined ground-truth odometry. By combining overlap variation with dense semantic annotations, CU-Multi provides a strong foundation for reproducible evaluation in multi-robot collaborative perception tasks.
comment: 8 pages, 11 figures
Self-evolved Imitation Learning in Simulated World
Imitation learning has been a trend recently, yet training a generalist agent across multiple tasks still requires large-scale expert demonstrations, which are costly and labor-intensive to collect. To address the challenge of limited supervision, we propose Self-Evolved Imitation Learning (SEIL), a framework that progressively improves a few-shot model through simulator interactions. The model first attempts tasksin the simulator, from which successful trajectories are collected as new demonstrations for iterative refinement. To enhance the diversity of these demonstrations, SEIL employs dual-level augmentation: (i) Model-level, using an Exponential Moving Average (EMA) model to collaborate with the primary model, and (ii) Environment-level, introducing slight variations in initial object positions. We further introduce a lightweight selector that filters complementary and informative trajectories from the generated pool to ensure demonstration quality. These curated samples enable the model to achieve competitive performance with far fewer training examples. Extensive experiments on the LIBERO benchmark show that SEIL achieves a new state-of-the-art performance in few-shot imitation learning scenarios. Code is available at https://github.com/Jasper-aaa/SEIL.git.
ROPA: Synthetic Robot Pose Generation for RGB-D Bimanual Data Augmentation
Training robust bimanual manipulation policies via imitation learning requires demonstration data with broad coverage over robot poses, contacts, and scene contexts. However, collecting diverse and precise real-world demonstrations is costly and time-consuming, which hinders scalability. Prior works have addressed this with data augmentation, typically for either eye-in-hand (wrist camera) setups with RGB inputs or for generating novel images without paired actions, leaving augmentation for eye-to-hand (third-person) RGB-D training with new action labels less explored. In this paper, we propose Synthetic Robot Pose Generation for RGB-D Bimanual Data Augmentation (ROPA), an offline imitation learning data augmentation method that fine-tunes Stable Diffusion to synthesize third-person RGB and RGB-D observations of novel robot poses. Our approach simultaneously generates corresponding joint-space action labels while employing constrained optimization to enforce physical consistency through appropriate gripper-to-object contact constraints in bimanual scenarios. We evaluate our method on 5 simulated and 3 real-world tasks. Our results across 2625 simulation trials and 300 real-world trials demonstrate that ROPA outperforms baselines and ablations, showing its potential for scalable RGB and RGB-D data augmentation in eye-to-hand bimanual manipulation. Our project website is available at: https://ropaaug.github.io/.
HUNT: High-Speed UAV Navigation and Tracking in Unstructured Environments via Instantaneous Relative Frames
Search and rescue operations require unmanned aerial vehicles to both traverse unknown unstructured environments at high speed and track targets once detected. Achieving both capabilities under degraded sensing and without global localization remains an open challenge. Recent works on relative navigation have shown robust tracking by anchoring planning and control to a visible detected object, but cannot address navigation when no target is in the field of view. We present HUNT (High-speed UAV Navigation and Tracking), a real-time framework that unifies traversal, acquisition, and tracking within a single relative formulation. HUNT defines navigation objectives directly from onboard instantaneous observables such as attitude, altitude, and velocity, enabling reactive high-speed flight during search. Once a target is detected, the same perception-control pipeline transitions seamlessly to tracking. Outdoor experiments in dense forests, container compounds, and search-and-rescue operations with vehicles and mannequins demonstrate robust autonomy where global methods fail.
RoboSeek: You Need to Interact with Your Objects
Optimizing and refining action execution through exploration and interaction is a promising way for robotic manipulation. However, practical approaches to interaction-driven robotic learning are still underexplored, particularly for long-horizon tasks where sequential decision-making, physical constraints, and perceptual uncertainties pose significant challenges. Motivated by embodied cognition theory, we propose RoboSeek, a framework for embodied action execution that leverages interactive experience to accomplish manipulation tasks. RoboSeek optimizes prior knowledge from high-level perception models through closed-loop training in simulation and achieves robust real-world execution via a real2sim2real transfer pipeline. Specifically, we first replicate real-world environments in simulation using 3D reconstruction to provide visually and physically consistent environments, then we train policies in simulation using reinforcement learning and the cross-entropy method leveraging visual priors. The learned policies are subsequently deployed on real robotic platforms for execution. RoboSeek is hardware-agnostic and is evaluated on multiple robotic platforms across eight long-horizon manipulation tasks involving sequential interactions, tool use, and object handling. Our approach achieves an average success rate of 79%, significantly outperforming baselines whose success rates remain below 50%, highlighting its generalization and robustness across tasks and platforms. Experimental results validate the effectiveness of our training framework in complex, dynamic real-world settings and demonstrate the stability of the proposed real2sim2real transfer mechanism, paving the way for more generalizable embodied robotic learning. Project Page: https://russderrick.github.io/Roboseek/
EmbodiedSplat: Personalized Real-to-Sim-to-Real Navigation with Gaussian Splats from a Mobile Device ICCV
The field of Embodied AI predominantly relies on simulation for training and evaluation, often using either fully synthetic environments that lack photorealism or high-fidelity real-world reconstructions captured with expensive hardware. As a result, sim-to-real transfer remains a major challenge. In this paper, we introduce EmbodiedSplat, a novel approach that personalizes policy training by efficiently capturing the deployment environment and fine-tuning policies within the reconstructed scenes. Our method leverages 3D Gaussian Splatting (GS) and the Habitat-Sim simulator to bridge the gap between realistic scene capture and effective training environments. Using iPhone-captured deployment scenes, we reconstruct meshes via GS, enabling training in settings that closely approximate real-world conditions. We conduct a comprehensive analysis of training strategies, pre-training datasets, and mesh reconstruction techniques, evaluating their impact on sim-to-real predictivity in real-world scenarios. Experimental results demonstrate that agents fine-tuned with EmbodiedSplat outperform both zero-shot baselines pre-trained on large-scale real-world datasets (HM3D) and synthetically generated datasets (HSSD), achieving absolute success rate improvements of 20% and 40% on real-world Image Navigation task. Moreover, our approach yields a high sim-vs-real correlation (0.87-0.97) for the reconstructed meshes, underscoring its effectiveness in adapting policies to diverse environments with minimal effort. Project page: https://gchhablani.github.io/embodied-splat.
comment: 16 pages, 18 figures, paper accepted at ICCV, 2025
L2M-Reg: Building-level Uncertainty-aware Registration of Outdoor LiDAR Point Clouds and Semantic 3D City Models SP
Accurate registration between LiDAR (Light Detection and Ranging) point clouds and semantic 3D city models is a fundamental topic in urban digital twinning and a prerequisite for downstream tasks, such as digital construction, change detection and model refinement. However, achieving accurate LiDAR-to-Model registration at individual building level remains challenging, particularly due to the generalization uncertainty in semantic 3D city models at the Level of Detail 2 (LoD2). This paper addresses this gap by proposing L2M-Reg, a plane-based fine registration method that explicitly accounts for model uncertainty. L2M-Reg consists of three key steps: establishing reliable plane correspondence, building a pseudo-plane-constrained Gauss-Helmert model, and adaptively estimating vertical translation. Experiments on three real-world datasets demonstrate that L2M-Reg is both more accurate and computationally efficient than existing ICP-based and plane-based methods. Overall, L2M-Reg provides a novel building-level solution regarding LiDAR-to-Model registration when model uncertainty is present.
comment: Submitted to the ISPRS Journal of Photogrammetry and Remote Sensing
HDMI: Learning Interactive Humanoid Whole-Body Control from Human Videos
Enabling robust whole-body humanoid-object interaction (HOI) remains challenging due to motion data scarcity and the contact-rich nature. We present HDMI (HumanoiD iMitation for Interaction), a simple and general framework that learns whole-body humanoid-object interaction skills directly from monocular RGB videos. Our pipeline (i) extracts and retargets human and object trajectories from unconstrained videos to build structured motion datasets, (ii) trains a reinforcement learning (RL) policy to co-track robot and object states with three key designs: a unified object representation, a residual action space, and a general interaction reward, and (iii) zero-shot deploys the RL policies on real humanoid robots. Extensive sim-to-real experiments on a Unitree G1 humanoid demonstrate the robustness and generality of our approach: HDMI achieves 67 consecutive door traversals and successfully performs 6 distinct loco-manipulation tasks in the real world and 14 tasks in simulation. Our results establish HDMI as a simple and general framework for acquiring interactive humanoid skills from human videos.
comment: website: hdmi-humanoid.github.io
Occlusion-Aware Consistent Model Predictive Control for Robot Navigation in Occluded Obstacle-Dense Environments
Ensuring safety and motion consistency for robot navigation in occluded, obstacle-dense environments is a critical challenge. In this context, this study presents an occlusion-aware Consistent Model Predictive Control (CMPC) strategy. To account for the occluded obstacles, it incorporates adjustable risk regions that represent their potential future locations. Subsequently, dynamic risk boundary constraints are developed online to ensure safety. The CMPC then constructs multiple locally optimal trajectory branches (each tailored to different risk regions) to strike a balance between safety and performance. A shared consensus segment is generated to ensure smooth transitions between branches without significant velocity fluctuations, further preserving motion consistency. To facilitate high computational efficiency and ensure coordination across local trajectories, we use the alternating direction method of multipliers (ADMM) to decompose the CMPC into manageable sub-problems for parallel solving. The proposed strategy is validated through simulations and real-world experiments on an Ackermann-steering robot platform. The results demonstrate the effectiveness of the proposed CMPC strategy through comparisons with baseline approaches in occluded, obstacle-dense environments.
Operator Splitting Covariance Steering for Safe Stochastic Nonlinear Control
This paper presents a novel algorithm for solving distribution steering problems featuring nonlinear dynamics and chance constraints. Covariance steering (CS) is an emerging methodology in stochastic optimal control that poses constraints on the first two moments of the state distribution -- thereby being more tractable than full distributional control. Nevertheless, a significant limitation of current approaches for solving nonlinear CS problems, such as sequential convex programming (SCP), is that they often generate infeasible or poor results due to the large number of constraints. In this paper, we address these challenges, by proposing an operator splitting CS approach that temporarily decouples the full problem into subproblems that can be solved in parallel. This relaxation does not require intermediate iterates to satisfy all constraints simultaneously prior to convergence, which enhances exploration and improves feasibility in such non-convex settings. Simulation results across a variety of robotics applications verify the ability of the proposed method to find better solutions even under stricter safety constraints than standard SCP. Finally, the applicability of our framework on real systems is also confirmed through hardware demonstrations
Embodied Arena: A Comprehensive, Unified, and Evolving Evaluation Platform for Embodied AI
Embodied AI development significantly lags behind large foundation models due to three critical challenges: (1) lack of systematic understanding of core capabilities needed for Embodied AI, making research lack clear objectives; (2) absence of unified and standardized evaluation systems, rendering cross-benchmark evaluation infeasible; and (3) underdeveloped automated and scalable acquisition methods for embodied data, creating critical bottlenecks for model scaling. To address these obstacles, we present Embodied Arena, a comprehensive, unified, and evolving evaluation platform for Embodied AI. Our platform establishes a systematic embodied capability taxonomy spanning three levels (perception, reasoning, task execution), seven core capabilities, and 25 fine-grained dimensions, enabling unified evaluation with systematic research objectives. We introduce a standardized evaluation system built upon unified infrastructure supporting flexible integration of 22 diverse benchmarks across three domains (2D/3D Embodied Q&A, Navigation, Task Planning) and 30+ advanced models from 20+ worldwide institutes. Additionally, we develop a novel LLM-driven automated generation pipeline ensuring scalable embodied evaluation data with continuous evolution for diversity and comprehensiveness. Embodied Arena publishes three real-time leaderboards (Embodied Q&A, Navigation, Task Planning) with dual perspectives (benchmark view and capability view), providing comprehensive overviews of advanced model capabilities. Especially, we present nine findings summarized from the evaluation results on the leaderboards of Embodied Arena. This helps to establish clear research veins and pinpoint critical research problems, thereby driving forward progress in the field of Embodied AI.
comment: 32 pages, 5 figures, Embodied Arena Technical Report
Stratified Topological Autonomy for Long-Range Coordination (STALC)
In this paper, we present Stratified Topological Autonomy for Long-Range Coordination (STALC), a hierarchical planning approach for coordinated multi-robot maneuvering in real-world environments with significant inter-robot spatial and temporal dependencies. At its core, STALC consists of a multi-robot graph-based planner which combines a topological graph with a novel, computationally efficient mixed-integer programming formulation to generate highly-coupled multi-robot plans in seconds. To enable autonomous planning across different spatial and temporal scales, we construct our graphs so that they capture connectivity between free-space regions and other problem-specific features, such as traversability or risk. We then use receding-horizon planners to achieve local collision avoidance and formation control. To evaluate our approach, we consider a multi-robot reconnaissance scenario where robots must autonomously coordinate to navigate through an environment while minimizing the risk of detection by observers. Through simulation-based experiments, we show that our approach is able to scale to address complex multi-robot planning scenarios. Through hardware experiments, we demonstrate our ability to generate graphs from real-world data and successfully plan across the entire hierarchy to achieve shared objectives.
comment: This work has been submitted to the IEEE for possible publication
Low-pass sampling in Model Predictive Path Integral Control
Model Predictive Path Integral (MPPI) control is a widely used sampling-based approach for real-time control, offering flexibility in handling arbitrary dynamics and cost functions. However, the original MPPI suffers from high-frequency noise in the sampled control trajectories, leading to actuator wear and inefficient exploration. In this work, we introduce Low-Pass Model Predictive Path Integral Control (LP-MPPI), which integrates low-pass filtering into the sampling process to eliminate detrimental high-frequency components and improve the effectiveness of the control trajectories exploration. Unlike prior approaches, LP-MPPI provides direct and interpretable control over the frequency spectrum of sampled trajectories, enhancing sampling efficiency and control smoothness. Through extensive evaluations in Gymnasium environments, simulated quadruped locomotion, and real-world F1TENTH autonomous racing, we demonstrate that LP-MPPI consistently outperforms state-of-the-art MPPI variants, achieving significant performance improvements while reducing control signal chattering.
V2V-GoT: Vehicle-to-Vehicle Cooperative Autonomous Driving with Multimodal Large Language Models and Graph-of-Thoughts
Current state-of-the-art autonomous vehicles could face safety-critical situations when their local sensors are occluded by large nearby objects on the road. Vehicle-to-vehicle (V2V) cooperative autonomous driving has been proposed as a means of addressing this problem, and one recently introduced framework for cooperative autonomous driving has further adopted an approach that incorporates a Multimodal Large Language Model (MLLM) to integrate cooperative perception and planning processes. However, despite the potential benefit of applying graph-of-thoughts reasoning to the MLLM, this idea has not been considered by previous cooperative autonomous driving research. In this paper, we propose a novel graph-of-thoughts framework specifically designed for MLLM-based cooperative autonomous driving. Our graph-of-thoughts includes our proposed novel ideas of occlusion-aware perception and planning-aware prediction. We curate the V2V-GoT-QA dataset and develop the V2V-GoT model for training and testing the cooperative driving graph-of-thoughts. Our experimental results show that our method outperforms other baselines in cooperative perception, prediction, and planning tasks. Our project website: https://eddyhkchiu.github.io/v2vgot.github.io/ .
comment: Our project website: https://eddyhkchiu.github.io/v2vgot.github.io/
Socially Pertinent Robots in Gerontological Healthcare
Despite the many recent achievements in developing and deploying social robotics, there are still many underexplored environments and applications for which systematic evaluation of such systems by end-users is necessary. While several robotic platforms have been used in gerontological healthcare, the question of whether or not a social interactive robot with multi-modal conversational capabilities will be useful and accepted in real-life facilities is yet to be answered. This paper is an attempt to partially answer this question, via two waves of experiments with patients and companions in a day-care gerontological facility in Paris with a full-sized humanoid robot endowed with social and conversational interaction capabilities. The software architecture, developed during the H2020 SPRING project, together with the experimental protocol, allowed us to evaluate the acceptability (AES) and usability (SUS) with more than 60 end-users. Overall, the users are receptive to this technology, especially when the robot perception and action skills are robust to environmental clutter and flexible to handle a plethora of different interactions.
Ratatouille: Imitation Learning Ingredients for Real-world Social Robot Navigation
Scaling Reinforcement Learning to in-the-wild social robot navigation is both data-intensive and unsafe, since policies must learn through direct interaction and inevitably encounter collisions. Offline Imitation learning (IL) avoids these risks by collecting expert demonstrations safely, training entirely offline, and deploying policies zero-shot. However, we find that naively applying Behaviour Cloning (BC) to social navigation is insufficient; achieving strong performance requires careful architectural and training choices. We present Ratatouille, a pipeline and model architecture that, without changing the data, reduces collisions per meter by 6 times and improves success rate by 3 times compared to naive BC. We validate our approach in both simulation and the real world, where we collected over 11 hours of data on a dense university campus. We further demonstrate qualitative results in a public food court. Our findings highlight that thoughtful IL design, rather than additional data, can substantially improve safety and reliability in real-world social navigation. Video: https://youtu.be/tOdLTXsaYLQ. Code will be released after acceptance.
comment: 8 pages
Dynamic Mixture of Progressive Parameter-Efficient Expert Library for Lifelong Robot Learning
A generalist agent must continuously learn and adapt throughout its lifetime, achieving efficient forward transfer while minimizing catastrophic forgetting. Previous work within the dominant pretrain-then-finetune paradigm has explored parameter-efficient fine-tuning for single-task adaptation, effectively steering a frozen pretrained model with a small number of parameters. However, in the context of lifelong learning, these methods rely on the impractical assumption of a test-time task identifier and restrict knowledge sharing among isolated adapters. To address these limitations, we propose Dynamic Mixture of Progressive Parameter-Efficient Expert Library (DMPEL) for lifelong robot learning. DMPEL progressively builds a low-rank expert library and employs a lightweight router to dynamically combine experts into an end-to-end policy, enabling flexible and efficient lifelong forward transfer. Furthermore, by leveraging the modular structure of the fine-tuned parameters, we introduce expert coefficient replay, which guides the router to accurately retrieve frozen experts for previously encountered tasks. This technique mitigates forgetting while being significantly more storage- and computation-efficient than experience replay over the entire policy. Extensive experiments on the lifelong robot learning benchmark LIBERO demonstrate that our framework outperforms state-of-the-art lifelong learning methods in success rates during continual adaptation, while utilizing minimal trainable parameters and storage.
Learning coordinated badminton skills for legged manipulators
Coordinating the motion between lower and upper limbs and aligning limb control with perception are substantial challenges in robotics, particularly in dynamic environments. To this end, we introduce an approach for enabling legged mobile manipulators to play badminton, a task that requires precise coordination of perception, locomotion, and arm swinging. We propose a unified reinforcement learning-based control policy for whole-body visuomotor skills involving all degrees of freedom to achieve effective shuttlecock tracking and striking. This policy is informed by a perception noise model that utilizes real-world camera data, allowing for consistent perception error levels between simulation and deployment and encouraging learned active perception behaviors. Our method includes a shuttlecock prediction model, constrained reinforcement learning for robust motion control, and integrated system identification techniques to enhance deployment readiness. Extensive experimental results in a variety of environments validate the robot's capability to predict shuttlecock trajectories, navigate the service area effectively, and execute precise strikes against human players, demonstrating the feasibility of using legged mobile manipulators in complex and dynamic sports scenarios.
comment: Science Robotics DOI: 10.1126/scirobotics.adu3922
Enhancing Video-Based Robot Failure Detection Using Task Knowledge
Robust robotic task execution hinges on the reliable detection of execution failures in order to trigger safe operation modes, recovery strategies, or task replanning. However, many failure detection methods struggle to provide meaningful performance when applied to a variety of real-world scenarios. In this paper, we propose a video-based failure detection approach that uses spatio-temporal knowledge in the form of the actions the robot performs and task-relevant objects within the field of view. Both pieces of information are available in most robotic scenarios and can thus be readily obtained. We demonstrate the effectiveness of our approach on three datasets that we amend, in part, with additional annotations of the aforementioned task-relevant knowledge. In light of the results, we also propose a data augmentation method that improves performance by applying variable frame rates to different parts of the video. We observe an improvement from 77.9 to 80.0 in F1 score on the ARMBench dataset without additional computational expense and an additional increase to 81.4 with test-time augmentation. The results emphasize the importance of spatio-temporal information during failure detection and suggest further investigation of suitable heuristics in future implementations. Code and annotations are available.
comment: Accepted at ECMR 2025
Constrained Style Learning from Imperfect Demonstrations under Task Optimality
Learning from demonstration has proven effective in robotics for acquiring natural behaviors, such as stylistic motions and lifelike agility, particularly when explicitly defining style-oriented reward functions is challenging. Synthesizing stylistic motions for real-world tasks usually requires balancing task performance and imitation quality. Existing methods generally depend on expert demonstrations closely aligned with task objectives. However, practical demonstrations are often incomplete or unrealistic, causing current methods to boost style at the expense of task performance. To address this issue, we propose formulating the problem as a constrained Markov Decision Process (CMDP). Specifically, we optimize a style-imitation objective with constraints to maintain near-optimal task performance. We introduce an adaptively adjustable Lagrangian multiplier to guide the agent to imitate demonstrations selectively, capturing stylistic nuances without compromising task performance. We validate our approach across multiple robotic platforms and tasks, demonstrating both robust task performance and high-fidelity style learning. On ANYmal-D hardware we show a 14.5% drop in mechanical energy and a more agile gait pattern, showcasing real-world benefits.
comment: This paper has been accepted to CoRL 2025
RAVE: End-to-end Hierarchical Visual Localization with Rasterized and Vectorized HD map
Accurate localization serves as an important component in autonomous driving systems. Traditional rule-based localization involves many standalone modules, which is theoretically fragile and requires costly hyperparameter tuning, therefore sacrificing the accuracy and generalization. In this paper, we propose an end-to-end visual localization system, RAVE, in which the surrounding images are associated with the HD map data to estimate pose. To ensure high-quality observations for localization, a low-rank flow-based prior fusion module (FLORA) is developed to incorporate misaligned map prior into the perceived BEV features. Pursuing a balance among efficiency, interpretability, and accuracy, a hierarchical localization module is proposed, which efficiently estimates poses through a decoupled BEV neural matching-based pose solver (DEMA) using rasterized HD map, and then refines the estimation through a Transformer-based pose regressor (POET) using vectorized HD map. The experimental results demonstrate that our method can perform robust and accurate localization under varying environmental conditions while running efficiently.
comment: 16 pages, 10 figures, 6 tables
EMMA: End-to-End Multimodal Model for Autonomous Driving
We introduce EMMA, an End-to-end Multimodal Model for Autonomous driving. Built upon a multi-modal large language model foundation like Gemini, EMMA directly maps raw camera sensor data into various driving-specific outputs, including planner trajectories, perception objects, and road graph elements. EMMA maximizes the utility of world knowledge from the pre-trained large language models, by representing all non-sensor inputs (e.g. navigation instructions and ego vehicle status) and outputs (e.g. trajectories and 3D locations) as natural language text. This approach allows EMMA to jointly process various driving tasks in a unified language space, and generate the outputs for each task using task-specific prompts. Empirically, we demonstrate EMMA's effectiveness by achieving state-of-the-art performance in motion planning on nuScenes as well as competitive results on the Waymo Open Motion Dataset (WOMD). EMMA also yields competitive results for camera-primary 3D object detection on the Waymo Open Dataset (WOD). We show that co-training EMMA with planner trajectories, object detection, and road graph tasks yields improvements across all three domains, highlighting EMMA's potential as a generalist model for autonomous driving applications. We hope that our results will inspire research to further evolve the state of the art in autonomous driving model architectures.
comment: Accepted by TMLR. Blog post: https://waymo.com/blog/2024/10/introducing-emma/
Active Shadowing (ASD): Manipulating Perception of Robotic Behaviors via Implicit Virtual Communication
Explicit communication is often valued for its directness in presenting information but requires attention during exchange, resulting in cognitive interruptions. On the other hand, implicit communication contributes to tacit and smooth interaction, making it more suitable for teaming, but requires inference for interpretation. This paper studies a novel type of implicit visual communication (IVC) using shadows via visual projection with augmented reality, referred to as active shadowing (ASD). Prior IVC methods, such as legible motion, are often used to influence the perception of robot behavior to make it more understandable. They often require changing the physical robot behavior, resulting in suboptimality. In our work, we investigate how ASD can be used to achieve similar effects without losing optimality. Our evaluations with user studies demonstrates that ASD can effectively creates ''illusions'' that maintain optimal physical behavior without compromising its understandability. We also show that ASD can be more informative than other explicit communication methods, and examine the conditions under which ASD becomes less effective.
Learning to Drive by Imitating Surrounding Vehicles
Imitation learning is a promising approach for training autonomous vehicles (AV) to navigate complex traffic environments by mimicking expert driver behaviors. While existing imitation learning frameworks focus on leveraging expert demonstrations, they often overlook the potential of additional complex driving data from surrounding traffic participants. In this paper, we study a data augmentation strategy that leverages the observed trajectories of nearby vehicles, captured by the AV's sensors, as additional demonstrations. We introduce a simple vehicle-selection sampling and filtering strategy that prioritizes informative and diverse driving behaviors, contributing to a richer dataset for training. We evaluate this idea with a representative learning-based planner on a large real-world dataset and demonstrate improved performance in complex driving scenarios. Specifically, the approach reduces collision rates and improves safety metrics compared to the baseline. Notably, even when using only 10 percent of the original dataset, the method matches or exceeds the performance of the full dataset. Through ablations, we analyze selection criteria and show that naive random selection can degrade performance. Our findings highlight the value of leveraging diverse real-world trajectory data in imitation learning and provide insights into data augmentation strategies for autonomous driving.
CUPID: Curating Data your Robot Loves with Influence Functions
In robot imitation learning, policy performance is tightly coupled with the quality and composition of the demonstration data. Yet, developing a precise understanding of how individual demonstrations contribute to downstream outcomes - such as closed-loop task success or failure - remains a persistent challenge. We propose CUPID, a robot data curation method based on a novel influence function-theoretic formulation for imitation learning policies. Given a set of evaluation rollouts, CUPID estimates the influence of each training demonstration on the policy's expected return. This enables ranking and selection of demonstrations according to their impact on the policy's closed-loop performance. We use CUPID to curate data by 1) filtering out training demonstrations that harm policy performance and 2) subselecting newly collected trajectories that will most improve the policy. Extensive simulated and hardware experiments show that our approach consistently identifies which data drives test-time performance. For example, training with less than 33% of curated data can yield state-of-the-art diffusion policies on the simulated RoboMimic benchmark, with similar gains observed in hardware. Furthermore, hardware experiments show that our method can identify robust strategies under distribution shift, isolate spurious correlations, and even enhance the post-training of generalist robot policies. Videos and code are made available at: https://cupid-curation.github.io.
comment: Project page: https://cupid-curation.github.io. 27 pages, 15 figures. Accepted to the Conference on Robot Learning (CoRL) 2025
Uncertainty-aware Latent Safety Filters for Avoiding Out-of-Distribution Failures
Recent advances in generative world models have enabled classical safe control methods, such as Hamilton-Jacobi (HJ) reachability, to generalize to complex robotic systems operating directly from high-dimensional sensor observations. However, obtaining comprehensive coverage of all safety-critical scenarios during world model training is extremely challenging. As a result, latent safety filters built on top of these models may miss novel hazards and even fail to prevent known ones, overconfidently misclassifying risky out-of-distribution (OOD) situations as safe. To address this, we introduce an uncertainty-aware latent safety filter that proactively steers robots away from both known and unseen failures. Our key idea is to use the world model's epistemic uncertainty as a proxy for identifying unseen potential hazards. We propose a principled method to detect OOD world model predictions by calibrating an uncertainty threshold via conformal prediction. By performing reachability analysis in an augmented state space-spanning both the latent representation and the epistemic uncertainty-we synthesize a latent safety filter that can reliably safeguard arbitrary policies from both known and unseen safety hazards. In simulation and hardware experiments on vision-based control tasks with a Franka manipulator, we show that our uncertainty-aware safety filter preemptively detects potential unsafe scenarios and reliably proposes safe, in-distribution actions. Video results can be found on the project website at https://cmu-intentlab.github.io/UNISafe
comment: Conference on Robot Learning (CoRL 2025)
Systems and Control (CS)
Policy Gradient Bounds in Multitask LQR
We analyze the performance of policy gradient in multitask linear quadratic regulation (LQR), where the system and cost parameters differ across tasks. The main goal of multitask LQR is to find a controller with satisfactory performance on every task. Prior analyses on relevant contexts fail to capture closed-loop task similarities, resulting in conservative performance guarantees. To account for such similarities, we propose bisimulation-based measures of task heterogeneity. Our measures employ new bisimulation functions to bound the cost gradient distance between a pair of tasks in closed loop with a common stabilizing controller. Employing these measures, we derive suboptimality bounds for both the multitask optimal controller and the asymptotic policy gradient controller with respect to each of the tasks. We further provide conditions under which the policy gradient iterates remain stabilizing for every system. For multiple random sets of certain tasks, we observe that our bisimulation-based measures improve upon baseline measures of task heterogeneity dramatically.
Proactive-reactive detection and mitigation of intermittent faults in robot swarms
Intermittent faults are transient errors that sporadically appear and disappear. Although intermittent faults pose substantial challenges to reliability and coordination, existing studies of fault tolerance in robot swarms focus instead on permanent faults. One reason for this is that intermittent faults are prohibitively difficult to detect in the fully self-organized ad-hoc networks typical of robot swarms, as their network topologies are transient and often unpredictable. However, in the recently introduced self-organizing nervous systems (SoNS) approach, robot swarms are able to self-organize persistent network structures for the first time, easing the problem of detecting intermittent faults. To address intermittent faults in robot swarms that have persistent networks, we propose a novel proactive-reactive strategy to detection and mitigation, based on self-organized backup layers and distributed consensus in a multiplex network. Proactively, the robots self-organize dynamic backup paths before faults occur, adapting to changes in the primary network topology and the robots' relative positions. Reactively, robots use one-shot likelihood ratio tests to compare information received along different paths in the multiplex network, enabling early fault detection. Upon detection, communication is temporarily rerouted in a self-organized way, until the detected fault resolves. We validate the approach in representative scenarios of faulty positional data occurring during formation control, demonstrating that intermittent faults are prevented from disrupting convergence to desired formations, with high fault detection accuracy and low rates of false positives.
Watts and Drops: Co-Scheduling Power and Water in Desalination Plants
We develop a mathematical framework to jointly schedule water and electricity in a profit-maximizing renewable colocated water desalination plant that integrates both thermal and membrane based technologies. The price-taking desalination plant sells desalinated water to a water utility at a given price and engages in bidirectional electricity transactions with the grid, purchasing or selling power based on its net electricity demand. We show that the optimal scheduling policy depends on the plant's internal renewable generation and follows a simple threshold structure. Under the optimal policy, thermal based water output decreases monotonically with renewable output, while membrane based water output increases monotonically. We characterize the structure and intuition behind the threshold policy and examine key special properties.
comment: 5 pages, 6 figures. To appear in Proceedings of the 61st Allerton Conference on Communication, Control, and Computing
Robust Synchronous Reference Frame Phase-Looked Loop (PLL) with Feed-Forward Frequency Estimation
Synchronous reference frame phase-looked loop (SRF-PLL) techniques are widely used for interfacing and control applications in the power systems and energy conversion at large. Since a PLL system synchronizes its output with an exogenous harmonic signal, often 3-phases voltage or current, the locking of the frequency and phase angle depends on the performance of the feedback loop with at least two integrator terms, and on the distortions of the measured input quantities. For the conventional SRF-PLL with a proportional-integral (PI) control in feedback, we are providing a robust design which maximizes the phase margin and uses the normalization scheme for yielding the loop insensitive to the input amplitude variations. The main improvement in the transient behavior and also in tracking of frequency ramps is achieved by using the robust feed-forward frequency estimator, which is model-free and suitable for the noisy and time-varying harmonic signals. The proposed feed-forward-feedback SRF-PLL scheme is experimentally evaluated on the 3-phases harmonic currents from standard PMSM drives with varying angular speeds and loads. Both, the tracked angular frequency and locked phase angle are assessed as performance metrics of the robust SRF-PLL scheme with feedforwarding.
comment: 8 pages, 9 figures
A Fast Initialization Method for Neural Network Controllers: A Case Study of Image-based Visual Servoing Control for the multicopter Interception
Reinforcement learning-based controller design methods often require substantial data in the initial training phase. Moreover, the training process tends to exhibit strong randomness and slow convergence. It often requires considerable time or high computational resources. Another class of learning-based method incorporates Lyapunov stability theory to obtain a control policy with stability guarantees. However, these methods generally require an initially stable neural network control policy at the beginning of training. Evidently, a stable neural network controller can not only serve as an initial policy for reinforcement learning, allowing the training to focus on improving controller performance, but also act as an initial state for learning-based Lyapunov control methods. Although stable controllers can be designed using traditional control theory, designers still need to have a great deal of control design knowledge to address increasingly complicated control problems. The proposed neural network rapid initialization method in this paper achieves the initial training of the neural network control policy by constructing datasets that conform to the stability conditions based on the system model. Furthermore, using the image-based visual servoing control for multicopter interception as a case study, simulations and experiments were conducted to validate the effectiveness and practical performance of the proposed method. In the experiment, the trained control policy attains a final interception velocity of 15 m/s.
AI-Enabled Smart Hygiene System for Real-Time Glucose Detection
This research presents a smart urinary health monitoring system incorporating a coplanar waveguide (CPW)-fed slot-loop antenna biosensor designed to analyse various urine samples. The antenna demonstrates distinct resonant frequency shifts when exposed to five specific urine conditions, deviating from its baseline 1.42 GHz operation. These measurable frequency variations enable the antenna to function as an effective microwave sensor for urinary biomarker detection. A potential artificial intelligence-based Convolutional Neural Networks Long Short-Term Memory (CNN-LSTM) framework is also discussed to overcome the limitations of overlapping frequency responses, aiming to improve the accuracy of health condition detection. These components contribute to the development of a smart toilet system that displays real-time health information on a wall-mounted urinal screen, without requiring any user effort or behavioural change.
MAPPO for Edge Server Monitoring
In this paper, we consider a goal-oriented communication problem for edge server monitoring, where jobs arrive intermittently at multiple dispatchers and must be assigned to shared edge servers with finite queues and time-varying availability. Accurate knowledge of server status is critical for sustaining high throughput, yet remains challenging under dynamic workloads and partial observability. To address this challenge, each dispatcher maintains server knowledge through two complementary mechanisms: (i) active status queries that provide instantaneous updates at a communication cost, and (ii) job execution feedback that reveals server conditions opportunistically. We formulate a cooperative multi-agent distributed decision-making problem in which dispatchers jointly optimize query scheduling to balance throughput against communication overhead. To solve this problem, we propose a Multi-Agent Proximal Policy Optimization (MAPPO)-based algorithm that leverages centralized training with decentralized execution (CTDE) to learn distributed query-and-dispatch policies under partial and stale observations. Numerical evaluations show that MAPPO achieves superior throughput-cost tradeoffs and significantly outperforms baseline strategies, achieving on average a 30% improvement over the closest baseline.
comment: 6 pages, 4 figures. Accepted to IEEE MILCOM 2025
A Weighted Least Squares Error Hetero-functional Graph State Estimator of the American Multi-modal Energy System
As one of the most pressing challenges of the 21st century, global climate change demands a host of changes across at least four critical energy infrastructures: the electric grid, the natural gas system, the oil system, and the coal system. In the context of the United States, this paper refers to this system-of-systems as ``The American Multi-Modal Energy System (AMES)". These combined changes necessitate an understanding of the AMES interdependencies, both structurally and behaviorally, to develop and enact effective policies. This work focuses on behavioral analysis methods to provide examples of how to analyze system behavior and the critical matter and energy flows through the system. Building upon past works, two regions of the AMES are modeled, and their behavior is analyzed using Hetero-functional Graph Theory (HFGT). More specifically, the work presents a weighted least square error state estimation model of the AMES. State estimation has played a major role in the operation and development of the American Electric Power System. This work extends the state estimation analysis beyond the single-operand electric grid environment into the heterogeneous environment of the AMES. Employing a data-driven and model-based systems engineering approach in combination with HFGT, a Weighted Least Squares Error Hetero-functional Graph State Estimation (WLSEHFGSE) optimization program is developed to estimate the optimal flows of mass and energy through the AMES. This work is the first to integrate state estimation methods with HFGT. Furthermore, it demonstrates how such a WLSEHFGSE recovers the mass and energy flows in a system-of-systems like the AMES with asset-level granularity.
comment: 27 pages, 3 Tables, 11 Figures
Adaptive Override Control under High-Relative-Degree Nonovershooting Constraints
This paper considers the problem of adaptively overriding unsafe actions of a nominal controller in the presence of high-relative-degree nonovershooting constraints and parametric uncertainties. To prevent the design from being coupled with high-order derivatives of the parameter estimation error, we adopt a modular design approach in which the controller and the parameter identifier are designed separately. The controller module ensures that any safety violations caused by parametric uncertainties remain bounded, provided that the parameter estimation error and its first-order derivative are either bounded or square-integrable. The identifier module, in turn, guarantees that these requirements on the parameter estimation error are satisfied. Both theoretical analysis and simulation results demonstrate that the closed-loop safety violation is bounded by a tunable function of the initial estimation error. Moreover, as time increases, the parameter estimate converges to the true value, and the amount of safety violation decreases accordingly.
Frequency-Varying Optimization: A Control Framework for New Dynamic Frequency Response Services
To address the variability of renewable generation, initiatives have been launched globally to provide faster and more effective frequency responses. In the UK, the National Energy System Operator (NESO) has introduced a suite of three new dynamic services, where aggregation of assets is expected to play a key role. For an Aggregated Response Unit (ARU), the required level of frequency response varies with grid frequency, resulting in a frequency-varying equality constraint that assets should meet collectively. We show that the optimal coordination of an ARU constitutes a Frequency-Varying Optimization (FVO) problem, in which the optimal trajectory for each asset evolves dynamically. To facilitate online optimization, we reformulate the FVO problem into Tracking of the Optimal Trajectory (TOT) problems, with algorithms proposed for two scenarios: one where the asset dynamics are negligible, and another where they must be accounted for. Under reasonable conditions, the ARU converges to the optimal trajectory within a fixed time, and within the maximum delivery time requested by NESO. The proposed framework can be readily distributed to coordinate a large number of assets. Numerical results verify the effectiveness and scalability of the proposed control framework.
On the Boundary of the Robust Admissible Set in State and Input Constrained Nonlinear Systems
In this paper, we consider nonlinear control systems subject to bounded disturbances and to both state and input constraints. We introduce the definition of robust admissible set - the set of all initial states from which the state and input constraints can be satisfied for all times against all admissible disturbances. We focus on its boundary that can be decomposed into the usable part on the state constraint boundary and the barrier, interior to the state constraints. We show that, at the intersection of these two components, the boundary of the admissible set must be tangent to the state constraints and separate the interior of the robust admissible set and its complement. Moreover, we prove that the barrier must satisfy a saddle-point principle on a Hamiltonian, in the spirit of Pontryagin's maximum principle, thus providing a direct computational tool. Lastly, we illustrate our results by calculating the robust admissible set for an adaptive cruise control example.
comment: 18 pages, 4 figures
Integration of Concentrated Solar Power Plants in Renewable-Only VPP with Electrical and Thermal Demands: A Two-Stage Robust Bidding Approach
This paper proposes the integration of Concentrated Solar Power Plant (CSP) in the Renewable-only virtual power plant (RVPP) for bidding in the electricity day-ahead and secondary reserve markets, as well as trading thermal energy through a heat purchase agreement. A reformulated two-stage robust optimization approach is introduced to account for multiple uncertainties, including electricity prices, non-dispatchable renewable energy sources electrical production, CSP thermal production, and uncertainties in electrical and thermal demand consumption. The provision of energy and reserve by the thermal storage of CSP is modeled using an adjustable approach, which allocates a share of energy for up and down reserves based on the profitability of the RVPP. Simulations are conducted for several case studies to demonstrate the effectiveness and computational efficiency of the proposed approach under different RVPP operator decisions against uncertain parameters and various trading strategies for electricity and thermal energy. The simulation results show that integrating CSP into RVPP enhances RVPP flexibility for both electrical and thermal trading. Furthermore, the results indicate that the profitability of the RVPP increases when all trading options are considered, across different levels of conservatism adopted by the RVPP operator in response to uncertain parameters.
Guaranteed Robust Nonlinear MPC via Disturbance Feedback
Robots must satisfy safety-critical state and input constraints despite disturbances and model mismatch. We introduce a robust model predictive control (RMPC) formulation that is fast, scalable, and compatible with real-time implementation. Our formulation guarantees robust constraint satisfaction, input-to-state stability (ISS) and recursive feasibility. The key idea is to decompose the uncertain nonlinear system into (i) a nominal nonlinear dynamic model, (ii) disturbance-feedback controllers, and (iii) bounds on the model error. These components are optimized jointly using sequential convex programming. The resulting convex subproblems are solved efficiently using a recent disturbance-feedback MPC solver. The approach is validated across multiple dynamics, including a rocket-landing problem with steerable thrust. An open-source implementation is available at https://github.com/antoineleeman/robust-nonlinear-mpc.
comment: Code: https://github.com/antoineleeman/robust-nonlinear-mpc
An Extended Kalman Filter for Systems with Infinite-Dimensional Measurements
This article examines state estimation in discrete-time nonlinear stochastic systems with finite-dimensional states and infinite-dimensional measurements, motivated by real-world applications such as vision-based localization and tracking. We develop an extended Kalman filter (EKF) for real-time state estimation, with the measurement noise modeled as an infinite-dimensional random field. When applied to vision-based state estimation, the measurement Jacobians required to implement the EKF are shown to correspond to image gradients. This result provides a novel system-theoretic justification for the use of image gradients as features for vision-based state estimation, contrasting with their (often heuristic) introduction in many computer-vision pipelines. We demonstrate the practical utility of the EKF on a public real-world dataset involving the localization of an aerial drone using video from a downward-facing monocular camera. The EKF is shown to outperform VINS-MONO, an established visual-inertial odometry algorithm, in some cases achieving mean squared error reductions of up to an order of magnitude.
comment: 8 pages
Dual Iterative Learning Control for Multiple-Input Multiple-Output Dynamics with Validation in Robotic Systems
Solving motion tasks autonomously and accurately is a core ability for intelligent real-world systems. To achieve genuine autonomy across multiple systems and tasks, key challenges include coping with unknown dynamics and overcoming the need for manual parameter tuning, which is especially crucial in complex Multiple-Input Multiple-Output (MIMO) systems. This paper presents MIMO Dual Iterative Learning Control (DILC), a novel data-driven iterative learning scheme for simultaneous tracking control and model learning, without requiring any prior system knowledge or manual parameter tuning. The method is designed for repetitive MIMO systems and integrates seamlessly with established iterative learning control methods. We provide monotonic convergence conditions for both reference tracking error and model error in linear time-invariant systems. The DILC scheme -- rapidly and autonomously -- solves various motion tasks in high-fidelity simulations of an industrial robot and in multiple nonlinear real-world MIMO systems, without requiring model knowledge or manually tuning the algorithm. In our experiments, many reference tracking tasks are solved within 10-20 trials, and even complex motions are learned in less than 100 iterations. We believe that, because of its rapid and autonomous learning capabilities, DILC has the potential to serve as an efficient building block within complex learning frameworks for intelligent real-world systems.
comment: 11 pages, 4 figures
Verification and Synthesis of Discrete-Time Control Barrier Functions
Discrete-time Control Barrier Functions (DTCBFs) have recently attracted interest for guaranteeing safety and synthesizing safe controllers for discrete-time dynamical systems. This paper addresses the open challenges of verifying candidate DTCBFs and synthesizing DTCBFs for general nonlinear discrete-time systems with input constraints and arbitrary safe sets. In particular, we propose a branch-and-bound method, inspired by the $\alpha$BB algorithm, for the verification of candidate DTCBFs in both cases, whether a corresponding control policy is known or unknown. We prove that this method, in a finite number of iterations, either verifies a given candidate function as a valid DTCBF or falsifies it by providing a counterexample (within predefined tolerances). As a second main contribution, we propose a novel bilevel optimization approach to synthesize a DTCBF and a corresponding control policy in finite time. This involves determining the unknown coefficients of a parameterized DTCBF and a parameterized control policy. Furthermore, we introduce various strategies to reduce the computational burden of the bilevel approach. We also demonstrate our methods using numerical case studies.
comment: To appear in IEEE Transactions on Automatic Control
3D Flow Diffusion Policy: Visuomotor Policy Learning via Generating Flow in 3D Space
Learning robust visuomotor policies that generalize across diverse objects and interaction dynamics remains a central challenge in robotic manipulation. Most existing approaches rely on direct observation-to-action mappings or compress perceptual inputs into global or object-centric features, which often overlook localized motion cues critical for precise and contact-rich manipulation. We present 3D Flow Diffusion Policy (3D FDP), a novel framework that leverages scene-level 3D flow as a structured intermediate representation to capture fine-grained local motion cues. Our approach predicts the temporal trajectories of sampled query points and conditions action generation on these interaction-aware flows, implemented jointly within a unified diffusion architecture. This design grounds manipulation in localized dynamics while enabling the policy to reason about broader scene-level consequences of actions. Extensive experiments on the MetaWorld benchmark show that 3D FDP achieves state-of-the-art performance across 50 tasks, particularly excelling on medium and hard settings. Beyond simulation, we validate our method on eight real-robot tasks, where it consistently outperforms prior baselines in contact-rich and non-prehensile scenarios. These results highlight 3D flow as a powerful structural prior for learning generalizable visuomotor policies, supporting the development of more robust and versatile robotic manipulation. Robot demonstrations, additional results, and code can be found at https://sites.google.com/view/3dfdp/home.
comment: 7 main scripts + 2 reference pages
Distributionally Robust Safe Motion Planning with Contextual Information
We present a distributionally robust approach for collision avoidance by incorporating contextual information. Specifically, we embed the conditional distribution of future trajectory of the obstacle conditioned on the motion of the ego agent in a reproducing kernel Hilbert space (RKHS) via the conditional kernel mean embedding operator. Then, we define an ambiguity set containing all distributions whose embedding in the RKHS is within a certain distance from the empirical estimate of conditional mean embedding learnt from past data. Consequently, a distributionally robust collision avoidance constraint is formulated, and included in the receding horizon based motion planning formulation of the ego agent. Simulation results show that the proposed approach is more successful in avoiding collision compared to approaches that do not include contextual information and/or distributional robustness in their formulation in several challenging scenarios.
Interaction-aware Lane-Changing Early Warning System in Congested Traffic
Lane changes (LCs) in congested traffic are complex, multi-vehicle interactive events that pose significant safety concerns. Providing early warnings can enable more proactive driver assistance system and support more informed decision-making for drivers under LCs. This paper presents an interaction-aware Lane-Changing Early Warning (LCEW) system designed to issue reliable early warning signals based on future trajectory predictions. We first investigate the stochastic nature of LCs, characterized by (i) variable-size multi-vehicle interactions and (ii) the direct and indirect risks resulting from these interactions. To model these stochastic interactions, a Social Spatio-Temporal Graph Convolutional Neural Network framework informed by mutual information (STGCNN-MI) is introduced to predict multi-vehicle trajectories. By leveraging a MI-based adjacency matrix, the framework enhances trajectory prediction accuracy while providing interpretable representations of vehicle interactions. Then, potential collisions between the LC vehicle and adjacent vehicles (direct risks) or among the non-adjacent vehicles (indirect risks) are identified using oriented bounding box detection applied to the predicted trajectories. Finally, a warning signal is generated to inform the LC driver of location of potential collisions within the predicted time window. Traffic simulation experiments conducted in SUMO demonstrate that the proposed interaction-aware LCEW improves both vehicle-level safety and overall traffic efficiency, while also promoting more natural behavioral adaptation.
VLN-Zero: Rapid Exploration and Cache-Enabled Neurosymbolic Vision-Language Planning for Zero-Shot Transfer in Robot Navigation
Rapid adaptation in unseen environments is essential for scalable real-world autonomy, yet existing approaches rely on exhaustive exploration or rigid navigation policies that fail to generalize. We present VLN-Zero, a two-phase vision-language navigation framework that leverages vision-language models to efficiently construct symbolic scene graphs and enable zero-shot neurosymbolic navigation. In the exploration phase, structured prompts guide VLM-based search toward informative and diverse trajectories, yielding compact scene graph representations. In the deployment phase, a neurosymbolic planner reasons over the scene graph and environmental observations to generate executable plans, while a cache-enabled execution module accelerates adaptation by reusing previously computed task-location trajectories. By combining rapid exploration, symbolic reasoning, and cache-enabled execution, the proposed framework overcomes the computational inefficiency and poor generalization of prior vision-language navigation methods, enabling robust and scalable decision-making in unseen environments. VLN-Zero achieves 2x higher success rate compared to state-of-the-art zero-shot models, outperforms most fine-tuned baselines, and reaches goal locations in half the time with 55% fewer VLM calls on average compared to state-of-the-art models across diverse environments. Codebase, datasets, and videos for VLN-Zero are available at: https://vln-zero.github.io/.
comment: Codebase, datasets, and videos for VLN-Zero are available at: https://vln-zero.github.io/
AI Agent Access (A\^3) Network: An Embodied, Communication-Aware Multi-Agent Framework for 6G Coverage
The vision of 6G communication demands autonomous and resilient networking in environments without fixed infrastructure. Yet most multi-agent reinforcement learning (MARL) approaches focus on isolated stages - exploration, relay formation, or access - under static deployments and centralized control, limiting adaptability. We propose the AI Agent Access (A\^3) Network, a unified, embodied intelligence-driven framework that transforms multi-agent networking into a dynamic, decentralized, and end-to-end system. Unlike prior schemes, the A\^3 Network integrates exploration, target user access, and backhaul maintenance within a single learning process, while supporting on-demand agent addition during runtime. Its decentralized policies ensure that even a single agent can operate independently with limited observations, while coordinated agents achieve scalable, communication-optimized coverage. By embedding link-level communication metrics into actor-critic learning, the A\^3 Network couples topology formation with robust decision-making. Numerical simulations demonstrate that the A\^3 Network not only balances exploration and communication efficiency but also delivers system-level adaptability absent in existing MARL frameworks, offering a new paradigm for 6G multi-agent networks.
Refined Barrier Conditions for Finite-Time Safety and Reach-Avoid Guarantees in Stochastic Systems
Providing finite-time probabilistic safety and reach-avoid guarantees is crucial for safety-critical stochastic systems. Existing barrier certificate methods often rely on a restrictive boundedness assumption for auxiliary functions, limiting their applicability. This paper presents refined barrier-like conditions that remove this assumption. Specifically, we establish conditions for deriving upper bounds on finite-time safety probabilities in discrete-time systems and lower bounds on finite-time reach-avoid probabilities in continuous-time systems. This key relaxation significantly expands the class of verifiable systems, especially those with unbounded state spaces, and facilitates the application of advanced optimization techniques, such as semi-definite programming with polynomial functions. The efficacy of our approach is validated through numerical examples.
Minimalistic Autonomous Stack for High-Speed Time-Trial Racing
Autonomous racing has seen significant advancements, driven by competitions such as the Indy Autonomous Challenge (IAC) and the Abu Dhabi Autonomous Racing League (A2RL). However, developing an autonomous racing stack for a full-scale car is often constrained by limited access to dedicated test tracks, restricting opportunities for real-world validation. While previous work typically requires extended development cycles and significant track time, this paper introduces a minimalistic autonomous racing stack for high-speed time-trial racing that emphasizes rapid deployment and efficient system integration with minimal on-track testing. The proposed stack was validated on real speedways, achieving a top speed of 206 km/h within just 11 hours' practice run on the track with 325 km in total. Additionally, we present the system performance analysis, including tracking accuracy, vehicle dynamics, and safety considerations, offering insights for teams seeking to rapidly develop and deploy an autonomous racing stack with limited track access.
comment: The data associated with this paper is available at https://doi.org/10.5281/zenodo.17187680
Federated Aggregation of Demand Flexibility
This paper proposes a federated framework for demand flexibility aggregation to support grid operations. Unlike existing geometric methods that rely on a static, pre-defined base set as the geometric template for aggregation, our framework establishes a true federated process by enabling the collaborative optimization of this base set without requiring the participants sharing sensitive data with the aggregator. Specifically, we first formulate the base set optimization problem as a bilevel program. Using optimal solution functions, we then reformulate the bilevel program into a single-level, unconstrained learning task. By exploiting the decomposable structure of the overall gradient, we further design a decentralized gradient-based algorithm to solve this learning task. The entire framework, encompassing base set optimization, aggregation, and disaggregation, operates by design without exchanging raw user data. Numerical results demonstrate that our proposed framework unlocks substantially more flexibility than the approaches with static base sets, thus providing a promising framework for efficient and privacy-enhanced approaches to coordinate demand flexibility at scale.
comment: Longer version of a paper to be submitted to IEEE Transactions on Smart Grid
Modular Machine Learning with Applications to Genetic Circuit Composition
In several applications, including in synthetic biology, one often has input/output data on a system composed of many modules, and although the modules' input/output functions and signals may be unknown, knowledge of the composition architecture can significantly reduce the amount of training data required to learn the system's input/output mapping. Learning the modules' input/output functions is also necessary for designing new systems from different composition architectures. Here, we propose a modular learning framework, which incorporates prior knowledge of the system's compositional structure to (a) identify the composing modules' input/output functions from the system's input/output data and (b) achieve this by using a reduced amount of data compared to what would be required without knowledge of the compositional structure. To achieve this, we introduce the notion of modular identifiability, which allows recovery of modules' input/output functions from a subset of the system's input/output data, and provide theoretical guarantees on a class of systems motivated by genetic circuits. We demonstrate the theory on computational studies showing that a neural network (NNET) that accounts for the compositional structure can learn the composing modules' input/output functions and predict the system's output on inputs outside of the training set distribution. By contrast, a neural network that is agnostic of the structure is unable to predict on inputs that fall outside of the training set distribution. By reducing the need for experimental data and allowing module identification, this framework offers the potential to ease the design of synthetic biological circuits and of multi-module systems more generally.
From Space to Time: Enabling Adaptive Safety with Learned Value Functions via Disturbance Recasting
The widespread deployment of autonomous systems in safety-critical environments such as urban air mobility hinges on ensuring reliable, performant, and safe operation under varying environmental conditions. One such approach, value function-based safety filters, minimally modifies a nominal controller to ensure safety. Recent advances leverage offline learned value functions to scale these safety filters to high-dimensional systems. However, these methods assume detailed priors on all possible sources of model mismatch, in the form of disturbances in the environment -- information that is rarely available in real world settings. Even in well-mapped environments like urban canyons or industrial sites, drones encounter complex, spatially-varying disturbances arising from payload-drone interaction, turbulent airflow, and other environmental factors. We introduce SPACE2TIME, which enables safe and adaptive deployment of offline-learned safety filters under unknown, spatially-varying disturbances. The key idea is to reparameterize spatial variations in disturbance as temporal variations, enabling the use of precomputed value functions during online operation. We validate SPACE2TIME on a quadcopter through extensive simulations and hardware experiments, demonstrating significant improvement over baselines.
comment: The first three authors contributed equally. This work has been accepted for publication at the Conference on Robot Learning
Metriplectic Conditional Flow Matching for Dissipative Dynamics
Metriplectic conditional flow matching (MCFM) learns dissipative dynamics without violating first principles. Neural surrogates often inject energy and destabilize long-horizon rollouts; MCFM instead builds the conservative-dissipative split into both the vector field and a structure preserving sampler. MCFM trains via conditional flow matching on short transitions, avoiding long rollout adjoints. In inference, a Strang-prox scheme alternates a symplectic update with a proximal metric step, ensuring discrete energy decay; an optional projection enforces strict decay when a trusted energy is available. We provide continuous and discrete time guarantees linking this parameterization and sampler to conservation, monotonic dissipation, and stable rollouts. On a controlled mechanical benchmark, MCFM yields phase portraits closer to ground truth and markedly fewer energy-increase and positive energy rate events than an equally expressive unconstrained neural flow, while matching terminal distributional fit.
Four-Transistor Bipolar Series-Parallel Module Structure for Cascaded Bridge and Modular Multilevel Circuits
With their great scalability and flexibility, cascaded-bridge and modular multilevel converters have enabled a variety of energy applications, such as offshore wind power, high-voltage dc power transmission, power-quality management, and cutting-edge medical instrumentation. The incorporation of parallel connectivity between modules equips systems with advantages such as sensorless balancing, switched-capacitor energy exchange, and reduced impedance. However, existing topologies require many individual switches -- eight transistors per module. Efforts to use fewer switches, instead, have previously compromised their functionality. We propose a new module topology, named the direction-selective parallel (DiSeP) structure, which requires only four transistors per module -- the same as an H bridge -- but can achieve bidirectional equilibration, bipolar module output, and inter-module switched-capacitor features. This topology is highly attractive for existing converters with cascaded bridge elements, as the addition of only four diodes enables key features such as sensorless balancing and inter-module energy exchange. Thus, the module can outcompete H bridges in their applications, as it adds parallel modes without any additional transistors. Compared to double-H bridges (CH2B), it saves as many as half of the transistors. We elaborate on its working principles and key design considerations. We validate our theories on an experimental prototype with six modules. This prototype attains a total voltage harmonic distortion plus noise (THD+N) of 10.3% and a peak efficiency of 96.3%. Furthermore, the modules achieve autonomous sensorless balancing under open-loop control.
linrax: A JAX Compatible, Simplex Method Linear Program Solver
We present linrax, the first simplex based linear program (LP) solver compatible with the JAX ecosystem. In many control algorithms, LPs are often automatically generated and frequently solved either offline or online in the control loop. This motivates the design of linrax, which is especially suited for compilation into a complex JAX-based pipeline as a subroutine. We discuss the challenges associated with implementing a general purpose LP solver under strict design requirements from JAX. Notably, we can solve general problems which may include dependent constraints-something not possible with existing JAX-compatible LP solvers that use first-order techniques and may fail to converge. We demonstrate the utility of linrax through several examples, including a robust control synthesis pipeline for a nonlinear vehicle model using automatic differentiation through a LP-based reachable set framework.
comment: 8 pages, 3 figures
Robust Near-Optimal Nonlinear Target Enclosing Guidance
This paper proposes a nonlinear optimal guidance law that enables a pursuer to enclose a target within arbitrary geometric patterns, which extends beyond conventional circular encirclement. The design operates using only relative state measurements and formulates a target enclosing guidance law in which the vehicle's lateral acceleration serves as the steering control, making it well-suited for aerial vehicles with turning constraints. Our approach generalizes and extends existing guidance strategies that are limited to target encirclement and provides a degree of optimality. At the same time, the exact information of the target's maneuver is unnecessary during the design. The guidance law is developed within the framework of a state-dependent Riccati equation (SDRE), thereby providing a systematic way to handle nonlinear dynamics through a pseudo-linear representation to design locally optimal feedback guidance commands through state-dependent weighting matrices. While SDRE ensures near-optimal performance in the absence of strong disturbances, we further augment the design to incorporate an integral sliding mode manifold to compensate when disturbances push the system away from the nominal trajectory, and demonstrate that the design provides flexibility in the sense that the (possibly time-varying) stand-off curvature could also be treated as unknown. Simulations demonstrate the efficacy of the proposed approach.
Crater Observing Bio-inspired Rolling Articulator (COBRA)
NASA aims to establish a sustainable human basecamp on the Moon as a stepping stone for future missions to Mars and beyond. The discovery of water ice on the Moon's craters located in permanently shadowed regions, which can provide drinking water, oxygen, and rocket fuel, is therefore of critical importance. However, current methods to access lunar ice deposits are limited. While rovers have been used to explore the lunar surface for decades, they face significant challenges in navigating harsh terrains, such as permanently shadowed craters, due to the high risk of immobilization. This report introduces COBRA (Crater Observing Bio-inspired Rolling Articulator), a multi-modal snake-style robot designed to overcome mobility challenges in Shackleton Crater's rugged environment. COBRA combines slithering and tumbling locomotion to adapt to various crater terrains. In snake mode, it uses sidewinding to traverse flat or low inclined surfaces, while in tumbling mode, it forms a circular barrel by linking its head and tail, enabling rapid movement with minimal energy on steep slopes. Equipped with an onboard computer, stereo camera, inertial measurement unit, and joint encoders, COBRA facilitates real-time data collection and autonomous operation. This paper highlights COBRAs robustness and efficiency in navigating extreme terrains through both simulations and experimental validation.
Automatic and Scalable Safety Verification using Interval Reachability with Subspace Sampling
Interval refinement is a technique for reducing the conservatism of traditional interval based reachability methods by lifting the system to a higher dimension using new auxiliary variables and exploiting the introduced structure through a refinement procedure. We present a novel, efficiently scaling, automatic refinement strategy based on a subspace sampling argument and motivated by reducing the number of interval operations through sparsity. Unlike previous methods, we guarantee that refined bounds shrink as additional auxiliary variables are added. This additionally encourages automation of the lifting phase by allowing larger groups of auxiliary variables to be considered. We implement our strategy in JAX, a high-performance computational toolkit for Python and demonstrate its efficacy on several examples, including regulating a multi-agent platoon to the origin while avoiding an obstacle.
comment: 6 pages, 3 figures. Updated to correct a small error in the dynamics presented in equation (12)
The First Open-Source Framework for Learning Stability Certificate from Data
Before 2025, no open-source system existed that could learn Lyapunov stability certificates directly from noisy, real-world flight data. No tool could answer the critical question: is this controller still stabilizable-especially when its closed-loop system is a total black box. We broke that boundary. This year, we released the first-ever open-source framework that can learn Lyapunov functions from trajectory data under realistic, noise-corrupted conditions. Unlike statistical anomaly detectors, our method does not merely flag deviations-it directly determines whether the system can still be proven stable. Applied to public data from the 2024 SAS severe turbulence incident, our method revealed that, within just 60 seconds of the aircrafts descent becoming abnormal, no Lyapunov function could be constructed to certify system stability. Moreover, this is the first known data-driven stability-theoretic method ever applied to a civil airliner accident. And our approach works with zero access to the controller logic-a breakthrough for commercial aircraft where control laws are proprietary and opaque. The implementation of the proposed framework is open-sourced and available at: https://github.com/HansOersted/stability
Frequency-Varying Optimization: A Control Framework for New Dynamic Frequency Response Services
To address the variability of renewable generation, initiatives have been launched globally to provide faster and more effective frequency responses. In the UK, the National Energy System Operator (NESO) has introduced a suite of three new dynamic services, where aggregation of assets is expected to play a key role. For an Aggregated Response Unit (ARU), the required level of frequency response varies with grid frequency, resulting in a frequency-varying equality constraint that assets should meet collectively. We show that the optimal coordination of an ARU constitutes a Frequency-Varying Optimization (FVO) problem, in which the optimal trajectory for each asset evolves dynamically. To facilitate online optimization, we reformulate the FVO problem into Tracking of the Optimal Trajectory (TOT) problems, with algorithms proposed for two scenarios: one where the asset dynamics are negligible, and another where they must be accounted for. Under reasonable conditions, the ARU converges to the optimal trajectory within a fixed time, and within the maximum delivery time requested by NESO. The proposed framework can be readily distributed to coordinate a large number of assets. Numerical results verify the effectiveness and scalability of the proposed control framework.
comment: in IEEE Transactions on Power Systems, 2025
Addressing Model Inaccuracies in Transmission Network Reconfiguration via Diverse Alternatives
The ongoing energy transition places significant pressure on the transmission network due to increasing shares of renewables and electrification. To mitigate grid congestion, transmission system operators need decision support tools to suggest remedial actions, such as transmission network reconfigurations or redispatch. However, these tools are prone to model inaccuracies and may not provide relevant suggestions with regard to important unmodeled constraints or operator preferences. We propose a human-in-the-loop modeling-to-generate alternatives (HITL-MGA) approach to address these shortcomings by generating diverse topology reconfiguration alternatives. Case studies on the IEEE 57-bus and IEEE 118-bus systems show the method can leverage expert feedback and improve the quality of the suggested remedial actions.
comment: This preprint is currently under peer review
The Bayesian Separation Principle for Data-driven Control
In this paper we investigate the existence of a separation principle between model identification and control design in the context of model predictive control. First, we clarify that such a separation principle holds asymptotically in the number of data in a Fisherian context, and show that it holds universally, i.e. regardless of the data size, in a Bayesian context. Then, by formulating model predictive control within a Gaussian regression framework, we describe how the Bayesian separation principle can be used to derive computable, uncertainty-aware expressions for the control cost and optimal input sequence, thereby bridging direct and indirect data-driven approaches. Numerical results in both linear and nonlinear scenarios illustrate that the proposed approach outperform nominal methods that neglect uncertainty, highlighting the advantages of incorporating uncertainty in the control design process.
comment: 16 pages, 2 figures
Drift Plus Optimistic Penalty - A Learning Framework for Stochastic Network Optimization with Improved Regret Bounds
We consider the problem of joint routing and scheduling in queueing networks, where the edge transmission costs are unknown. At each time-slot, the network controller receives noisy observations of transmission costs only for those edges it selects for transmission. The network controller's objective is to make routing and scheduling decisions so that the total expected cost is minimized. This problem exhibits an exploration-exploitation trade-off, however, previous bandit-style solutions cannot be directly applied to this problem due to the queueing dynamics. In order to ensure network stability, the network controller needs to optimize throughput and cost simultaneously. We show that the best achievable cost is lower bounded by the solution to a static optimization problem, and develop a network control policy using techniques from Lyapunov drift-plus-penalty optimization and multi-arm bandits. We show that the policy achieves a sub-linear regret of order $O(\sqrt{T}\log T)$, as compared to the best policy that has complete knowledge of arrivals and costs. Finally, we evaluate the proposed policy using simulations and show that its regret is indeed sub-linear.
Occlusion-Aware Consistent Model Predictive Control for Robot Navigation in Occluded Obstacle-Dense Environments
Ensuring safety and motion consistency for robot navigation in occluded, obstacle-dense environments is a critical challenge. In this context, this study presents an occlusion-aware Consistent Model Predictive Control (CMPC) strategy. To account for the occluded obstacles, it incorporates adjustable risk regions that represent their potential future locations. Subsequently, dynamic risk boundary constraints are developed online to ensure safety. The CMPC then constructs multiple locally optimal trajectory branches (each tailored to different risk regions) to strike a balance between safety and performance. A shared consensus segment is generated to ensure smooth transitions between branches without significant velocity fluctuations, further preserving motion consistency. To facilitate high computational efficiency and ensure coordination across local trajectories, we use the alternating direction method of multipliers (ADMM) to decompose the CMPC into manageable sub-problems for parallel solving. The proposed strategy is validated through simulations and real-world experiments on an Ackermann-steering robot platform. The results demonstrate the effectiveness of the proposed CMPC strategy through comparisons with baseline approaches in occluded, obstacle-dense environments.
Stratified Topological Autonomy for Long-Range Coordination (STALC)
In this paper, we present Stratified Topological Autonomy for Long-Range Coordination (STALC), a hierarchical planning approach for coordinated multi-robot maneuvering in real-world environments with significant inter-robot spatial and temporal dependencies. At its core, STALC consists of a multi-robot graph-based planner which combines a topological graph with a novel, computationally efficient mixed-integer programming formulation to generate highly-coupled multi-robot plans in seconds. To enable autonomous planning across different spatial and temporal scales, we construct our graphs so that they capture connectivity between free-space regions and other problem-specific features, such as traversability or risk. We then use receding-horizon planners to achieve local collision avoidance and formation control. To evaluate our approach, we consider a multi-robot reconnaissance scenario where robots must autonomously coordinate to navigate through an environment while minimizing the risk of detection by observers. Through simulation-based experiments, we show that our approach is able to scale to address complex multi-robot planning scenarios. Through hardware experiments, we demonstrate our ability to generate graphs from real-world data and successfully plan across the entire hierarchy to achieve shared objectives.
comment: This work has been submitted to the IEEE for possible publication
Low-pass sampling in Model Predictive Path Integral Control
Model Predictive Path Integral (MPPI) control is a widely used sampling-based approach for real-time control, offering flexibility in handling arbitrary dynamics and cost functions. However, the original MPPI suffers from high-frequency noise in the sampled control trajectories, leading to actuator wear and inefficient exploration. In this work, we introduce Low-Pass Model Predictive Path Integral Control (LP-MPPI), which integrates low-pass filtering into the sampling process to eliminate detrimental high-frequency components and improve the effectiveness of the control trajectories exploration. Unlike prior approaches, LP-MPPI provides direct and interpretable control over the frequency spectrum of sampled trajectories, enhancing sampling efficiency and control smoothness. Through extensive evaluations in Gymnasium environments, simulated quadruped locomotion, and real-world F1TENTH autonomous racing, we demonstrate that LP-MPPI consistently outperforms state-of-the-art MPPI variants, achieving significant performance improvements while reducing control signal chattering.
Universal Dynamics with Globally Controlled Analog Quantum Simulators
Analog quantum simulators with global control fields have emerged as powerful platforms for exploring complex quantum phenomena. Recent breakthroughs, such as the coherent control of thousands of atoms, highlight the growing potential for quantum applications at scale. Despite these advances, a fundamental theoretical question remains unresolved: to what extent can such systems realize universal quantum dynamics under global control? Here we establish a necessary and sufficient condition for universal quantum computation using only global pulse control, proving that a broad class of analog quantum simulators is, in fact, universal. We further extend this framework to fermionic and bosonic systems, including modern platforms such as ultracold atoms in optical superlattices. Crucially, to connect the theoretical possibility with experimental reality, we introduce a new control technique into the experiment - direct quantum optimal control. This method enables the synthesis of complex effective Hamiltonians and allows us to incorporate realistic hardware constraints. To show its practical power, we experimentally engineer three-body interactions outside the blockade regime and demonstrate topological dynamics on a Rydberg atom array. Using the new control framework, we overcome key experimental challenges, including hardware limitations and atom position fluctuations in the non-blockade regime, by identifying smooth, short-duration pulses that achieve high-fidelity dynamics. Experimental measurements reveal dynamical signatures of symmetry-protected-topological edge modes, confirming both the expressivity and feasibility of our approach. Our work opens a new avenue for quantum simulation beyond native hardware Hamiltonians, enabling the engineering of effective multi-body interactions and advancing the frontier of quantum information processing with globally-controlled analog platforms.
comment: 10 pages, 5 figures with Methods. HYH, AMG, and LC contributed equally to this work
Sliding Speed Influences Electrovibration-Induced Finger Friction Dynamics on Touchscreens
Electrovibration technology enables tactile texture rendering on capacitive touchscreens by modulating friction between the finger and the screen through electrostatic attraction forces, generated by applying an alternating voltage signal to the screen. Accurate signal calibration is essential for robust texture rendering but remains challenging due to variations in sliding speed, applied force, and individual skin mechanics, all of which unpredictably affect frictional behavior. Here, we investigate how exploration conditions affect electrovibration-induced finger friction on touchscreens and the role of skin mechanics in this process. Ten participants slid their index fingers across an electrovibration-enabled touchscreen at five sliding speeds ($20\sim100$ mm/s) and applied force levels ($0.2\sim0.6$ N). Contact forces and skin accelerations were measured while amplitude modulated voltage signals spanning the tactile frequency range were applied to the screen. We modeled the finger-touchscreen friction response as a first-order system and the skin mechanics as a mass-spring-damper system. Results showed that sliding speed influenced the friction response's cutoff frequency, along with the estimated finger moving mass and stiffness. For every $1$ mm/s increase in speed, the cutoff frequency, the finger moving mass, and stiffness increased by $13.8$ Hz, $3.23\times 10^{-5}$ kg, and $4.04$ N/m, respectively. Correlation analysis revealed that finger stiffness had a greater impact on the cutoff frequency than moving mass. Notably, we observed a substantial inter-participant variability in both finger-display interaction and skin mechanics parameters. Finally, we developed a speed-dependent friction model to support consistent and perceptually stable electrovibration-based haptic feedback across varying user conditions.
comment: 26 pages, 14 figures, Tribology journal
Optimization-Based Ramping Reserve Allocation of BESS for AGC Enhancement
The transient behavior of Automatic Generation Control (AGC) systems is a critical aspect of power system operation. Therefore, fully extracting the potential of Battery Energy Storage Systems (BESSs) for AGC enhancement is of paramount importance. In light of the challenges posed by diverse resource interconnections and the variability associated, we propose an online optimization scheme that can adapt to changes in an unknown and variable environment. To leverage the synergy between BESSs and Conventional Generators (CGs), we devise a variant of the Area Injection Error (AIE) as a measure to quantify the ramping needs. Based on this measure, we develop a distributed optimization algorithm with adaptive learning rates for the allocation of the ramping reserve. The algorithm restores a larger learning rate for compliance with the ramping needs upon detecting a potentially destabilizing event. We demonstrate the effectiveness and scalability of the proposed scheme through comprehensive case studies. It is shown that the proposed scheme can improve the transient behavior of the AGC system by bridging the gap in ramping capability.
DANSE: Data-driven Non-linear State Estimation of Model-free Process in Unsupervised Learning Setup
We address the tasks of Bayesian state estimation and forecasting for a model-free process in an unsupervised learning setup. For a model-free process, we do not have any a-priori knowledge of the process dynamics. In the article, we propose DANSE -- a Data-driven Nonlinear State Estimation method. DANSE provides a closed-form posterior of the state of the model-free process, given linear measurements of the state. In addition, it provides a closed-form posterior for forecasting. A data-driven recurrent neural network (RNN) is used in DANSE to provide the parameters of a prior of the state. The prior depends on the past measurements as input, and then we find the closed-form posterior of the state using the current measurement as input. The data-driven RNN captures the underlying non-linear dynamics of the model-free process. The training of DANSE, mainly learning the parameters of the RNN, is executed using an unsupervised learning approach. In unsupervised learning, we have access to a training dataset comprising only a set of measurement data trajectories, but we do not have any access to the state trajectories. Therefore, DANSE does not have access to state information in the training data and can not use supervised learning. Using simulated linear and non-linear process models (Lorenz attractor and Chen attractor), we evaluate the unsupervised learning-based DANSE. We show that the proposed DANSE, without knowledge of the process model and without supervised learning, provides a competitive performance against model-driven methods, such as the Kalman filter (KF), extended KF (EKF), unscented KF (UKF), a data-driven deep Markov model (DMM) and a recently proposed hybrid method called KalmanNet. In addition, we show that DANSE works for high-dimensional state estimation.
comment: 12 pages, Accepted for publication in IEEE Transactions in Signal Processing. Added a fix in the latest ArXiV version for Fig. 4 (and related figs) due to a slight misalignment in MSE calculations. Relative performance trends are unchanged and consistent
Recent advances in data-driven methods for degradation modelling across applications
Understanding degradation is crucial for ensuring the longevity and performance of materials, systems, and organisms. To illustrate the similarities across applications, this article provides a review of data-based method in materials science, engineering, and medicine. The methods analyzed in this paper include regression analysis, factor analysis, cluster analysis, Markov Chain Monte Carlo, Bayesian statistics, hidden Markov models, nonparametric Bayesian modeling of time series, supervised learning, and deep learning. The review provides an overview of degradation models, referencing books and methods, and includes detailed tables highlighting the applications and insights offered in medicine, power engineering, and material science. It also discusses the classification of methods, emphasizing statistical inference, dynamic prediction, machine learning, and hybrid modeling techniques. Overall, this review enhances understanding of degradation modelling across diverse domains.
A generalized global Hartman-Grobman theorem for asymptotically stable semiflows
Recently, Kvalheim and Sontag provided a generalized global Hartman-Grobman theorem for equilibria under asymptotically stable continuous vector fields. By leveraging topological properties of Lyapunov functions, their theorem works without assuming hyperbolicity. We extend their theorem to a class of possibly discontinuous vector fields, in particular, to vector fields generating asymptotically stable semiflows.
comment: Technical note related to the update of 10.48550/arXiv.2411.03277. Newest version: 6 pages, 5 figures, the only change is the presentation. Under review
Online Regularized Statistical Learning in Reproducing Kernel Hilbert Space With Non-Stationary Data
We study the convergence of recursive regularized learning algorithms in the reproducing kernel Hilbert space (RKHS) with dependent and non-stationary online data streams. Firstly, we introduce the concept of random Tikhonov regularization path and decompose the tracking error of the algorithm's output for the regularization path into random difference equations in RKHS, whose non-homogeneous terms are martingale difference sequences. Investigating the mean square asymptotic stability of the equations, we show that if the regularization path is slowly time-varying, then the algorithm's output achieves mean square consistency with the regularization path. Leveraging operator theory, particularly the monotonicity of the inverses of operators and the spectral decomposition of compact operators, we introduce the RKHS persistence of excitation condition (i.e. there exists a fixed-length time period, such that the conditional expectation of the operators induced by the input data accumulated over every period has a uniformly strictly positive compact lower bound) and develop a dominated convergence method to prove the mean square consistency between the algorithm's output and an unknown function. Finally, for independent and non-identically distributed data streams, the algorithm achieves the mean square consistency if the input data's marginal probability measures are slowly time-varying and the average measure over each fixed-length time period has a uniformly strictly positive lower bound.
LogicGuard: Improving Embodied LLM agents through Temporal Logic based Critics
Large language models (LLMs) have shown promise in zero-shot and single step reasoning and decision making problems, but in long horizon sequential planning tasks, their errors compound, often leading to unreliable or inefficient behavior. We introduce LogicGuard, a modular actor-critic architecture in which an LLM actor is guided by a trajectory level LLM critic that communicates through Linear Temporal Logic (LTL). Our setup combines the reasoning strengths of language models with the guarantees of formal logic. The actor selects high-level actions from natural language observations, while the critic analyzes full trajectories and proposes new LTL constraints that shield the actor from future unsafe or inefficient behavior. LogicGuard supports both fixed safety rules and adaptive, learned constraints, and is model-agnostic: any LLM-based planner can serve as the actor, with LogicGuard acting as a logic-generating wrapper. We formalize planning as graph traversal under symbolic constraints, allowing LogicGuard to analyze failed or suboptimal trajectories and generate new temporal logic rules that improve future behavior. To demonstrate generality, we evaluate LogicGuard across two distinct settings: short-horizon general tasks and long-horizon specialist tasks. On the Behavior benchmark of 100 household tasks, LogicGuard increases task completion rates by 25% over a baseline InnerMonologue planner. On the Minecraft diamond-mining task, which is long-horizon and requires multiple interdependent subgoals, LogicGuard improves both efficiency and safety compared to SayCan and InnerMonologue. These results show that enabling LLMs to supervise each other through temporal logic yields more reliable, efficient and safe decision-making for both embodied agents.
comment: Modified version of prior LTLCrit work with new robotics dataset
Spectrum Opportunities for the Wireless Future: From Direct-to-Device Satellite Applications to 6G Cellular
For next-generation wireless networks, both the upper mid-band and terahertz spectra are gaining global attention. This article provides an in-depth analysis of recent regulatory rulings and spectrum preferences issued by international standard bodies. We highlight promising frequency bands earmarked for 6G and beyond, and offer examples that illuminate the passive service protections and spectrum feasibility for coexistence between terrestrial and non-terrestrial wireless networks.
Energy Management for Renewable-Colocated Artificial Intelligence Data Centers
We develop an energy management system (EMS) for artificial intelligence (AI) data centers with colocated renewable generation. Under a cost-minimizing framework, the EMS of renewable-colocated data center (RCDC) co-optimizes AI workload scheduling, on-site renewable utilization, and electricity market participation. Within both wholesale and retail market participation models, the economic benefit of the RCDC operation is maximized. Empirical evaluations using real-world traces of electricity prices, data center power consumption, and renewable generation demonstrate significant electricity cost reduction from renewable and AI data center colocations.
Uncertainty-aware Latent Safety Filters for Avoiding Out-of-Distribution Failures
Recent advances in generative world models have enabled classical safe control methods, such as Hamilton-Jacobi (HJ) reachability, to generalize to complex robotic systems operating directly from high-dimensional sensor observations. However, obtaining comprehensive coverage of all safety-critical scenarios during world model training is extremely challenging. As a result, latent safety filters built on top of these models may miss novel hazards and even fail to prevent known ones, overconfidently misclassifying risky out-of-distribution (OOD) situations as safe. To address this, we introduce an uncertainty-aware latent safety filter that proactively steers robots away from both known and unseen failures. Our key idea is to use the world model's epistemic uncertainty as a proxy for identifying unseen potential hazards. We propose a principled method to detect OOD world model predictions by calibrating an uncertainty threshold via conformal prediction. By performing reachability analysis in an augmented state space-spanning both the latent representation and the epistemic uncertainty-we synthesize a latent safety filter that can reliably safeguard arbitrary policies from both known and unseen safety hazards. In simulation and hardware experiments on vision-based control tasks with a Franka manipulator, we show that our uncertainty-aware safety filter preemptively detects potential unsafe scenarios and reliably proposes safe, in-distribution actions. Video results can be found on the project website at https://cmu-intentlab.github.io/UNISafe
comment: Conference on Robot Learning (CoRL 2025)
Drift Plus Optimistic Penalty -- A Learning Framework for Stochastic Network Optimization with Improved Regret Bounds
We consider the problem of joint routing and scheduling in queueing networks, where the edge transmission costs are unknown. At each time-slot, the network controller receives noisy observations of transmission costs only for those edges it selects for transmission. The network controller's objective is to make routing and scheduling decisions so that the total expected cost is minimized. This problem exhibits an exploration-exploitation trade-off, however, previous bandit-style solutions cannot be directly applied to this problem due to the queueing dynamics. In order to ensure network stability, the network controller needs to optimize throughput and cost simultaneously. We show that the best achievable cost is lower bounded by the solution to a static optimization problem, and develop a network control policy using techniques from Lyapunov drift-plus-penalty optimization and multi-arm bandits. We show that the policy achieves a sub-linear regret of order $O(\sqrt{T}\log T)$, as compared to the best policy that has complete knowledge of arrivals and costs. Finally, we evaluate the proposed policy using simulations and show that its regret is indeed sub-linear.
Systems and Control (EESS)
Policy Gradient Bounds in Multitask LQR
We analyze the performance of policy gradient in multitask linear quadratic regulation (LQR), where the system and cost parameters differ across tasks. The main goal of multitask LQR is to find a controller with satisfactory performance on every task. Prior analyses on relevant contexts fail to capture closed-loop task similarities, resulting in conservative performance guarantees. To account for such similarities, we propose bisimulation-based measures of task heterogeneity. Our measures employ new bisimulation functions to bound the cost gradient distance between a pair of tasks in closed loop with a common stabilizing controller. Employing these measures, we derive suboptimality bounds for both the multitask optimal controller and the asymptotic policy gradient controller with respect to each of the tasks. We further provide conditions under which the policy gradient iterates remain stabilizing for every system. For multiple random sets of certain tasks, we observe that our bisimulation-based measures improve upon baseline measures of task heterogeneity dramatically.
Proactive-reactive detection and mitigation of intermittent faults in robot swarms
Intermittent faults are transient errors that sporadically appear and disappear. Although intermittent faults pose substantial challenges to reliability and coordination, existing studies of fault tolerance in robot swarms focus instead on permanent faults. One reason for this is that intermittent faults are prohibitively difficult to detect in the fully self-organized ad-hoc networks typical of robot swarms, as their network topologies are transient and often unpredictable. However, in the recently introduced self-organizing nervous systems (SoNS) approach, robot swarms are able to self-organize persistent network structures for the first time, easing the problem of detecting intermittent faults. To address intermittent faults in robot swarms that have persistent networks, we propose a novel proactive-reactive strategy to detection and mitigation, based on self-organized backup layers and distributed consensus in a multiplex network. Proactively, the robots self-organize dynamic backup paths before faults occur, adapting to changes in the primary network topology and the robots' relative positions. Reactively, robots use one-shot likelihood ratio tests to compare information received along different paths in the multiplex network, enabling early fault detection. Upon detection, communication is temporarily rerouted in a self-organized way, until the detected fault resolves. We validate the approach in representative scenarios of faulty positional data occurring during formation control, demonstrating that intermittent faults are prevented from disrupting convergence to desired formations, with high fault detection accuracy and low rates of false positives.
Watts and Drops: Co-Scheduling Power and Water in Desalination Plants
We develop a mathematical framework to jointly schedule water and electricity in a profit-maximizing renewable colocated water desalination plant that integrates both thermal and membrane based technologies. The price-taking desalination plant sells desalinated water to a water utility at a given price and engages in bidirectional electricity transactions with the grid, purchasing or selling power based on its net electricity demand. We show that the optimal scheduling policy depends on the plant's internal renewable generation and follows a simple threshold structure. Under the optimal policy, thermal based water output decreases monotonically with renewable output, while membrane based water output increases monotonically. We characterize the structure and intuition behind the threshold policy and examine key special properties.
comment: 5 pages, 6 figures. To appear in Proceedings of the 61st Allerton Conference on Communication, Control, and Computing
Robust Synchronous Reference Frame Phase-Looked Loop (PLL) with Feed-Forward Frequency Estimation
Synchronous reference frame phase-looked loop (SRF-PLL) techniques are widely used for interfacing and control applications in the power systems and energy conversion at large. Since a PLL system synchronizes its output with an exogenous harmonic signal, often 3-phases voltage or current, the locking of the frequency and phase angle depends on the performance of the feedback loop with at least two integrator terms, and on the distortions of the measured input quantities. For the conventional SRF-PLL with a proportional-integral (PI) control in feedback, we are providing a robust design which maximizes the phase margin and uses the normalization scheme for yielding the loop insensitive to the input amplitude variations. The main improvement in the transient behavior and also in tracking of frequency ramps is achieved by using the robust feed-forward frequency estimator, which is model-free and suitable for the noisy and time-varying harmonic signals. The proposed feed-forward-feedback SRF-PLL scheme is experimentally evaluated on the 3-phases harmonic currents from standard PMSM drives with varying angular speeds and loads. Both, the tracked angular frequency and locked phase angle are assessed as performance metrics of the robust SRF-PLL scheme with feedforwarding.
comment: 8 pages, 9 figures
A Fast Initialization Method for Neural Network Controllers: A Case Study of Image-based Visual Servoing Control for the multicopter Interception
Reinforcement learning-based controller design methods often require substantial data in the initial training phase. Moreover, the training process tends to exhibit strong randomness and slow convergence. It often requires considerable time or high computational resources. Another class of learning-based method incorporates Lyapunov stability theory to obtain a control policy with stability guarantees. However, these methods generally require an initially stable neural network control policy at the beginning of training. Evidently, a stable neural network controller can not only serve as an initial policy for reinforcement learning, allowing the training to focus on improving controller performance, but also act as an initial state for learning-based Lyapunov control methods. Although stable controllers can be designed using traditional control theory, designers still need to have a great deal of control design knowledge to address increasingly complicated control problems. The proposed neural network rapid initialization method in this paper achieves the initial training of the neural network control policy by constructing datasets that conform to the stability conditions based on the system model. Furthermore, using the image-based visual servoing control for multicopter interception as a case study, simulations and experiments were conducted to validate the effectiveness and practical performance of the proposed method. In the experiment, the trained control policy attains a final interception velocity of 15 m/s.
AI-Enabled Smart Hygiene System for Real-Time Glucose Detection
This research presents a smart urinary health monitoring system incorporating a coplanar waveguide (CPW)-fed slot-loop antenna biosensor designed to analyse various urine samples. The antenna demonstrates distinct resonant frequency shifts when exposed to five specific urine conditions, deviating from its baseline 1.42 GHz operation. These measurable frequency variations enable the antenna to function as an effective microwave sensor for urinary biomarker detection. A potential artificial intelligence-based Convolutional Neural Networks Long Short-Term Memory (CNN-LSTM) framework is also discussed to overcome the limitations of overlapping frequency responses, aiming to improve the accuracy of health condition detection. These components contribute to the development of a smart toilet system that displays real-time health information on a wall-mounted urinal screen, without requiring any user effort or behavioural change.
MAPPO for Edge Server Monitoring
In this paper, we consider a goal-oriented communication problem for edge server monitoring, where jobs arrive intermittently at multiple dispatchers and must be assigned to shared edge servers with finite queues and time-varying availability. Accurate knowledge of server status is critical for sustaining high throughput, yet remains challenging under dynamic workloads and partial observability. To address this challenge, each dispatcher maintains server knowledge through two complementary mechanisms: (i) active status queries that provide instantaneous updates at a communication cost, and (ii) job execution feedback that reveals server conditions opportunistically. We formulate a cooperative multi-agent distributed decision-making problem in which dispatchers jointly optimize query scheduling to balance throughput against communication overhead. To solve this problem, we propose a Multi-Agent Proximal Policy Optimization (MAPPO)-based algorithm that leverages centralized training with decentralized execution (CTDE) to learn distributed query-and-dispatch policies under partial and stale observations. Numerical evaluations show that MAPPO achieves superior throughput-cost tradeoffs and significantly outperforms baseline strategies, achieving on average a 30% improvement over the closest baseline.
comment: 6 pages, 4 figures. Accepted to IEEE MILCOM 2025
A Weighted Least Squares Error Hetero-functional Graph State Estimator of the American Multi-modal Energy System
As one of the most pressing challenges of the 21st century, global climate change demands a host of changes across at least four critical energy infrastructures: the electric grid, the natural gas system, the oil system, and the coal system. In the context of the United States, this paper refers to this system-of-systems as ``The American Multi-Modal Energy System (AMES)". These combined changes necessitate an understanding of the AMES interdependencies, both structurally and behaviorally, to develop and enact effective policies. This work focuses on behavioral analysis methods to provide examples of how to analyze system behavior and the critical matter and energy flows through the system. Building upon past works, two regions of the AMES are modeled, and their behavior is analyzed using Hetero-functional Graph Theory (HFGT). More specifically, the work presents a weighted least square error state estimation model of the AMES. State estimation has played a major role in the operation and development of the American Electric Power System. This work extends the state estimation analysis beyond the single-operand electric grid environment into the heterogeneous environment of the AMES. Employing a data-driven and model-based systems engineering approach in combination with HFGT, a Weighted Least Squares Error Hetero-functional Graph State Estimation (WLSEHFGSE) optimization program is developed to estimate the optimal flows of mass and energy through the AMES. This work is the first to integrate state estimation methods with HFGT. Furthermore, it demonstrates how such a WLSEHFGSE recovers the mass and energy flows in a system-of-systems like the AMES with asset-level granularity.
comment: 27 pages, 3 Tables, 11 Figures
Adaptive Override Control under High-Relative-Degree Nonovershooting Constraints
This paper considers the problem of adaptively overriding unsafe actions of a nominal controller in the presence of high-relative-degree nonovershooting constraints and parametric uncertainties. To prevent the design from being coupled with high-order derivatives of the parameter estimation error, we adopt a modular design approach in which the controller and the parameter identifier are designed separately. The controller module ensures that any safety violations caused by parametric uncertainties remain bounded, provided that the parameter estimation error and its first-order derivative are either bounded or square-integrable. The identifier module, in turn, guarantees that these requirements on the parameter estimation error are satisfied. Both theoretical analysis and simulation results demonstrate that the closed-loop safety violation is bounded by a tunable function of the initial estimation error. Moreover, as time increases, the parameter estimate converges to the true value, and the amount of safety violation decreases accordingly.
Frequency-Varying Optimization: A Control Framework for New Dynamic Frequency Response Services
To address the variability of renewable generation, initiatives have been launched globally to provide faster and more effective frequency responses. In the UK, the National Energy System Operator (NESO) has introduced a suite of three new dynamic services, where aggregation of assets is expected to play a key role. For an Aggregated Response Unit (ARU), the required level of frequency response varies with grid frequency, resulting in a frequency-varying equality constraint that assets should meet collectively. We show that the optimal coordination of an ARU constitutes a Frequency-Varying Optimization (FVO) problem, in which the optimal trajectory for each asset evolves dynamically. To facilitate online optimization, we reformulate the FVO problem into Tracking of the Optimal Trajectory (TOT) problems, with algorithms proposed for two scenarios: one where the asset dynamics are negligible, and another where they must be accounted for. Under reasonable conditions, the ARU converges to the optimal trajectory within a fixed time, and within the maximum delivery time requested by NESO. The proposed framework can be readily distributed to coordinate a large number of assets. Numerical results verify the effectiveness and scalability of the proposed control framework.
On the Boundary of the Robust Admissible Set in State and Input Constrained Nonlinear Systems
In this paper, we consider nonlinear control systems subject to bounded disturbances and to both state and input constraints. We introduce the definition of robust admissible set - the set of all initial states from which the state and input constraints can be satisfied for all times against all admissible disturbances. We focus on its boundary that can be decomposed into the usable part on the state constraint boundary and the barrier, interior to the state constraints. We show that, at the intersection of these two components, the boundary of the admissible set must be tangent to the state constraints and separate the interior of the robust admissible set and its complement. Moreover, we prove that the barrier must satisfy a saddle-point principle on a Hamiltonian, in the spirit of Pontryagin's maximum principle, thus providing a direct computational tool. Lastly, we illustrate our results by calculating the robust admissible set for an adaptive cruise control example.
comment: 18 pages, 4 figures
Integration of Concentrated Solar Power Plants in Renewable-Only VPP with Electrical and Thermal Demands: A Two-Stage Robust Bidding Approach
This paper proposes the integration of Concentrated Solar Power Plant (CSP) in the Renewable-only virtual power plant (RVPP) for bidding in the electricity day-ahead and secondary reserve markets, as well as trading thermal energy through a heat purchase agreement. A reformulated two-stage robust optimization approach is introduced to account for multiple uncertainties, including electricity prices, non-dispatchable renewable energy sources electrical production, CSP thermal production, and uncertainties in electrical and thermal demand consumption. The provision of energy and reserve by the thermal storage of CSP is modeled using an adjustable approach, which allocates a share of energy for up and down reserves based on the profitability of the RVPP. Simulations are conducted for several case studies to demonstrate the effectiveness and computational efficiency of the proposed approach under different RVPP operator decisions against uncertain parameters and various trading strategies for electricity and thermal energy. The simulation results show that integrating CSP into RVPP enhances RVPP flexibility for both electrical and thermal trading. Furthermore, the results indicate that the profitability of the RVPP increases when all trading options are considered, across different levels of conservatism adopted by the RVPP operator in response to uncertain parameters.
Guaranteed Robust Nonlinear MPC via Disturbance Feedback
Robots must satisfy safety-critical state and input constraints despite disturbances and model mismatch. We introduce a robust model predictive control (RMPC) formulation that is fast, scalable, and compatible with real-time implementation. Our formulation guarantees robust constraint satisfaction, input-to-state stability (ISS) and recursive feasibility. The key idea is to decompose the uncertain nonlinear system into (i) a nominal nonlinear dynamic model, (ii) disturbance-feedback controllers, and (iii) bounds on the model error. These components are optimized jointly using sequential convex programming. The resulting convex subproblems are solved efficiently using a recent disturbance-feedback MPC solver. The approach is validated across multiple dynamics, including a rocket-landing problem with steerable thrust. An open-source implementation is available at https://github.com/antoineleeman/robust-nonlinear-mpc.
comment: Code: https://github.com/antoineleeman/robust-nonlinear-mpc
An Extended Kalman Filter for Systems with Infinite-Dimensional Measurements
This article examines state estimation in discrete-time nonlinear stochastic systems with finite-dimensional states and infinite-dimensional measurements, motivated by real-world applications such as vision-based localization and tracking. We develop an extended Kalman filter (EKF) for real-time state estimation, with the measurement noise modeled as an infinite-dimensional random field. When applied to vision-based state estimation, the measurement Jacobians required to implement the EKF are shown to correspond to image gradients. This result provides a novel system-theoretic justification for the use of image gradients as features for vision-based state estimation, contrasting with their (often heuristic) introduction in many computer-vision pipelines. We demonstrate the practical utility of the EKF on a public real-world dataset involving the localization of an aerial drone using video from a downward-facing monocular camera. The EKF is shown to outperform VINS-MONO, an established visual-inertial odometry algorithm, in some cases achieving mean squared error reductions of up to an order of magnitude.
comment: 8 pages
Dual Iterative Learning Control for Multiple-Input Multiple-Output Dynamics with Validation in Robotic Systems
Solving motion tasks autonomously and accurately is a core ability for intelligent real-world systems. To achieve genuine autonomy across multiple systems and tasks, key challenges include coping with unknown dynamics and overcoming the need for manual parameter tuning, which is especially crucial in complex Multiple-Input Multiple-Output (MIMO) systems. This paper presents MIMO Dual Iterative Learning Control (DILC), a novel data-driven iterative learning scheme for simultaneous tracking control and model learning, without requiring any prior system knowledge or manual parameter tuning. The method is designed for repetitive MIMO systems and integrates seamlessly with established iterative learning control methods. We provide monotonic convergence conditions for both reference tracking error and model error in linear time-invariant systems. The DILC scheme -- rapidly and autonomously -- solves various motion tasks in high-fidelity simulations of an industrial robot and in multiple nonlinear real-world MIMO systems, without requiring model knowledge or manually tuning the algorithm. In our experiments, many reference tracking tasks are solved within 10-20 trials, and even complex motions are learned in less than 100 iterations. We believe that, because of its rapid and autonomous learning capabilities, DILC has the potential to serve as an efficient building block within complex learning frameworks for intelligent real-world systems.
comment: 11 pages, 4 figures
Verification and Synthesis of Discrete-Time Control Barrier Functions
Discrete-time Control Barrier Functions (DTCBFs) have recently attracted interest for guaranteeing safety and synthesizing safe controllers for discrete-time dynamical systems. This paper addresses the open challenges of verifying candidate DTCBFs and synthesizing DTCBFs for general nonlinear discrete-time systems with input constraints and arbitrary safe sets. In particular, we propose a branch-and-bound method, inspired by the $\alpha$BB algorithm, for the verification of candidate DTCBFs in both cases, whether a corresponding control policy is known or unknown. We prove that this method, in a finite number of iterations, either verifies a given candidate function as a valid DTCBF or falsifies it by providing a counterexample (within predefined tolerances). As a second main contribution, we propose a novel bilevel optimization approach to synthesize a DTCBF and a corresponding control policy in finite time. This involves determining the unknown coefficients of a parameterized DTCBF and a parameterized control policy. Furthermore, we introduce various strategies to reduce the computational burden of the bilevel approach. We also demonstrate our methods using numerical case studies.
comment: To appear in IEEE Transactions on Automatic Control
3D Flow Diffusion Policy: Visuomotor Policy Learning via Generating Flow in 3D Space
Learning robust visuomotor policies that generalize across diverse objects and interaction dynamics remains a central challenge in robotic manipulation. Most existing approaches rely on direct observation-to-action mappings or compress perceptual inputs into global or object-centric features, which often overlook localized motion cues critical for precise and contact-rich manipulation. We present 3D Flow Diffusion Policy (3D FDP), a novel framework that leverages scene-level 3D flow as a structured intermediate representation to capture fine-grained local motion cues. Our approach predicts the temporal trajectories of sampled query points and conditions action generation on these interaction-aware flows, implemented jointly within a unified diffusion architecture. This design grounds manipulation in localized dynamics while enabling the policy to reason about broader scene-level consequences of actions. Extensive experiments on the MetaWorld benchmark show that 3D FDP achieves state-of-the-art performance across 50 tasks, particularly excelling on medium and hard settings. Beyond simulation, we validate our method on eight real-robot tasks, where it consistently outperforms prior baselines in contact-rich and non-prehensile scenarios. These results highlight 3D flow as a powerful structural prior for learning generalizable visuomotor policies, supporting the development of more robust and versatile robotic manipulation. Robot demonstrations, additional results, and code can be found at https://sites.google.com/view/3dfdp/home.
comment: 7 main scripts + 2 reference pages
Distributionally Robust Safe Motion Planning with Contextual Information
We present a distributionally robust approach for collision avoidance by incorporating contextual information. Specifically, we embed the conditional distribution of future trajectory of the obstacle conditioned on the motion of the ego agent in a reproducing kernel Hilbert space (RKHS) via the conditional kernel mean embedding operator. Then, we define an ambiguity set containing all distributions whose embedding in the RKHS is within a certain distance from the empirical estimate of conditional mean embedding learnt from past data. Consequently, a distributionally robust collision avoidance constraint is formulated, and included in the receding horizon based motion planning formulation of the ego agent. Simulation results show that the proposed approach is more successful in avoiding collision compared to approaches that do not include contextual information and/or distributional robustness in their formulation in several challenging scenarios.
Interaction-aware Lane-Changing Early Warning System in Congested Traffic
Lane changes (LCs) in congested traffic are complex, multi-vehicle interactive events that pose significant safety concerns. Providing early warnings can enable more proactive driver assistance system and support more informed decision-making for drivers under LCs. This paper presents an interaction-aware Lane-Changing Early Warning (LCEW) system designed to issue reliable early warning signals based on future trajectory predictions. We first investigate the stochastic nature of LCs, characterized by (i) variable-size multi-vehicle interactions and (ii) the direct and indirect risks resulting from these interactions. To model these stochastic interactions, a Social Spatio-Temporal Graph Convolutional Neural Network framework informed by mutual information (STGCNN-MI) is introduced to predict multi-vehicle trajectories. By leveraging a MI-based adjacency matrix, the framework enhances trajectory prediction accuracy while providing interpretable representations of vehicle interactions. Then, potential collisions between the LC vehicle and adjacent vehicles (direct risks) or among the non-adjacent vehicles (indirect risks) are identified using oriented bounding box detection applied to the predicted trajectories. Finally, a warning signal is generated to inform the LC driver of location of potential collisions within the predicted time window. Traffic simulation experiments conducted in SUMO demonstrate that the proposed interaction-aware LCEW improves both vehicle-level safety and overall traffic efficiency, while also promoting more natural behavioral adaptation.
VLN-Zero: Rapid Exploration and Cache-Enabled Neurosymbolic Vision-Language Planning for Zero-Shot Transfer in Robot Navigation
Rapid adaptation in unseen environments is essential for scalable real-world autonomy, yet existing approaches rely on exhaustive exploration or rigid navigation policies that fail to generalize. We present VLN-Zero, a two-phase vision-language navigation framework that leverages vision-language models to efficiently construct symbolic scene graphs and enable zero-shot neurosymbolic navigation. In the exploration phase, structured prompts guide VLM-based search toward informative and diverse trajectories, yielding compact scene graph representations. In the deployment phase, a neurosymbolic planner reasons over the scene graph and environmental observations to generate executable plans, while a cache-enabled execution module accelerates adaptation by reusing previously computed task-location trajectories. By combining rapid exploration, symbolic reasoning, and cache-enabled execution, the proposed framework overcomes the computational inefficiency and poor generalization of prior vision-language navigation methods, enabling robust and scalable decision-making in unseen environments. VLN-Zero achieves 2x higher success rate compared to state-of-the-art zero-shot models, outperforms most fine-tuned baselines, and reaches goal locations in half the time with 55% fewer VLM calls on average compared to state-of-the-art models across diverse environments. Codebase, datasets, and videos for VLN-Zero are available at: https://vln-zero.github.io/.
comment: Codebase, datasets, and videos for VLN-Zero are available at: https://vln-zero.github.io/
AI Agent Access (A\^3) Network: An Embodied, Communication-Aware Multi-Agent Framework for 6G Coverage
The vision of 6G communication demands autonomous and resilient networking in environments without fixed infrastructure. Yet most multi-agent reinforcement learning (MARL) approaches focus on isolated stages - exploration, relay formation, or access - under static deployments and centralized control, limiting adaptability. We propose the AI Agent Access (A\^3) Network, a unified, embodied intelligence-driven framework that transforms multi-agent networking into a dynamic, decentralized, and end-to-end system. Unlike prior schemes, the A\^3 Network integrates exploration, target user access, and backhaul maintenance within a single learning process, while supporting on-demand agent addition during runtime. Its decentralized policies ensure that even a single agent can operate independently with limited observations, while coordinated agents achieve scalable, communication-optimized coverage. By embedding link-level communication metrics into actor-critic learning, the A\^3 Network couples topology formation with robust decision-making. Numerical simulations demonstrate that the A\^3 Network not only balances exploration and communication efficiency but also delivers system-level adaptability absent in existing MARL frameworks, offering a new paradigm for 6G multi-agent networks.
Refined Barrier Conditions for Finite-Time Safety and Reach-Avoid Guarantees in Stochastic Systems
Providing finite-time probabilistic safety and reach-avoid guarantees is crucial for safety-critical stochastic systems. Existing barrier certificate methods often rely on a restrictive boundedness assumption for auxiliary functions, limiting their applicability. This paper presents refined barrier-like conditions that remove this assumption. Specifically, we establish conditions for deriving upper bounds on finite-time safety probabilities in discrete-time systems and lower bounds on finite-time reach-avoid probabilities in continuous-time systems. This key relaxation significantly expands the class of verifiable systems, especially those with unbounded state spaces, and facilitates the application of advanced optimization techniques, such as semi-definite programming with polynomial functions. The efficacy of our approach is validated through numerical examples.
Minimalistic Autonomous Stack for High-Speed Time-Trial Racing
Autonomous racing has seen significant advancements, driven by competitions such as the Indy Autonomous Challenge (IAC) and the Abu Dhabi Autonomous Racing League (A2RL). However, developing an autonomous racing stack for a full-scale car is often constrained by limited access to dedicated test tracks, restricting opportunities for real-world validation. While previous work typically requires extended development cycles and significant track time, this paper introduces a minimalistic autonomous racing stack for high-speed time-trial racing that emphasizes rapid deployment and efficient system integration with minimal on-track testing. The proposed stack was validated on real speedways, achieving a top speed of 206 km/h within just 11 hours' practice run on the track with 325 km in total. Additionally, we present the system performance analysis, including tracking accuracy, vehicle dynamics, and safety considerations, offering insights for teams seeking to rapidly develop and deploy an autonomous racing stack with limited track access.
comment: The data associated with this paper is available at https://doi.org/10.5281/zenodo.17187680
Federated Aggregation of Demand Flexibility
This paper proposes a federated framework for demand flexibility aggregation to support grid operations. Unlike existing geometric methods that rely on a static, pre-defined base set as the geometric template for aggregation, our framework establishes a true federated process by enabling the collaborative optimization of this base set without requiring the participants sharing sensitive data with the aggregator. Specifically, we first formulate the base set optimization problem as a bilevel program. Using optimal solution functions, we then reformulate the bilevel program into a single-level, unconstrained learning task. By exploiting the decomposable structure of the overall gradient, we further design a decentralized gradient-based algorithm to solve this learning task. The entire framework, encompassing base set optimization, aggregation, and disaggregation, operates by design without exchanging raw user data. Numerical results demonstrate that our proposed framework unlocks substantially more flexibility than the approaches with static base sets, thus providing a promising framework for efficient and privacy-enhanced approaches to coordinate demand flexibility at scale.
comment: Longer version of a paper to be submitted to IEEE Transactions on Smart Grid
Modular Machine Learning with Applications to Genetic Circuit Composition
In several applications, including in synthetic biology, one often has input/output data on a system composed of many modules, and although the modules' input/output functions and signals may be unknown, knowledge of the composition architecture can significantly reduce the amount of training data required to learn the system's input/output mapping. Learning the modules' input/output functions is also necessary for designing new systems from different composition architectures. Here, we propose a modular learning framework, which incorporates prior knowledge of the system's compositional structure to (a) identify the composing modules' input/output functions from the system's input/output data and (b) achieve this by using a reduced amount of data compared to what would be required without knowledge of the compositional structure. To achieve this, we introduce the notion of modular identifiability, which allows recovery of modules' input/output functions from a subset of the system's input/output data, and provide theoretical guarantees on a class of systems motivated by genetic circuits. We demonstrate the theory on computational studies showing that a neural network (NNET) that accounts for the compositional structure can learn the composing modules' input/output functions and predict the system's output on inputs outside of the training set distribution. By contrast, a neural network that is agnostic of the structure is unable to predict on inputs that fall outside of the training set distribution. By reducing the need for experimental data and allowing module identification, this framework offers the potential to ease the design of synthetic biological circuits and of multi-module systems more generally.
From Space to Time: Enabling Adaptive Safety with Learned Value Functions via Disturbance Recasting
The widespread deployment of autonomous systems in safety-critical environments such as urban air mobility hinges on ensuring reliable, performant, and safe operation under varying environmental conditions. One such approach, value function-based safety filters, minimally modifies a nominal controller to ensure safety. Recent advances leverage offline learned value functions to scale these safety filters to high-dimensional systems. However, these methods assume detailed priors on all possible sources of model mismatch, in the form of disturbances in the environment -- information that is rarely available in real world settings. Even in well-mapped environments like urban canyons or industrial sites, drones encounter complex, spatially-varying disturbances arising from payload-drone interaction, turbulent airflow, and other environmental factors. We introduce SPACE2TIME, which enables safe and adaptive deployment of offline-learned safety filters under unknown, spatially-varying disturbances. The key idea is to reparameterize spatial variations in disturbance as temporal variations, enabling the use of precomputed value functions during online operation. We validate SPACE2TIME on a quadcopter through extensive simulations and hardware experiments, demonstrating significant improvement over baselines.
comment: The first three authors contributed equally. This work has been accepted for publication at the Conference on Robot Learning
Metriplectic Conditional Flow Matching for Dissipative Dynamics
Metriplectic conditional flow matching (MCFM) learns dissipative dynamics without violating first principles. Neural surrogates often inject energy and destabilize long-horizon rollouts; MCFM instead builds the conservative-dissipative split into both the vector field and a structure preserving sampler. MCFM trains via conditional flow matching on short transitions, avoiding long rollout adjoints. In inference, a Strang-prox scheme alternates a symplectic update with a proximal metric step, ensuring discrete energy decay; an optional projection enforces strict decay when a trusted energy is available. We provide continuous and discrete time guarantees linking this parameterization and sampler to conservation, monotonic dissipation, and stable rollouts. On a controlled mechanical benchmark, MCFM yields phase portraits closer to ground truth and markedly fewer energy-increase and positive energy rate events than an equally expressive unconstrained neural flow, while matching terminal distributional fit.
Four-Transistor Bipolar Series-Parallel Module Structure for Cascaded Bridge and Modular Multilevel Circuits
With their great scalability and flexibility, cascaded-bridge and modular multilevel converters have enabled a variety of energy applications, such as offshore wind power, high-voltage dc power transmission, power-quality management, and cutting-edge medical instrumentation. The incorporation of parallel connectivity between modules equips systems with advantages such as sensorless balancing, switched-capacitor energy exchange, and reduced impedance. However, existing topologies require many individual switches -- eight transistors per module. Efforts to use fewer switches, instead, have previously compromised their functionality. We propose a new module topology, named the direction-selective parallel (DiSeP) structure, which requires only four transistors per module -- the same as an H bridge -- but can achieve bidirectional equilibration, bipolar module output, and inter-module switched-capacitor features. This topology is highly attractive for existing converters with cascaded bridge elements, as the addition of only four diodes enables key features such as sensorless balancing and inter-module energy exchange. Thus, the module can outcompete H bridges in their applications, as it adds parallel modes without any additional transistors. Compared to double-H bridges (CH2B), it saves as many as half of the transistors. We elaborate on its working principles and key design considerations. We validate our theories on an experimental prototype with six modules. This prototype attains a total voltage harmonic distortion plus noise (THD+N) of 10.3% and a peak efficiency of 96.3%. Furthermore, the modules achieve autonomous sensorless balancing under open-loop control.
linrax: A JAX Compatible, Simplex Method Linear Program Solver
We present linrax, the first simplex based linear program (LP) solver compatible with the JAX ecosystem. In many control algorithms, LPs are often automatically generated and frequently solved either offline or online in the control loop. This motivates the design of linrax, which is especially suited for compilation into a complex JAX-based pipeline as a subroutine. We discuss the challenges associated with implementing a general purpose LP solver under strict design requirements from JAX. Notably, we can solve general problems which may include dependent constraints-something not possible with existing JAX-compatible LP solvers that use first-order techniques and may fail to converge. We demonstrate the utility of linrax through several examples, including a robust control synthesis pipeline for a nonlinear vehicle model using automatic differentiation through a LP-based reachable set framework.
comment: 8 pages, 3 figures
Robust Near-Optimal Nonlinear Target Enclosing Guidance
This paper proposes a nonlinear optimal guidance law that enables a pursuer to enclose a target within arbitrary geometric patterns, which extends beyond conventional circular encirclement. The design operates using only relative state measurements and formulates a target enclosing guidance law in which the vehicle's lateral acceleration serves as the steering control, making it well-suited for aerial vehicles with turning constraints. Our approach generalizes and extends existing guidance strategies that are limited to target encirclement and provides a degree of optimality. At the same time, the exact information of the target's maneuver is unnecessary during the design. The guidance law is developed within the framework of a state-dependent Riccati equation (SDRE), thereby providing a systematic way to handle nonlinear dynamics through a pseudo-linear representation to design locally optimal feedback guidance commands through state-dependent weighting matrices. While SDRE ensures near-optimal performance in the absence of strong disturbances, we further augment the design to incorporate an integral sliding mode manifold to compensate when disturbances push the system away from the nominal trajectory, and demonstrate that the design provides flexibility in the sense that the (possibly time-varying) stand-off curvature could also be treated as unknown. Simulations demonstrate the efficacy of the proposed approach.
Crater Observing Bio-inspired Rolling Articulator (COBRA)
NASA aims to establish a sustainable human basecamp on the Moon as a stepping stone for future missions to Mars and beyond. The discovery of water ice on the Moon's craters located in permanently shadowed regions, which can provide drinking water, oxygen, and rocket fuel, is therefore of critical importance. However, current methods to access lunar ice deposits are limited. While rovers have been used to explore the lunar surface for decades, they face significant challenges in navigating harsh terrains, such as permanently shadowed craters, due to the high risk of immobilization. This report introduces COBRA (Crater Observing Bio-inspired Rolling Articulator), a multi-modal snake-style robot designed to overcome mobility challenges in Shackleton Crater's rugged environment. COBRA combines slithering and tumbling locomotion to adapt to various crater terrains. In snake mode, it uses sidewinding to traverse flat or low inclined surfaces, while in tumbling mode, it forms a circular barrel by linking its head and tail, enabling rapid movement with minimal energy on steep slopes. Equipped with an onboard computer, stereo camera, inertial measurement unit, and joint encoders, COBRA facilitates real-time data collection and autonomous operation. This paper highlights COBRAs robustness and efficiency in navigating extreme terrains through both simulations and experimental validation.
Automatic and Scalable Safety Verification using Interval Reachability with Subspace Sampling
Interval refinement is a technique for reducing the conservatism of traditional interval based reachability methods by lifting the system to a higher dimension using new auxiliary variables and exploiting the introduced structure through a refinement procedure. We present a novel, efficiently scaling, automatic refinement strategy based on a subspace sampling argument and motivated by reducing the number of interval operations through sparsity. Unlike previous methods, we guarantee that refined bounds shrink as additional auxiliary variables are added. This additionally encourages automation of the lifting phase by allowing larger groups of auxiliary variables to be considered. We implement our strategy in JAX, a high-performance computational toolkit for Python and demonstrate its efficacy on several examples, including regulating a multi-agent platoon to the origin while avoiding an obstacle.
comment: 6 pages, 3 figures. Updated to correct a small error in the dynamics presented in equation (12)
The First Open-Source Framework for Learning Stability Certificate from Data
Before 2025, no open-source system existed that could learn Lyapunov stability certificates directly from noisy, real-world flight data. No tool could answer the critical question: is this controller still stabilizable-especially when its closed-loop system is a total black box. We broke that boundary. This year, we released the first-ever open-source framework that can learn Lyapunov functions from trajectory data under realistic, noise-corrupted conditions. Unlike statistical anomaly detectors, our method does not merely flag deviations-it directly determines whether the system can still be proven stable. Applied to public data from the 2024 SAS severe turbulence incident, our method revealed that, within just 60 seconds of the aircrafts descent becoming abnormal, no Lyapunov function could be constructed to certify system stability. Moreover, this is the first known data-driven stability-theoretic method ever applied to a civil airliner accident. And our approach works with zero access to the controller logic-a breakthrough for commercial aircraft where control laws are proprietary and opaque. The implementation of the proposed framework is open-sourced and available at: https://github.com/HansOersted/stability
Frequency-Varying Optimization: A Control Framework for New Dynamic Frequency Response Services
To address the variability of renewable generation, initiatives have been launched globally to provide faster and more effective frequency responses. In the UK, the National Energy System Operator (NESO) has introduced a suite of three new dynamic services, where aggregation of assets is expected to play a key role. For an Aggregated Response Unit (ARU), the required level of frequency response varies with grid frequency, resulting in a frequency-varying equality constraint that assets should meet collectively. We show that the optimal coordination of an ARU constitutes a Frequency-Varying Optimization (FVO) problem, in which the optimal trajectory for each asset evolves dynamically. To facilitate online optimization, we reformulate the FVO problem into Tracking of the Optimal Trajectory (TOT) problems, with algorithms proposed for two scenarios: one where the asset dynamics are negligible, and another where they must be accounted for. Under reasonable conditions, the ARU converges to the optimal trajectory within a fixed time, and within the maximum delivery time requested by NESO. The proposed framework can be readily distributed to coordinate a large number of assets. Numerical results verify the effectiveness and scalability of the proposed control framework.
comment: in IEEE Transactions on Power Systems, 2025
Addressing Model Inaccuracies in Transmission Network Reconfiguration via Diverse Alternatives
The ongoing energy transition places significant pressure on the transmission network due to increasing shares of renewables and electrification. To mitigate grid congestion, transmission system operators need decision support tools to suggest remedial actions, such as transmission network reconfigurations or redispatch. However, these tools are prone to model inaccuracies and may not provide relevant suggestions with regard to important unmodeled constraints or operator preferences. We propose a human-in-the-loop modeling-to-generate alternatives (HITL-MGA) approach to address these shortcomings by generating diverse topology reconfiguration alternatives. Case studies on the IEEE 57-bus and IEEE 118-bus systems show the method can leverage expert feedback and improve the quality of the suggested remedial actions.
comment: This preprint is currently under peer review
The Bayesian Separation Principle for Data-driven Control
In this paper we investigate the existence of a separation principle between model identification and control design in the context of model predictive control. First, we clarify that such a separation principle holds asymptotically in the number of data in a Fisherian context, and show that it holds universally, i.e. regardless of the data size, in a Bayesian context. Then, by formulating model predictive control within a Gaussian regression framework, we describe how the Bayesian separation principle can be used to derive computable, uncertainty-aware expressions for the control cost and optimal input sequence, thereby bridging direct and indirect data-driven approaches. Numerical results in both linear and nonlinear scenarios illustrate that the proposed approach outperform nominal methods that neglect uncertainty, highlighting the advantages of incorporating uncertainty in the control design process.
comment: 16 pages, 2 figures
Drift Plus Optimistic Penalty - A Learning Framework for Stochastic Network Optimization with Improved Regret Bounds
We consider the problem of joint routing and scheduling in queueing networks, where the edge transmission costs are unknown. At each time-slot, the network controller receives noisy observations of transmission costs only for those edges it selects for transmission. The network controller's objective is to make routing and scheduling decisions so that the total expected cost is minimized. This problem exhibits an exploration-exploitation trade-off, however, previous bandit-style solutions cannot be directly applied to this problem due to the queueing dynamics. In order to ensure network stability, the network controller needs to optimize throughput and cost simultaneously. We show that the best achievable cost is lower bounded by the solution to a static optimization problem, and develop a network control policy using techniques from Lyapunov drift-plus-penalty optimization and multi-arm bandits. We show that the policy achieves a sub-linear regret of order $O(\sqrt{T}\log T)$, as compared to the best policy that has complete knowledge of arrivals and costs. Finally, we evaluate the proposed policy using simulations and show that its regret is indeed sub-linear.
Occlusion-Aware Consistent Model Predictive Control for Robot Navigation in Occluded Obstacle-Dense Environments
Ensuring safety and motion consistency for robot navigation in occluded, obstacle-dense environments is a critical challenge. In this context, this study presents an occlusion-aware Consistent Model Predictive Control (CMPC) strategy. To account for the occluded obstacles, it incorporates adjustable risk regions that represent their potential future locations. Subsequently, dynamic risk boundary constraints are developed online to ensure safety. The CMPC then constructs multiple locally optimal trajectory branches (each tailored to different risk regions) to strike a balance between safety and performance. A shared consensus segment is generated to ensure smooth transitions between branches without significant velocity fluctuations, further preserving motion consistency. To facilitate high computational efficiency and ensure coordination across local trajectories, we use the alternating direction method of multipliers (ADMM) to decompose the CMPC into manageable sub-problems for parallel solving. The proposed strategy is validated through simulations and real-world experiments on an Ackermann-steering robot platform. The results demonstrate the effectiveness of the proposed CMPC strategy through comparisons with baseline approaches in occluded, obstacle-dense environments.
Stratified Topological Autonomy for Long-Range Coordination (STALC)
In this paper, we present Stratified Topological Autonomy for Long-Range Coordination (STALC), a hierarchical planning approach for coordinated multi-robot maneuvering in real-world environments with significant inter-robot spatial and temporal dependencies. At its core, STALC consists of a multi-robot graph-based planner which combines a topological graph with a novel, computationally efficient mixed-integer programming formulation to generate highly-coupled multi-robot plans in seconds. To enable autonomous planning across different spatial and temporal scales, we construct our graphs so that they capture connectivity between free-space regions and other problem-specific features, such as traversability or risk. We then use receding-horizon planners to achieve local collision avoidance and formation control. To evaluate our approach, we consider a multi-robot reconnaissance scenario where robots must autonomously coordinate to navigate through an environment while minimizing the risk of detection by observers. Through simulation-based experiments, we show that our approach is able to scale to address complex multi-robot planning scenarios. Through hardware experiments, we demonstrate our ability to generate graphs from real-world data and successfully plan across the entire hierarchy to achieve shared objectives.
comment: This work has been submitted to the IEEE for possible publication
Low-pass sampling in Model Predictive Path Integral Control
Model Predictive Path Integral (MPPI) control is a widely used sampling-based approach for real-time control, offering flexibility in handling arbitrary dynamics and cost functions. However, the original MPPI suffers from high-frequency noise in the sampled control trajectories, leading to actuator wear and inefficient exploration. In this work, we introduce Low-Pass Model Predictive Path Integral Control (LP-MPPI), which integrates low-pass filtering into the sampling process to eliminate detrimental high-frequency components and improve the effectiveness of the control trajectories exploration. Unlike prior approaches, LP-MPPI provides direct and interpretable control over the frequency spectrum of sampled trajectories, enhancing sampling efficiency and control smoothness. Through extensive evaluations in Gymnasium environments, simulated quadruped locomotion, and real-world F1TENTH autonomous racing, we demonstrate that LP-MPPI consistently outperforms state-of-the-art MPPI variants, achieving significant performance improvements while reducing control signal chattering.
Universal Dynamics with Globally Controlled Analog Quantum Simulators
Analog quantum simulators with global control fields have emerged as powerful platforms for exploring complex quantum phenomena. Recent breakthroughs, such as the coherent control of thousands of atoms, highlight the growing potential for quantum applications at scale. Despite these advances, a fundamental theoretical question remains unresolved: to what extent can such systems realize universal quantum dynamics under global control? Here we establish a necessary and sufficient condition for universal quantum computation using only global pulse control, proving that a broad class of analog quantum simulators is, in fact, universal. We further extend this framework to fermionic and bosonic systems, including modern platforms such as ultracold atoms in optical superlattices. Crucially, to connect the theoretical possibility with experimental reality, we introduce a new control technique into the experiment - direct quantum optimal control. This method enables the synthesis of complex effective Hamiltonians and allows us to incorporate realistic hardware constraints. To show its practical power, we experimentally engineer three-body interactions outside the blockade regime and demonstrate topological dynamics on a Rydberg atom array. Using the new control framework, we overcome key experimental challenges, including hardware limitations and atom position fluctuations in the non-blockade regime, by identifying smooth, short-duration pulses that achieve high-fidelity dynamics. Experimental measurements reveal dynamical signatures of symmetry-protected-topological edge modes, confirming both the expressivity and feasibility of our approach. Our work opens a new avenue for quantum simulation beyond native hardware Hamiltonians, enabling the engineering of effective multi-body interactions and advancing the frontier of quantum information processing with globally-controlled analog platforms.
comment: 10 pages, 5 figures with Methods. HYH, AMG, and LC contributed equally to this work
Sliding Speed Influences Electrovibration-Induced Finger Friction Dynamics on Touchscreens
Electrovibration technology enables tactile texture rendering on capacitive touchscreens by modulating friction between the finger and the screen through electrostatic attraction forces, generated by applying an alternating voltage signal to the screen. Accurate signal calibration is essential for robust texture rendering but remains challenging due to variations in sliding speed, applied force, and individual skin mechanics, all of which unpredictably affect frictional behavior. Here, we investigate how exploration conditions affect electrovibration-induced finger friction on touchscreens and the role of skin mechanics in this process. Ten participants slid their index fingers across an electrovibration-enabled touchscreen at five sliding speeds ($20\sim100$ mm/s) and applied force levels ($0.2\sim0.6$ N). Contact forces and skin accelerations were measured while amplitude modulated voltage signals spanning the tactile frequency range were applied to the screen. We modeled the finger-touchscreen friction response as a first-order system and the skin mechanics as a mass-spring-damper system. Results showed that sliding speed influenced the friction response's cutoff frequency, along with the estimated finger moving mass and stiffness. For every $1$ mm/s increase in speed, the cutoff frequency, the finger moving mass, and stiffness increased by $13.8$ Hz, $3.23\times 10^{-5}$ kg, and $4.04$ N/m, respectively. Correlation analysis revealed that finger stiffness had a greater impact on the cutoff frequency than moving mass. Notably, we observed a substantial inter-participant variability in both finger-display interaction and skin mechanics parameters. Finally, we developed a speed-dependent friction model to support consistent and perceptually stable electrovibration-based haptic feedback across varying user conditions.
comment: 26 pages, 14 figures, Tribology journal
Optimization-Based Ramping Reserve Allocation of BESS for AGC Enhancement
The transient behavior of Automatic Generation Control (AGC) systems is a critical aspect of power system operation. Therefore, fully extracting the potential of Battery Energy Storage Systems (BESSs) for AGC enhancement is of paramount importance. In light of the challenges posed by diverse resource interconnections and the variability associated, we propose an online optimization scheme that can adapt to changes in an unknown and variable environment. To leverage the synergy between BESSs and Conventional Generators (CGs), we devise a variant of the Area Injection Error (AIE) as a measure to quantify the ramping needs. Based on this measure, we develop a distributed optimization algorithm with adaptive learning rates for the allocation of the ramping reserve. The algorithm restores a larger learning rate for compliance with the ramping needs upon detecting a potentially destabilizing event. We demonstrate the effectiveness and scalability of the proposed scheme through comprehensive case studies. It is shown that the proposed scheme can improve the transient behavior of the AGC system by bridging the gap in ramping capability.
DANSE: Data-driven Non-linear State Estimation of Model-free Process in Unsupervised Learning Setup
We address the tasks of Bayesian state estimation and forecasting for a model-free process in an unsupervised learning setup. For a model-free process, we do not have any a-priori knowledge of the process dynamics. In the article, we propose DANSE -- a Data-driven Nonlinear State Estimation method. DANSE provides a closed-form posterior of the state of the model-free process, given linear measurements of the state. In addition, it provides a closed-form posterior for forecasting. A data-driven recurrent neural network (RNN) is used in DANSE to provide the parameters of a prior of the state. The prior depends on the past measurements as input, and then we find the closed-form posterior of the state using the current measurement as input. The data-driven RNN captures the underlying non-linear dynamics of the model-free process. The training of DANSE, mainly learning the parameters of the RNN, is executed using an unsupervised learning approach. In unsupervised learning, we have access to a training dataset comprising only a set of measurement data trajectories, but we do not have any access to the state trajectories. Therefore, DANSE does not have access to state information in the training data and can not use supervised learning. Using simulated linear and non-linear process models (Lorenz attractor and Chen attractor), we evaluate the unsupervised learning-based DANSE. We show that the proposed DANSE, without knowledge of the process model and without supervised learning, provides a competitive performance against model-driven methods, such as the Kalman filter (KF), extended KF (EKF), unscented KF (UKF), a data-driven deep Markov model (DMM) and a recently proposed hybrid method called KalmanNet. In addition, we show that DANSE works for high-dimensional state estimation.
comment: 12 pages, Accepted for publication in IEEE Transactions in Signal Processing. Added a fix in the latest ArXiV version for Fig. 4 (and related figs) due to a slight misalignment in MSE calculations. Relative performance trends are unchanged and consistent
Recent advances in data-driven methods for degradation modelling across applications
Understanding degradation is crucial for ensuring the longevity and performance of materials, systems, and organisms. To illustrate the similarities across applications, this article provides a review of data-based method in materials science, engineering, and medicine. The methods analyzed in this paper include regression analysis, factor analysis, cluster analysis, Markov Chain Monte Carlo, Bayesian statistics, hidden Markov models, nonparametric Bayesian modeling of time series, supervised learning, and deep learning. The review provides an overview of degradation models, referencing books and methods, and includes detailed tables highlighting the applications and insights offered in medicine, power engineering, and material science. It also discusses the classification of methods, emphasizing statistical inference, dynamic prediction, machine learning, and hybrid modeling techniques. Overall, this review enhances understanding of degradation modelling across diverse domains.
A generalized global Hartman-Grobman theorem for asymptotically stable semiflows
Recently, Kvalheim and Sontag provided a generalized global Hartman-Grobman theorem for equilibria under asymptotically stable continuous vector fields. By leveraging topological properties of Lyapunov functions, their theorem works without assuming hyperbolicity. We extend their theorem to a class of possibly discontinuous vector fields, in particular, to vector fields generating asymptotically stable semiflows.
comment: Technical note related to the update of 10.48550/arXiv.2411.03277. Newest version: 6 pages, 5 figures, the only change is the presentation. Under review
Online Regularized Statistical Learning in Reproducing Kernel Hilbert Space With Non-Stationary Data
We study the convergence of recursive regularized learning algorithms in the reproducing kernel Hilbert space (RKHS) with dependent and non-stationary online data streams. Firstly, we introduce the concept of random Tikhonov regularization path and decompose the tracking error of the algorithm's output for the regularization path into random difference equations in RKHS, whose non-homogeneous terms are martingale difference sequences. Investigating the mean square asymptotic stability of the equations, we show that if the regularization path is slowly time-varying, then the algorithm's output achieves mean square consistency with the regularization path. Leveraging operator theory, particularly the monotonicity of the inverses of operators and the spectral decomposition of compact operators, we introduce the RKHS persistence of excitation condition (i.e. there exists a fixed-length time period, such that the conditional expectation of the operators induced by the input data accumulated over every period has a uniformly strictly positive compact lower bound) and develop a dominated convergence method to prove the mean square consistency between the algorithm's output and an unknown function. Finally, for independent and non-identically distributed data streams, the algorithm achieves the mean square consistency if the input data's marginal probability measures are slowly time-varying and the average measure over each fixed-length time period has a uniformly strictly positive lower bound.
LogicGuard: Improving Embodied LLM agents through Temporal Logic based Critics
Large language models (LLMs) have shown promise in zero-shot and single step reasoning and decision making problems, but in long horizon sequential planning tasks, their errors compound, often leading to unreliable or inefficient behavior. We introduce LogicGuard, a modular actor-critic architecture in which an LLM actor is guided by a trajectory level LLM critic that communicates through Linear Temporal Logic (LTL). Our setup combines the reasoning strengths of language models with the guarantees of formal logic. The actor selects high-level actions from natural language observations, while the critic analyzes full trajectories and proposes new LTL constraints that shield the actor from future unsafe or inefficient behavior. LogicGuard supports both fixed safety rules and adaptive, learned constraints, and is model-agnostic: any LLM-based planner can serve as the actor, with LogicGuard acting as a logic-generating wrapper. We formalize planning as graph traversal under symbolic constraints, allowing LogicGuard to analyze failed or suboptimal trajectories and generate new temporal logic rules that improve future behavior. To demonstrate generality, we evaluate LogicGuard across two distinct settings: short-horizon general tasks and long-horizon specialist tasks. On the Behavior benchmark of 100 household tasks, LogicGuard increases task completion rates by 25% over a baseline InnerMonologue planner. On the Minecraft diamond-mining task, which is long-horizon and requires multiple interdependent subgoals, LogicGuard improves both efficiency and safety compared to SayCan and InnerMonologue. These results show that enabling LLMs to supervise each other through temporal logic yields more reliable, efficient and safe decision-making for both embodied agents.
comment: Modified version of prior LTLCrit work with new robotics dataset
Spectrum Opportunities for the Wireless Future: From Direct-to-Device Satellite Applications to 6G Cellular
For next-generation wireless networks, both the upper mid-band and terahertz spectra are gaining global attention. This article provides an in-depth analysis of recent regulatory rulings and spectrum preferences issued by international standard bodies. We highlight promising frequency bands earmarked for 6G and beyond, and offer examples that illuminate the passive service protections and spectrum feasibility for coexistence between terrestrial and non-terrestrial wireless networks.
Energy Management for Renewable-Colocated Artificial Intelligence Data Centers
We develop an energy management system (EMS) for artificial intelligence (AI) data centers with colocated renewable generation. Under a cost-minimizing framework, the EMS of renewable-colocated data center (RCDC) co-optimizes AI workload scheduling, on-site renewable utilization, and electricity market participation. Within both wholesale and retail market participation models, the economic benefit of the RCDC operation is maximized. Empirical evaluations using real-world traces of electricity prices, data center power consumption, and renewable generation demonstrate significant electricity cost reduction from renewable and AI data center colocations.
Uncertainty-aware Latent Safety Filters for Avoiding Out-of-Distribution Failures
Recent advances in generative world models have enabled classical safe control methods, such as Hamilton-Jacobi (HJ) reachability, to generalize to complex robotic systems operating directly from high-dimensional sensor observations. However, obtaining comprehensive coverage of all safety-critical scenarios during world model training is extremely challenging. As a result, latent safety filters built on top of these models may miss novel hazards and even fail to prevent known ones, overconfidently misclassifying risky out-of-distribution (OOD) situations as safe. To address this, we introduce an uncertainty-aware latent safety filter that proactively steers robots away from both known and unseen failures. Our key idea is to use the world model's epistemic uncertainty as a proxy for identifying unseen potential hazards. We propose a principled method to detect OOD world model predictions by calibrating an uncertainty threshold via conformal prediction. By performing reachability analysis in an augmented state space-spanning both the latent representation and the epistemic uncertainty-we synthesize a latent safety filter that can reliably safeguard arbitrary policies from both known and unseen safety hazards. In simulation and hardware experiments on vision-based control tasks with a Franka manipulator, we show that our uncertainty-aware safety filter preemptively detects potential unsafe scenarios and reliably proposes safe, in-distribution actions. Video results can be found on the project website at https://cmu-intentlab.github.io/UNISafe
comment: Conference on Robot Learning (CoRL 2025)
Drift Plus Optimistic Penalty -- A Learning Framework for Stochastic Network Optimization with Improved Regret Bounds
We consider the problem of joint routing and scheduling in queueing networks, where the edge transmission costs are unknown. At each time-slot, the network controller receives noisy observations of transmission costs only for those edges it selects for transmission. The network controller's objective is to make routing and scheduling decisions so that the total expected cost is minimized. This problem exhibits an exploration-exploitation trade-off, however, previous bandit-style solutions cannot be directly applied to this problem due to the queueing dynamics. In order to ensure network stability, the network controller needs to optimize throughput and cost simultaneously. We show that the best achievable cost is lower bounded by the solution to a static optimization problem, and develop a network control policy using techniques from Lyapunov drift-plus-penalty optimization and multi-arm bandits. We show that the policy achieves a sub-linear regret of order $O(\sqrt{T}\log T)$, as compared to the best policy that has complete knowledge of arrivals and costs. Finally, we evaluate the proposed policy using simulations and show that its regret is indeed sub-linear.
Multiagent Systems
Proactive-reactive detection and mitigation of intermittent faults in robot swarms
Intermittent faults are transient errors that sporadically appear and disappear. Although intermittent faults pose substantial challenges to reliability and coordination, existing studies of fault tolerance in robot swarms focus instead on permanent faults. One reason for this is that intermittent faults are prohibitively difficult to detect in the fully self-organized ad-hoc networks typical of robot swarms, as their network topologies are transient and often unpredictable. However, in the recently introduced self-organizing nervous systems (SoNS) approach, robot swarms are able to self-organize persistent network structures for the first time, easing the problem of detecting intermittent faults. To address intermittent faults in robot swarms that have persistent networks, we propose a novel proactive-reactive strategy to detection and mitigation, based on self-organized backup layers and distributed consensus in a multiplex network. Proactively, the robots self-organize dynamic backup paths before faults occur, adapting to changes in the primary network topology and the robots' relative positions. Reactively, robots use one-shot likelihood ratio tests to compare information received along different paths in the multiplex network, enabling early fault detection. Upon detection, communication is temporarily rerouted in a self-organized way, until the detected fault resolves. We validate the approach in representative scenarios of faulty positional data occurring during formation control, demonstrating that intermittent faults are prevented from disrupting convergence to desired formations, with high fault detection accuracy and low rates of false positives.
Application Management in C-ITS: Orchestrating Demand-Driven Deployments and Reconfigurations SC 2025
Vehicles are becoming increasingly automated and interconnected, enabling the formation of cooperative intelligent transport systems (C-ITS) and the use of offboard services. As a result, cloud-native techniques, such as microservices and container orchestration, play an increasingly important role in their operation. However, orchestrating applications in a large-scale C-ITS poses unique challenges due to the dynamic nature of the environment and the need for efficient resource utilization. In this paper, we present a demand-driven application management approach that leverages cloud-native techniques - specifically Kubernetes - to address these challenges. Taking into account the demands originating from different entities within the C-ITS, the approach enables the automation of processes, such as deployment, reconfiguration, update, upgrade, and scaling of microservices. Executing these processes on demand can, for example, reduce computing resource consumption and network traffic. A demand may include a request for provisioning an external supporting service, such as a collective environment model. The approach handles changing and new demands by dynamically reconciling them through our proposed application management framework built on Kubernetes and the Robot Operating System (ROS 2). We demonstrate the operation of our framework in the C-ITS use case of collective environment perception and make the source code of the prototypical framework publicly available at https://github.com/ika-rwth-aachen/application_manager .
comment: 7 pages, 2 figures, 2 tables; Accepted to be published as part of the 2025 IEEE International Conference on Intelligent Transportation Systems (ITSC 2025), Gold Coast, Australia, November 18-21, 2025
Knowledge Base-Aware Orchestration: A Dynamic, Privacy-Preserving Method for Multi-Agent Systems
Multi-agent systems (MAS) are increasingly tasked with solving complex, knowledge-intensive problems where effective agent orchestration is critical. Conventional orchestration methods rely on static agent descriptions, which often become outdated or incomplete. This limitation leads to inefficient task routing, particularly in dynamic environments where agent capabilities continuously evolve. We introduce Knowledge Base-Aware (KBA) Orchestration, a novel approach that augments static descriptions with dynamic, privacy-preserving relevance signals derived from each agent's internal knowledge base (KB). In the proposed framework, when static descriptions are insufficient for a clear routing decision, the orchestrator prompts the subagents in parallel. Each agent then assesses the task's relevance against its private KB, returning a lightweight ACK signal without exposing the underlying data. These collected signals populate a shared semantic cache, providing dynamic indicators of agent suitability for future queries. By combining this novel mechanism with static descriptions, our method achieves more accurate and adaptive task routing preserving agent autonomy and data confidentiality. Benchmarks show that our KBA Orchestration significantly outperforms static description-driven methods in routing precision and overall system efficiency, making it suitable for large-scale systems that require higher accuracy than standard description-driven routing.
The Heterogeneous Multi-Agent Challenge ECAI 2025
Multi-Agent Reinforcement Learning (MARL) is a growing research area which gained significant traction in recent years, extending Deep RL applications to a much wider range of problems. A particularly challenging class of problems in this domain is Heterogeneous Multi-Agent Reinforcement Learning (HeMARL), where agents with different sensors, resources, or capabilities must cooperate based on local information. The large number of real-world situations involving heterogeneous agents makes it an attractive research area, yet underexplored, as most MARL research focuses on homogeneous agents (e.g., a swarm of identical robots). In MARL and single-agent RL, standardized environments such as ALE and SMAC have allowed to establish recognized benchmarks to measure progress. However, there is a clear lack of such standardized testbed for cooperative HeMARL. As a result, new research in this field often uses simple environments, where most algorithms perform near optimally, or uses weakly heterogeneous MARL environments.
comment: 7 pages. To Appear at ECAI 2025
Exploring Model Kinship for Merging Large Language Models EMNLP 2025
Model merging has emerged as a key technique for enhancing the capabilities and efficiency of Large Language Models (LLMs). The open-source community has driven model evolution by iteratively merging existing models, yet a principled understanding of the gains and underlying factors in model merging remains limited. In this work, we study model evolution through iterative merging, drawing an analogy to biological evolution, and introduce the concept of model kinship, the degree of similarity or relatedness between LLMs. Through comprehensive empirical analysis, we show that model kinship is closely linked to the performance improvements achieved by merging, providing a useful criterion for selecting candidate models. Building on this insight, we propose a new model merging strategy: Top-k Greedy Merging with Model Kinship, which can improve benchmark performance. Specifically, we discover that incorporating model kinship as a guiding criterion enables continuous merging while mitigating performance degradation caused by local optima, thereby facilitating more effective model evolution. Code is available at https://github.com/zjunlp/ModelKinship.
comment: EMNLP 2025 Findings
OpenLens AI: Fully Autonomous Research Agent for Health Infomatics
Health informatics research is characterized by diverse data modalities, rapid knowledge expansion, and the need to integrate insights across biomedical science, data analytics, and clinical practice. These characteristics make it particularly well-suited for agent-based approaches that can automate knowledge exploration, manage complex workflows, and generate clinically meaningful outputs. Recent progress in large language model (LLM)-based agents has demonstrated promising capabilities in literature synthesis, data analysis, and even end-to-end research execution. However, existing systems remain limited for health informatics because they lack mechanisms to interpret medical visualizations and often overlook domain-specific quality requirements. To address these gaps, we introduce OpenLens AI, a fully automated framework tailored to health informatics. OpenLens AI integrates specialized agents for literature review, data analysis, code generation, and manuscript preparation, enhanced by vision-language feedback for medical visualization and quality control for reproducibility. The framework automates the entire research pipeline, producing publication-ready LaTeX manuscripts with transparent and traceable workflows, thereby offering a domain-adapted solution for advancing health informatics research.
Formal Methods for An Iterated Volunteer's Dilemma
Game theory provides a framework for studying communication dynamics and emergent phenomena arising from rational agent interactions. We present a model framework for the Volunteer's Dilemma with four key contributions: (1) formulating it as a stochastic concurrent nn n-player game, (2) developing properties to verify model correctness and reachability, (3) constructing strategy synthesis graphs to identify optimal game trajectories, and (4) analyzing parameter correlations with expected local and global rewards over finite time horizons.
comment: 9 pages, 4 figures, 5 tables